Search Results (2)

Search Parameters:
Keywords = visual question generation (VQG)

18 pages, 5695 KB  
Article
Machine-to-Machine Visual Dialoguing with ChatGPT for Enriched Textual Image Description
by Riccardo Ricci, Yakoub Bazi and Farid Melgani
Remote Sens. 2024, 16(3), 441; https://doi.org/10.3390/rs16030441 - 23 Jan 2024
Cited by 13 | Viewed by 4034
Abstract
Image captioning is a technique that enables the automatic extraction of natural language descriptions about the contents of an image. On the one hand, information in the form of natural language can enhance accessibility by reducing the expertise required to process, analyze, and exploit remote sensing images, while on the other, it provides a direct and general form of communication. However, image captioning is usually restricted to a single sentence, which barely describes the rich semantic information that typically characterizes remote sensing (RS) images. In this paper, we aim to move one step forward by proposing a captioning system that, mimicking human behavior, adopts dialogue as a tool to explore and dig for information, leading to more detailed and comprehensive descriptions of RS scenes. The system relies on a question–answer scheme fed by a query image and summarizes the dialogue content with ChatGPT. Experiments carried out on two benchmark remote sensing datasets confirm the potential of such an approach in the context of semantic information mining. Strengths and weaknesses are highlighted and discussed, as well as some possible future developments.

16 pages, 2036 KB  
Article
Goal-Driven Visual Question Generation from Radiology Images
by Mourad Sarrouti, Asma Ben Abacha and Dina Demner-Fushman
Information 2021, 12(8), 334; https://doi.org/10.3390/info12080334 - 20 Aug 2021
Cited by 10 | Viewed by 4585
Abstract
Visual Question Generation (VQG) from images is a rising research topic in both natural language processing and computer vision. Although there are some recent efforts towards generating questions from images in the open domain, the VQG task in the medical domain has not been well studied so far due to the lack of labeled data. In this paper, we introduce a goal-driven VQG approach for radiology images called VQGRaD that generates questions targeting specific image aspects such as modality and abnormality. In particular, we study generating natural language questions based on the visual content of the image and on additional information such as the image caption and the question category. VQGRaD encodes the dense vectors of different inputs into two latent spaces, which allows generating, for a specific question category, relevant questions about the images, with or without their captions. We also explore the impact of domain knowledge incorporation (e.g., medical entities and semantic types) and data augmentation techniques on visual question generation in the medical domain. Experiments performed on the VQA-RAD dataset of clinical visual questions showed that VQGRaD achieves a BLEU score of 61.86% and outperforms strong baselines. We also performed a blinded human evaluation of the grammaticality, fluency, and relevance of the generated questions. The human evaluation demonstrated the higher quality of VQGRaD outputs and showed that incorporating medical entities improves the quality of the generated questions. Using the test data and evaluation process of the ImageCLEF 2020 VQA-Med challenge, we found that relying on the proposed data augmentation technique to generate new training samples by applying different kinds of transformations can mitigate the lack of data, avoid overfitting, and bring a substantial improvement in medical VQG.
(This article belongs to the Special Issue Neural Natural Language Generation)