564 Results Found

  • Article
  • Open Access
10 Citations
4,895 Views
18 Pages

15 November 2019

Due to the rapid growth of deep learning technologies, automatic image description generation is an interesting problem in computer vision and natural language generation. It helps to improve access to photo collections on social media and gives guid...

  • Article
  • Open Access
6 Citations
3,233 Views
13 Pages

6 May 2020

The automatic generation of language description is an important task in the intelligent analysis of aluminum alloy metallographic images, and is crucial for the high-quality development of the non-ferrous metals manufacturing industry. In this paper...

  • Article
  • Open Access
74 Views
14 Pages

The Role of AI-Generated Clinical Image Descriptions in Enhancing Teledermatology Diagnosis: A Cross-Sectional Exploratory Study

  • Jonathan Shapiro,
  • Binyamin Greenfield,
  • Itay Cohen,
  • Roni P. Dodiuk-Gad,
  • Yuliya Valdman-Grinshpoun,
  • Tamar Freud,
  • Anna Lyakhovitsky,
  • Ziad Khamaysi and
  • Emily Avitan-Hersh

Background/Objectives: AI models such as ChatGPT-4 have shown strong performance in dermatology; however, the diagnostic value of AI-generated clinical image descriptions remains underexplored. This study assesses whether ChatGPT-4’s image desc...

  • Article
  • Open Access
3 Citations
1,482 Views
33 Pages

Making Images Speak: Human-Inspired Image Description Generation

  • Chifaa Sebbane,
  • Ikram Belhajem and
  • Mohammed Rziza

28 April 2025

Despite significant advances in deep learning-based image captioning, many state-of-the-art models still struggle to balance visual grounding (i.e., accurate object and scene descriptions) with linguistic coherence (i.e., grammatical fluency and appr...

  • Article
  • Open Access
5 Citations
2,759 Views
17 Pages

Visual Description Augmented Integration Network for Multimodal Entity and Relation Extraction

  • Min Zuo,
  • Yingjun Wang,
  • Wei Dong,
  • Qingchuan Zhang,
  • Yuanyuan Cai and
  • Jianlei Kong

18 May 2023

Multimodal Named Entity Recognition (MNER) and multimodal Relationship Extraction (MRE) play an important role in processing multimodal data and understanding entity relationships across textual and visual domains. However, irrelevant image informati...

  • Review
  • Open Access
16 Citations
7,326 Views
25 Pages

Recent Advances in Synthesis and Interaction of Speech, Text, and Vision

  • Laura Orynbay,
  • Bibigul Razakhova,
  • Peter Peer,
  • Blaž Meden and
  • Žiga Emeršič

In recent years, there has been increasing interest in the conversion of images into audio descriptions. This is a field that lies at the intersection of Computer Vision (CV) and Natural Language Processing (NLP), and it involves various tasks, inclu...

  • Article
  • Open Access
131 Citations
8,506 Views
18 Pages

Description Generation for Remote Sensing Images Using Attribute Attention Mechanism

  • Xiangrong Zhang,
  • Xin Wang,
  • Xu Tang,
  • Huiyu Zhou and
  • Chen Li

13 March 2019

Image captioning generates a semantic description of an image. It involves both image understanding and text mining, and has made great progress in recent years. However, it is still a great challenge to bridge the “semantic gap” between l...

  • Article
  • Open Access
7 Citations
2,442 Views
14 Pages

Generating Image Descriptions of Rice Diseases and Pests Based on DeiT Feature Encoder

  • Chunxin Ma,
  • Yanrong Hu,
  • Hongjiu Liu,
  • Ping Huang,
  • Yikun Zhu and
  • Dan Dai

5 September 2023

We propose a DeiT (Data-Efficient Image Transformer) feature encoder-based algorithm for identifying disease types and generating relevant descriptions of diseased crops. It solves the scarcity problem of the image description algorithm applied in ag...

  • Article
  • Open Access
1 Citation
2,286 Views
14 Pages

1 November 2021

In this paper, a framework based on generative adversarial networks is proposed to perform nature-scenery generation according to descriptions from the users. The desired place, time and seasons of the generated scenes can be specified with the help...

  • Article
  • Open Access
6 Citations
3,461 Views
38 Pages

A New Generative Model for Textual Descriptions of Medical Images Using Transformers Enhanced with Convolutional Neural Networks

  • Artur Gomes Barreto,
  • Juliana Martins de Oliveira,
  • Francisco Nauber Bernardo Gois,
  • Paulo Cesar Cortez and
  • Victor Hugo Costa de Albuquerque

The automatic generation of descriptions for medical images has sparked increasing interest in the healthcare field due to its potential to assist professionals in the interpretation and analysis of clinical exams. This study explores the development...

  • Article
  • Open Access
2 Citations
1,549 Views
16 Pages

To address the limited capacity of CLIP's text encoder and the insufficient interaction between CLIP's two towers, a CLIP-based video description model, RAMSG, is proposed, which combines retrieval augmentation with multi-scale sem...

  • Feature Paper
  • Article
  • Open Access
1,003 Views
19 Pages

9 October 2025

Recent advances in vision-language models such as BLIP-2 have made AI-generated image descriptions increasingly fluent and difficult to distinguish from human-authored texts. This paper investigates whether such differences can still be reliably dete...

  • Article
  • Open Access
2,576 Views
11 Pages

Scene description refers to the automatic generation of natural language descriptions from videos. In general, deep learning-based scene description networks utilize multiple modalities, such as image, motion, audio, and label information, to improve the...

  • Article
  • Open Access
2 Citations
3,563 Views
17 Pages

Middle-Level Attribute-Based Language Retouching for Image Caption Generation

  • Zhibin Guan,
  • Kang Liu,
  • Yan Ma,
  • Xu Qian and
  • Tongkai Ji

9 October 2018

Image caption generation is an attractive research area that focuses on generating natural language sentences to describe the visual content of a given image. It is an interdisciplinary subject combining computer vision (CV) and natural language processing...

  • Article
  • Open Access
15 Citations
8,686 Views
15 Pages

Enhanced Image Captioning with Color Recognition Using Deep Learning Methods

  • Yeong-Hwa Chang,
  • Yen-Jen Chen,
  • Ren-Hung Huang and
  • Yi-Ting Yu

26 December 2021

Automatically describing the content of an image is an interesting and challenging task in artificial intelligence. In this paper, an enhanced image captioning model—including object detection, color analysis, and image captioning—is prop...

  • Article
  • Open Access
2 Citations
2,644 Views
17 Pages

A Study on Generative Models for Visual Recognition of Unknown Scenes Using a Textual Description

  • Jose Martinez-Carranza,
  • Delia Irazú Hernández-Farías,
  • Victoria Eugenia Vazquez-Meza,
  • Leticia Oyuki Rojas-Perez and
  • Aldrich Alfredo Cabrera-Ponce

27 October 2023

In this study, we investigate the application of generative models to assist artificial agents, such as delivery drones or service robots, in visualising unfamiliar destinations solely based on textual descriptions. We explore the use of generative m...

  • Article
  • Open Access
2,366 Views
16 Pages

Automatic Identification and Description of Jewelry Through Computer Vision and Neural Networks for Translators and Interpreters

  • José Manuel Alcalde-Llergo,
  • Aurora Ruiz-Mezcua,
  • Rocío Ávila-Ramírez,
  • Andrea Zingoni,
  • Juri Taborri and
  • Enrique Yeguas-Bolívar

15 May 2025

Identifying jewelry pieces presents a significant challenge due to the wide range of styles and designs. Currently, precise descriptions are typically limited to industry experts. However, translators and interpreters often require a comprehensive un...

  • Article
  • Open Access
25 Citations
6,189 Views
15 Pages

Automatically generating accurate and meaningful textual descriptions of images is an ongoing research challenge. Recently, considerable progress has been made by adopting multimodal deep learning approaches for integrating vision and language. However,...

  • Review
  • Open Access
8 Citations
5,466 Views
22 Pages

Supervised Deep Learning Techniques for Image Description: A Systematic Review

  • Marco López-Sánchez,
  • Betania Hernández-Ocaña,
  • Oscar Chávez-Bosquez and
  • José Hernández-Torruco

23 March 2023

Automatic image description, also known as image captioning, aims to describe the elements included in an image and their relationships. This task involves two research fields: computer vision and natural language processing; thus, it has received mu...

  • Article
  • Open Access
12 Citations
4,423 Views
18 Pages

Towards Mapping Images to Text Using Deep-Learning Architectures

  • Daniela Onita,
  • Adriana Birlutiu and
  • Liviu P. Dinu

18 September 2020

Images and text represent types of content that are used together for conveying a message. The process of mapping images to text can provide very useful information and can be included in many applications from the medical domain, applications for bl...

  • Article
  • Open Access
12 Citations
5,065 Views
24 Pages

ACapMed: Automatic Captioning for Medical Imaging

  • Djamila Romaissa Beddiar,
  • Mourad Oussalah,
  • Tapio Seppänen and
  • Rachid Jennane

1 November 2022

Medical image captioning is a very challenging task that has rarely been addressed in the literature on natural image captioning. Some existing image captioning techniques exploit objects present in the image alongside the visual features while generat...

  • Article
  • Open Access
158 Views
29 Pages

16 January 2026

Scientific studies have demonstrated how certain insect species can be used as bioindicators and reverse environmental degradation through their behavior and organization. Studying these species involves capturing and extracting hundreds of insects f...

  • Article
  • Open Access
2,297 Views
21 Pages

21 March 2024

Image captioning, also recognized as the challenge of transforming visual data into coherent natural language descriptions, has persisted as a complex problem. Traditional approaches often suffer from semantic gaps, wherein the generated textual desc...

  • Article
  • Open Access
2 Citations
2,605 Views
25 Pages

Expanding Open-Vocabulary Understanding for UAV Aerial Imagery: A Vision–Language Framework to Semantic Segmentation

  • Bangju Huang,
  • Junhui Li,
  • Wuyang Luan,
  • Jintao Tan,
  • Chenglong Li and
  • Longyang Huang

19 February 2025

The open-vocabulary understanding of UAV aerial images plays a crucial role in enhancing the intelligence level of remote sensing applications, such as disaster assessment, precision agriculture, and urban planning. In this paper, we propose an innov...

  • Article
  • Open Access
4 Citations
5,665 Views
17 Pages

12 November 2018

Image caption generation is a fundamental task that bridges images and their textual descriptions, and it is drawing increasing interest in artificial intelligence. Images and textual sentences are viewed as two different carriers of informat...

  • Article
  • Open Access
6 Citations
3,190 Views
19 Pages

Positioning information has become one of the most important types of information processed and displayed on smart mobile devices. In this paper, we propose a visual positioning method using RGB-D images on smart mobile devices. Firstly, the pose of each...

  • Article
  • Open Access
2 Citations
1,987 Views
14 Pages

30 June 2025

Crop diseases pose a significant threat to agricultural productivity and global food security. Timely and accurate disease identification is crucial for improving crop yield and quality. While most existing deep learning-based methods focus primarily...

  • Article
  • Open Access
7 Citations
3,732 Views
18 Pages

16 September 2023

With the development of deep learning, image synthesis has made unprecedented progress in the past few years. Image synthesis models, exemplified by diffusion models, have demonstrated stable and high-fidelity image generation. However, the tradit...

  • Article
  • Open Access
1 Citation
2,445 Views
13 Pages

13 June 2023

Although image recognition technologies are developing rapidly with deep learning, conventional recognition models trained by supervised learning with class labels do not work well when given test inputs from classes unseen during training. For example, a rec...

  • Article
  • Open Access
8 Citations
4,414 Views
20 Pages

Caps Captioning: A Modern Image Captioning Approach Based on Improved Capsule Network

  • Shima Javanmardi,
  • Ali Mohammad Latif,
  • Mohammad Taghi Sadeghi,
  • Mehrdad Jahanbanifard,
  • Marcello Bonsangue and
  • Fons J. Verbeek

1 November 2022

In image captioning, the main challenge in describing an image is to identify all the objects, precisely capture the relationships between them, and produce diverse captions. Over the past few years, many methods have been propos...

  • Perspective
  • Open Access
20 Citations
5,440 Views
13 Pages

13 February 2022

Transformer-based approaches have shown good results in image captioning tasks. However, current approaches are limited in that they generate text from global features of the entire image. Therefore, we propose novel methods for generating better image...

  • Article
  • Open Access
8 Citations
2,669 Views
20 Pages

Veg-DenseCap: Dense Captioning Model for Vegetable Leaf Disease Images

  • Wei Sun,
  • Chunshan Wang,
  • Jingqiu Gu,
  • Xiang Sun,
  • Jiuxi Li and
  • Fangfang Liang

25 June 2023

Deep learning-based plant disease recognition models have shown good performance potential. However, their high complexity and nonlinearity lead to low transparency and poor interpretability. These limitations greatly restrict the depl...

  • Article
  • Open Access
46 Views
20 Pages

Dermatology “AI Babylon”: Cross-Language Evaluation of AI-Crafted Dermatology Descriptions

  • Emmanouil Karampinis,
  • Christina-Marina Zoumpourli,
  • Christina Kontogianni,
  • Theofanis Arkoumanis,
  • Dimitra Koumaki,
  • Dimitrios Mantzaris,
  • Konstantinos Filippakis,
  • Maria-Myrto Papadopoulou,
  • Melpomeni Theofili and
  • Dimitrios Sgouros
  • + 4 authors

22 January 2026

Background and Objectives: Dermatology relies on a complex terminology encompassing lesion types, distribution patterns, colors, and specialized sites such as hair and nails, while dermoscopy adds an additional descriptive framework, making interpret...

  • Article
  • Open Access
39 Citations
10,871 Views
12 Pages

19 April 2023

Generative adversarial networks (GANs) have demonstrated remarkable potential in the realm of text-to-image synthesis. Nevertheless, conventional GANs employing conditional latent space interpolation and manifold interpolation (GAN-CLS-INT) encounter...

  • Article
  • Open Access
6 Citations
6,189 Views
15 Pages

14 February 2023

Image captioning is the problem of viewing an image and describing it in natural language. It is an important problem whose solution requires understanding the image and combining the two fields of image processing and natural language processing. The...

  • Article
  • Open Access
1 Citation
2,774 Views
16 Pages

Zero-Shot Image Caption Inference System Based on Pretrained Models

  • Xiaochen Zhang,
  • Jiayi Shen,
  • Yuyan Wang,
  • Jiacong Xiao and
  • Jin Li

28 September 2024

Recently, zero-shot image captioning (ZSIC) has gained significant attention, given its potential to describe unseen objects in images. This is important for real-world applications such as human–computer interaction, intelligent education, and...

  • Review
  • Open Access
21 Citations
5,820 Views
12 Pages

Technetium Complexes and Radiopharmaceuticals with Scorpionate Ligands

  • Petra Martini,
  • Micol Pasquali,
  • Alessandra Boschi,
  • Licia Uccelli,
  • Melchiore Giganti and
  • Adriano Duatti

15 August 2018

Scorpionate ligands have played a crucial role in the development of technetium chemistry and, recently, they have also fueled important advancements in the discovery of novel diagnostic imaging agents based on the γ-emitting radionuclide techn...

  • Article
  • Open Access
4 Citations
2,887 Views
14 Pages

30 November 2021

Image captioning generates written descriptions of an image. In recent image captioning research, attention regions seldom cover all objects, and generated captions may lack the details of objects and may remain far from reality. In this paper, we pr...

  • Article
  • Open Access
1 Citation
4,667 Views
18 Pages

IAACLIP: Image Aesthetics Assessment via CLIP

  • Zhuo Li,
  • Xingao Yan,
  • Xuebin Wei and
  • Feng Shao

Aesthetics primarily focuses on the study of art, encompassing the aesthetic categories of beauty and ugliness, as well as human aesthetic activities. Image Aesthetics Assessment (IAA) seeks to automatically evaluate the aesthetic quality of images b...

  • Article
  • Open Access
1 Citation
1,744 Views
27 Pages

18 August 2025

Foreign Object Debris (FOD) on airport pavements poses a serious threat to aviation safety, making accurate detection and interpretable scene understanding crucial for operational risk management. This paper presents an integrated multi-modal framewo...

  • Article
  • Open Access
2 Citations
3,469 Views
20 Pages

22 June 2025

This paper addresses the challenges of sample scarcity and class imbalance in remote sensing image semantic segmentation by proposing a decoupled synthetic sample generation framework based on a latent diffusion model. The method consists of two stag...

  • Article
  • Open Access
573 Views
10 Pages

30 August 2025

A polynomial is called a generalized multilinear polynomial if it is a sum of some multilinear polynomials over a field. The goal of this paper is to give a description of the images of generalized multilinear polynomials on upper triangular matrix a...

  • Article
  • Open Access
16 Citations
3,607 Views
20 Pages

Modeling of Hyperparameter Tuned Deep Learning Model for Automated Image Captioning

  • Mohamed Omri,
  • Sayed Abdel-Khalek,
  • Eied M. Khalil,
  • Jamel Bouslimi and
  • Gyanendra Prasad Joshi

18 January 2022

Image processing remains a hot research topic among research communities due to its applicability in several areas. An important application of image processing is the automatic image captioning technique, which intends to generate a proper descripti...

  • Article
  • Open Access
1 Citation
3,403 Views
24 Pages

This article addresses the impact of generative artificial intelligence on the creation of composite sketches for police investigations. The automation of this task, traditionally performed through artistic methods or image composition, has become a...

  • Article
  • Open Access
12 Citations
3,664 Views
18 Pages

23 January 2024

Image captioning is a technique that enables the automatic extraction of natural language descriptions about the contents of an image. On the one hand, information in the form of natural language can enhance accessibility by reducing the expertise re...

  • Article
  • Open Access
5 Citations
2,863 Views
17 Pages

RI-MFM: A Novel Infrared and Visible Image Registration with Rotation Invariance and Multilevel Feature Matching

  • Depeng Zhu,
  • Weida Zhan,
  • Jingqi Fu,
  • Yichun Jiang,
  • Xiaoyu Xu,
  • Renzhong Guo and
  • Yu Chen

10 September 2022

In the past ten years, multimodal image registration technology has been continuously developed, and a large number of researchers have paid attention to the problem of infrared and visible image registration. Due to the differences in grayscale dist...

  • Article
  • Open Access
4 Citations
3,680 Views
28 Pages

Level of Agreement between Emotions Generated by Artificial Intelligence and Human Evaluation: A Methodological Proposal

  • Miguel Carrasco,
  • César González-Martín,
  • Sonia Navajas-Torrente and
  • Raúl Dastres

12 October 2024

Images are capable of conveying emotions, but emotional experience is highly subjective. Advances in artificial intelligence have enabled the generation of images based on emotional descriptions. However, the level of agreement between the generative...

  • Article
  • Open Access
1 Citation
950 Views
17 Pages

The environmental perception capability of intelligent ships is essential for enhancing maritime navigation safety and advancing shipping intelligence. Image caption generation technology plays a pivotal role in this context by converting visual info...

  • Article
  • Open Access
3 Citations
3,719 Views
21 Pages

aRTIC GAN: A Recursive Text-Image-Conditioned GAN

  • Edoardo Alati,
  • Carlo Alberto Caracciolo,
  • Marco Costa,
  • Marta Sanzari,
  • Paolo Russo and
  • Irene Amerini

Generative Adversarial Networks have recently demonstrated the capability to synthesize photo-realistic real-world images. However, they still struggle to offer high controllability of the output image, even if several constraints are provided as inp...

  • Communication
  • Open Access
8 Citations
7,471 Views
12 Pages

26 December 2022

Image generation from natural language has become a very promising area of multimodal learning research. In recent years, performance in this area has improved rapidly, and the release of powerful tools has caused a great resp...

of 12