AI/Machine Learning in Computer Vision/Image Processing and Natural Language Processing

A special issue of Electronics (ISSN 2079-9292). This special issue belongs to the section "Artificial Intelligence".

Deadline for manuscript submissions: 15 May 2025

Special Issue Editors


Guest Editor
1. ITI/Larsys, Agência Regional para o Desenvolvimento da Investigação, Tecnologia e Inovação, Caminho da Penteada, 9020-125 Funchal, Madeira, Portugal
2. Biomedical Engineering Group, Department of Cybernetics and Biomedical Engineering, Faculty of Electrical Engineering and Computer Science, VSB – Technical University of Ostrava, 17. listopadu 15, 708 00 Ostrava, Czech Republic
3. Department of Engineering and Exact Sciences, University of Madeira, Caminho da Penteada, 9020-125 Funchal, Madeira, Portugal
Interests: computer vision; deep learning; artificial intelligence

Guest Editor
1. ITI/Larsys/Madeira Interactive Technologies Institute, 9020-105 Funchal, Portugal
2. Institute for Technological Development and Innovation in Communications, Universidad de Las Palmas de Gran Canaria, 35001 Las Palmas de Gran Canaria, Spain
Interests: data analysis; signal processing; artificial intelligence

Guest Editor
Biomedical Engineering Group, Department of Cybernetics and Biomedical Engineering, Faculty of Electrical Engineering and Computer Science, VSB – Technical University of Ostrava, 17. listopadu 15, 708 00 Ostrava, Czech Republic
Interests: contactless vital signs monitoring; assistive technologies; telemedicine; fuzzy logic

Special Issue Information

Dear Colleagues,

With the rapid pace of innovation in generic and robust artificial intelligence (AI) and machine learning (ML) algorithms and methods, it is challenging to keep track of recent developments in domains such as computer vision (CV) and natural language processing (NLP). Equally important is the exploration of these algorithms and their robustness in application areas such as healthcare, agriculture, remote sensing, marine diversity exploration, signal processing, and vital signs monitoring. This Special Issue therefore aims to serve as a single reference point for the AI community dedicated to developing robust AI/ML methods and algorithms in CV and NLP and to applying them to real-time problems in diverse domains. It will compile significant upcoming research developments in AI and machine learning, covering both high-level tasks (pattern recognition and sophisticated reasoning) and low-level tasks (feature creation and extraction) across the two domains.

This Special Issue covers all aspects of AI and ML, including but not limited to the following topics:

  • Image captioning, object detection and recognition, and scene understanding;
  • Text summarization, machine translation, and sentiment analysis;
  • Large language/vision models and foundation models in the context of CV/NLP;
  • Causal and/or explainable artificial intelligence in the context of CV/NLP;
  • Edge detection and image or video denoising and deblurring;
  • Generative modelling and its applications in the context of CV/NLP;
  • Reinforcement learning and its applications in the context of CV/NLP;
  • Computer vision-based remote health monitoring, such as remote photoplethysmography or vital signs monitoring;
  • AI and machine learning methods for biomedical imaging;
  • NLP in healthcare.

We look forward to your contributions to this Special Issue.

Regards,

Dr. Ankit Gupta
Dr. Morgado Dias
Dr. Antonio G. Ravelo-Garcia
Dr. Martin Černý
Guest Editors

Manuscript Submission Information

Manuscripts should be submitted online at www.mdpi.com by registering and logging in to this website. Once you are registered, go to the submission form. Manuscripts can be submitted until the deadline. All submissions that pass pre-check are peer-reviewed. Accepted papers will be published continuously in the journal (as soon as accepted) and will be listed together on the special issue website. Research articles, review articles, and short communications are invited. For planned papers, a title and short abstract (about 100 words) can be sent to the Editorial Office for announcement on this website.

Submitted manuscripts should not have been published previously, nor be under consideration for publication elsewhere (except conference proceedings papers). All manuscripts are thoroughly refereed through a single-blind peer-review process. A guide for authors and other relevant information for submission of manuscripts is available on the Instructions for Authors page. Electronics is an international peer-reviewed open access semimonthly journal published by MDPI.

Please visit the Instructions for Authors page before submitting a manuscript. The Article Processing Charge (APC) for publication in this open access journal is 2400 CHF (Swiss Francs). Submitted papers should be well formatted and use good English. Authors may use MDPI's English editing service prior to publication or during author revisions.

Keywords

  • computer vision
  • reinforcement learning
  • explainable AI
  • natural language processing
  • multimodal AI

Benefits of Publishing in a Special Issue

  • Ease of navigation: Grouping papers by topic helps scholars navigate broad scope journals more efficiently.
  • Greater discoverability: Special Issues support the reach and impact of scientific research. Articles in Special Issues are more discoverable and cited more frequently.
  • Expansion of research network: Special Issues facilitate connections among authors, fostering scientific collaborations.
  • External promotion: Articles in Special Issues are often promoted through the journal's social media, increasing their visibility.
  • e-Book format: Special Issues with more than 10 articles can be published as dedicated e-books, ensuring wide and rapid dissemination.

Further information on MDPI's Special Issue policies can be found here.

Published Papers (7 papers)


Research

22 pages, 1959 KiB  
Article
DMFormer: Dense Memory Linformer for Image Captioning
by Yuting He and Zetao Jiang
Electronics 2025, 14(9), 1716; https://doi.org/10.3390/electronics14091716 - 23 Apr 2025
Abstract
Image captioning is a task at the intersection of computer vision and natural language processing that aims to describe image content in natural language. Existing methods still have deficiencies in modeling the spatial location and semantic correlation between image regions, and they often exhibit insufficient interaction between image features and text features. To address these issues, we propose a Linformer-based image captioning method, the Dense Memory Linformer for Image Captioning (DMFormer), which has lower time and space complexity than the traditional Transformer architecture. The DMFormer contains two core modules: the Relation Memory Augmented Encoder (RMAE) and the Dense Memory Augmented Decoder (DMAD). In the RMAE, we propose Relation Memory Augmented Attention (RMAA), which combines explicit and implicit spatial perception: it explicitly uses geometric information to model the geometric correlation between image regions and implicitly constructs memory unit matrices to learn the contextual information of image region features. In the DMAD, we introduce Dense Memory Augmented Cross Attention (DMACA). This module fully utilizes the low-level and high-level features generated by the RMAE through dense connections and constructs memory units to store prior knowledge of images and text, learning the cross-modal associations between visual and linguistic features through an adaptive gating mechanism. Experimental results on the MS-COCO dataset show that the descriptions generated by the DMFormer are richer and more accurate, with significant improvements in various evaluation metrics compared to mainstream methods.
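The memory-augmented attention described above extends standard attention by letting queries also attend to a set of learnable memory slots that store prior knowledge. The PyTorch fragment below is only a minimal sketch of that general idea, not the authors' RMAA/DMACA implementation; the module name, slot count, and dimensions are assumptions.

```python
import torch
import torch.nn as nn

class MemoryAugmentedAttention(nn.Module):
    """Generic sketch: multi-head attention whose keys/values are extended
    with learnable memory slots that can store dataset-level priors."""
    def __init__(self, dim=512, num_heads=8, num_memory_slots=40):
        super().__init__()
        self.attn = nn.MultiheadAttention(dim, num_heads, batch_first=True)
        # Learnable memory slots, shared across all samples in a batch.
        self.mem_k = nn.Parameter(torch.randn(1, num_memory_slots, dim) * 0.02)
        self.mem_v = nn.Parameter(torch.randn(1, num_memory_slots, dim) * 0.02)

    def forward(self, x):                        # x: (B, N, dim) region features
        b = x.size(0)
        k = torch.cat([x, self.mem_k.expand(b, -1, -1)], dim=1)
        v = torch.cat([x, self.mem_v.expand(b, -1, -1)], dim=1)
        out, _ = self.attn(x, k, v)              # queries attend to regions + memory
        return out

features = torch.randn(2, 50, 512)               # 2 images, 50 regions each
print(MemoryAugmentedAttention()(features).shape)  # torch.Size([2, 50, 512])
```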

19 pages, 1320 KiB  
Article
SkinSavvy2: Augmented Skin Lesion Diagnosis and Personalized Medical Consultation System
by Hyungjoon Kim, Yunju Kim and Wonho Song
Electronics 2025, 14(5), 969; https://doi.org/10.3390/electronics14050969 - 28 Feb 2025
Abstract
The shortage of medical personnel and the busy lives of modern people have increased the demand for the self-diagnosis of diseases, and the latest large language models and image recognition technologies have the potential to meet this demand. Skin diseases in particular have visually distinguishable symptoms, making self-diagnosis and self-care feasible. In this paper, we propose a system that classifies skin diseases from images and combines the predictions with individual attributes such as age, skin type, and gender for self-diagnosis. First, we design a skin disease classifier based on recent deep learning models that distinguishes six types of skin disease using the HAM10000 dataset, and we generate prompts by combining the classification result with the personal information provided by the user. Using a Generative Pre-trained Transformer (GPT) model, the system then generates personalized care recommendations from these prompts. We measured the accuracy of the classification model and validated the effectiveness of the proposed method through user evaluations.
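To illustrate how a classifier's prediction might be combined with the user's personal information into a prompt for a GPT-style model, the snippet below gives a minimal sketch; the paper does not specify its prompt format, so the template and field names here are assumptions.

```python
def build_care_prompt(predicted_lesion: str, confidence: float,
                      age: int, gender: str, skin_type: str) -> str:
    """Combine the classifier output with user-supplied attributes into a
    single prompt for a generative language model (illustrative only)."""
    return (
        f"A skin image classifier predicts '{predicted_lesion}' "
        f"with confidence {confidence:.0%}. "
        f"The user is {age} years old, {gender}, with {skin_type} skin. "
        "Suggest general self-care guidance and note when to see a dermatologist."
    )

prompt = build_care_prompt("benign keratosis", 0.87, 34, "female", "dry")
print(prompt)  # this prompt would then be sent to a GPT-style model
```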

20 pages, 12240 KiB  
Article
Character Can Speak Directly: An End-to-End Character Region Excavation Network for Scene Text Spotting
by Yan Li, Yan Shu, Binyang Li and Ruifeng Xu
Electronics 2025, 14(5), 851; https://doi.org/10.3390/electronics14050851 - 21 Feb 2025
Abstract
End-to-end scene text spotting methods have garnered significant research attention due to their promising results. However, most existing approaches are not well suited for real-world applications because of their inherently complex pipelines. In this paper, we propose an end-to-end Character Region Excavation Network (CRENet) to streamline the text spotting pipeline. Our contributions are threefold: (i) Pipeline simplification: For the first time, we eliminate the text region retrieval step, allowing characters to be directly spotted from scene images. (ii) ROA layer: We introduce a novel RoI (Region of Interest) feature sampling layer for multi-oriented character region feature sampling, significantly enhancing the recognizer’s performance. (iii) Progressive learning strategy: We propose a progressive learning strategy to gradually bridge the gap between synthetic data and real-world images, addressing the challenge posed by the high cost of character-level annotations required during training. Extensive experiments demonstrate that our proposed method is robust and effective across horizontal, oriented, and curved text, achieving results comparable to state-of-the-art methods on ICDAR 2013, ICDAR 2015, Total-Text, and ReCTS.
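The progressive learning strategy mentioned in (iii) gradually shifts training from synthetic, character-annotated data toward real scene images. The sketch below shows one simple way such a schedule could be expressed; the linear ramp, epoch thresholds, and function names are assumptions rather than the authors' recipe.

```python
import random

def real_data_fraction(epoch: int, warmup_epochs: int = 10, ramp_epochs: int = 20) -> float:
    """Fraction of real (character-annotated) images mixed into each batch.
    Starts at 0 (pure synthetic), then ramps linearly up to 1.0."""
    if epoch < warmup_epochs:
        return 0.0
    return min(1.0, (epoch - warmup_epochs) / ramp_epochs)

def sample_batch(synthetic_pool, real_pool, batch_size, epoch):
    """Draw a mixed batch according to the current schedule."""
    n_real = round(batch_size * real_data_fraction(epoch))
    batch = random.sample(real_pool, n_real)
    batch += random.sample(synthetic_pool, batch_size - n_real)
    return batch

print([round(real_data_fraction(e), 2) for e in (0, 10, 20, 30, 40)])  # [0.0, 0.0, 0.5, 1.0, 1.0]
```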

21 pages, 4068 KiB  
Article
Three-Dimensional Mesh Character Pose Transfer with Neural Sparse-Softmax Skinning Blending
by Siqi Liu, Mengxiao Yin, Ming Li, Feng Zhan and Bei Hua
Electronics 2025, 14(3), 589; https://doi.org/10.3390/electronics14030589 - 1 Feb 2025
Abstract
Three-dimensional mesh pose transfer transforms the pose of a source model into the pose of a reference model while preserving the source model’s identity (body detail), and has great potential in computer graphics. Current neural network-based methods focus primarily on extracting pose and body features without fully exploiting the articulated body structure of humans and animals. To address this, we propose an end-to-end pose transfer network based on skinning deformation. The network first extracts skinning weights and joint features, which are then decoded to transfer the source model to a pose similar to that of the reference model while preserving the source model’s characteristics. During feature extraction, we use features from k-nearest and one-ring neighborhoods so that the network learns the body details of the model better. We further apply the skinning weights and joint features to capture the change of the source pose relative to the reference pose, and use a decoding network in place of linear blend skinning to obtain the target model. Experiments on the SMPL, SMAL, FAUST, DYNA, and MG datasets show that our method achieves the best quantitative results, transferring poses efficiently while better preserving the identity of the source model.
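For reference, the classical linear blend skinning that the learned decoder replaces computes each deformed vertex as a weighted sum of per-joint rigid transforms, v_i' = Σ_j w_ij (R_j v_i + t_j). The NumPy sketch below implements only that baseline operation, not the paper's network.

```python
import numpy as np

def linear_blend_skinning(vertices, weights, rotations, translations):
    """Classical LBS: v_i' = sum_j w_ij * (R_j @ v_i + t_j).

    vertices:     (V, 3) rest-pose vertex positions
    weights:      (V, J) skinning weights, rows sum to 1
    rotations:    (J, 3, 3) per-joint rotation matrices
    translations: (J, 3)   per-joint translations
    """
    # Transform every vertex by every joint: (J, V, 3)
    per_joint = np.einsum('jab,vb->jva', rotations, vertices) + translations[:, None, :]
    # Blend the J candidate positions with the per-vertex weights: (V, 3)
    return np.einsum('vj,jva->va', weights, per_joint)

V, J = 4, 2
verts = np.random.rand(V, 3)
w = np.random.rand(V, J); w /= w.sum(axis=1, keepdims=True)
R = np.stack([np.eye(3)] * J)            # identity rotations
t = np.zeros((J, 3))
assert np.allclose(linear_blend_skinning(verts, w, R, t), verts)  # identity pose keeps the mesh unchanged
```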

17 pages, 16060 KiB  
Article
Channel-Wise Attention-Enhanced Feature Mutual Reconstruction for Few-Shot Fine-Grained Image Classification
by Qianying Ou and Jinmiao Zou
Electronics 2025, 14(2), 377; https://doi.org/10.3390/electronics14020377 - 19 Jan 2025
Cited by 1
Abstract
Fine-grained image classification faces the challenge of significant intra-class differences and high similarity between classes, with only a limited amount of labelled data. Previous few-shot learning approaches, however, often fail to recognize the discriminative details, such as a bird’s eyes and beak. In this paper, we propose a channel-wise attention-enhanced feature mutual reconstruction mechanism that helps to alleviate these problems for fine-grained image classification. The mechanism first employs a channel-wise attention module (CAM) to learn channel weights for both the support and query features, using channel-wise self-attention to assign greater importance to object-relevant channels. This helps the model focus on subtle yet discriminative details, which is essential for classification. We then introduce a feature mutual reconstruction module (FMRM): support features are reconstructed from a support-weight-enhanced feature map to reduce intra-class variation, and query features are reconstructed from a query-weight-enhanced feature map to increase inter-class variation. Classification is based on the similarity between the reconstructed and enhanced features. We evaluated the method on four fine-grained image datasets using Conv-4 and ResNet-12 backbones, and the experimental results show that it outperforms previous few-shot fine-grained classification methods. This demonstrates that our method improves fine-grained image classification performance while balancing inter-class and intra-class variation.
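Channel-wise attention of this kind generally follows the squeeze-and-excitation pattern: a per-channel descriptor is pooled from the feature map, a small MLP produces channel weights, and the map is rescaled. The sketch below shows that generic pattern with hypothetical layer sizes; it is not the paper's exact CAM.

```python
import torch
import torch.nn as nn

class ChannelAttention(nn.Module):
    """Generic channel-wise attention (squeeze-and-excitation style)."""
    def __init__(self, channels=64, reduction=4):
        super().__init__()
        self.mlp = nn.Sequential(
            nn.Linear(channels, channels // reduction),
            nn.ReLU(inplace=True),
            nn.Linear(channels // reduction, channels),
            nn.Sigmoid(),
        )

    def forward(self, x):                    # x: (B, C, H, W) feature map
        w = self.mlp(x.mean(dim=(2, 3)))     # squeeze -> (B, C) channel weights
        return x * w[:, :, None, None]       # rescale object-relevant channels

feat = torch.randn(5, 64, 10, 10)            # e.g. 5 support images
print(ChannelAttention()(feat).shape)        # torch.Size([5, 64, 10, 10])
```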

25 pages, 3210 KiB  
Article
In-Depth Collaboratively Supervised Video Instance Segmentation
by Yunnan Deng, Yinhui Zhang and Zifen He
Electronics 2025, 14(2), 363; https://doi.org/10.3390/electronics14020363 - 17 Jan 2025
Abstract
Video instance segmentation (VIS) is hampered by the high cost of pixel-level annotation and the shortcomings of weakly supervised segmentation, creating an urgent need for a trade-off between annotation cost and performance. We propose a novel In-Depth Collaboratively Supervised video instance segmentation (IDCS) method with efficient training. A collaboratively supervised training pipeline is designed to route samples with different labeling levels through multimodal training, in which instance clues obtained from mask-annotated instances guide the box-annotated training via an in-depth collaborative paradigm: (1) a trident learning method is proposed that leverages temporal consistency in video to match instances with multimodal annotations across frames, enabling effective instance relation learning without additional network parameters; (2) spatial clues in the first frames are captured to perform multidimensional pixel affinity evaluation of box-annotated instances and to augment the noise-disturbed spatial affinity map. Experiments on YouTube-VIS, with mask-annotated instances in the first frames and bounding-box-annotated samples in the remaining frames, validate the performance of IDCS. IDCS achieves up to 92.0% of fully supervised performance while training on average 1.4 times faster and scoring 2.2% higher mAP than the weakly supervised baseline. The results show that IDCS can efficiently utilize multimodal data while providing guidance for an effective trade-off in VIS training.
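Matching instances across frames by exploiting temporal consistency can be illustrated, in its simplest form, by greedy IoU matching of detections in consecutive frames. The sketch below is only this simplified illustration, not the trident learning method itself; the threshold and function names are assumptions.

```python
import numpy as np

def box_iou(a, b):
    """IoU of two boxes given as (x1, y1, x2, y2)."""
    x1, y1 = max(a[0], b[0]), max(a[1], b[1])
    x2, y2 = min(a[2], b[2]), min(a[3], b[3])
    inter = max(0.0, x2 - x1) * max(0.0, y2 - y1)
    area = lambda r: (r[2] - r[0]) * (r[3] - r[1])
    return inter / (area(a) + area(b) - inter + 1e-9)

def match_instances(prev_boxes, curr_boxes, thr=0.5):
    """Greedily match instances between consecutive frames by box IoU."""
    matches, used = [], set()
    for i, p in enumerate(prev_boxes):
        ious = [box_iou(p, c) if j not in used else -1.0
                for j, c in enumerate(curr_boxes)]
        j = int(np.argmax(ious)) if ious else -1
        if j >= 0 and ious[j] >= thr:
            matches.append((i, j))
            used.add(j)
    return matches

print(match_instances([(0, 0, 10, 10)], [(1, 1, 11, 11), (50, 50, 60, 60)]))  # [(0, 0)]
```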

19 pages, 2167 KiB  
Article
Robust Bi-Orthogonal Projection Learning: An Enhanced Dimensionality Reduction Method and Its Application in Unsupervised Learning
by Xianhao Qin, Chunsheng Li, Yingyi Liang, Huilin Zheng, Luxi Dong, Yarong Liu and Xiaolan Xie
Electronics 2024, 13(24), 4944; https://doi.org/10.3390/electronics13244944 - 15 Dec 2024
Abstract
This paper introduces a robust bi-orthogonal projection (RBOP) learning method for dimensionality reduction (DR). The proposed RBOP enhances the flexibility, robustness, and sparsity of the embedding framework, extending beyond traditional DR methods such as principal component analysis (PCA), neighborhood preserving embedding (NPE), and locality preserving projection (LPP). Unlike conventional approaches that rely on a single type of projection, RBOP employs two types of projections: a “true” projection and a “counterfeit” projection. These projections are crafted to be orthogonal, offering enhanced flexibility for the “true” projection and facilitating more precise data transformation during subspace learning. Through sparse reconstruction, the learned true projection can map the data into a low-dimensional subspace while efficiently maintaining sparsity. Observing that the two projections share many similar data structures, the method aims to preserve the similarity structure of the data through distinct reconstruction processes. Additionally, the incorporation of a sparse component allows the method to handle noise-corrupted data, compensating for noise during the DR process. Within this framework, several new unsupervised DR techniques are developed, namely RBOP_PCA, RBOP_NPE, and RBOP_LPP. Experimental results on both natural and synthetic datasets indicate that these proposed methods surpass existing, well-established DR techniques.
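As background, the classical PCA projection that RBOP_PCA extends can be obtained from the singular value decomposition of the centered data: the projection matrix has orthonormal columns spanning the top principal directions. The sketch below shows only this baseline, not the RBOP algorithm.

```python
import numpy as np

def pca_projection(X, k):
    """Classical PCA: return a (D, k) orthogonal projection onto the
    top-k principal directions of the centered data X with shape (N, D)."""
    Xc = X - X.mean(axis=0)
    # Right singular vectors of the centered data are the principal directions.
    _, _, vt = np.linalg.svd(Xc, full_matrices=False)
    return vt[:k].T                           # columns are orthonormal

rng = np.random.default_rng(0)
X = rng.normal(size=(200, 10))
P = pca_projection(X, k=2)
Z = (X - X.mean(axis=0)) @ P                  # low-dimensional embedding (200, 2)
print(Z.shape, np.allclose(P.T @ P, np.eye(2)))   # (200, 2) True
```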
