AI-Based Image Processing and Computer Vision

A special issue of Information (ISSN 2078-2489). This special issue belongs to the section "Information Applications".

Deadline for manuscript submissions: 20 December 2025 | Viewed by 24265

Special Issue Editor

Prof. Dr. Kohei Arai

Guest Editor

Science and Engineering Faculty, Saga University, Saga City 840-8502, Japan
Interests: artificial intelligence; big data analysis; computer vision; human-computer interaction; modeling and simulation; satellite remote sensing; image processing and analysis

Special Issue Information

Dear Colleagues,

In recent years, AI-based image processing and computer vision have made remarkable progress and are being put into practical use across various fields. Advances in deep learning have significantly improved the accuracy of image recognition, so that tasks such as human face recognition and object detection, which were difficult with conventional image recognition technology, can now be performed with high precision. Advances in generative models (such as GANs) have also made it possible to generate high-quality images that look as if they had been created by humans; this technology is used in fields such as image editing and advertising production. AI-based image restoration, which removes noise from images and fills in missing parts, now makes it possible to restore more natural and higher-quality images. In addition, 3D recognition technology using 3D sensors and deep learning has progressed rapidly and is utilized in fields such as robotics and autonomous driving. Video analysis technology recognizes the movement of people and objects in videos, analyzes their actions, and is used in surveillance cameras and security systems. Meanwhile, augmented reality (AR) and virtual reality (VR), when combined with AI technology, can provide a more realistic and immersive experience, which is used in fields such as entertainment and education. AI technology is also expected to generate innovations by merging with other fields such as robotics, autonomous driving, and medical care. However, ethical issues have also arisen with the development of AI. For example, if AI is misused, problems such as privacy invasion and discrimination may occur. It is important to respond appropriately to these issues.
Accordingly, the following research areas are selected for this Special Issue: AI-based image processing and computer vision, pattern analysis, machine intelligence, pattern recognition, and image understanding. Your contributions would be highly appreciated.

Prof. Dr. Kohei Arai
Guest Editor

Manuscript Submission Information

Manuscripts should be submitted online at www.mdpi.com by registering and logging in to this website. Once you are registered, click here to go to the submission form. Manuscripts can be submitted until the deadline. All submissions that pass pre-check are peer-reviewed. Accepted papers will be published continuously in the journal (as soon as accepted) and will be listed together on the special issue website. Research articles, review articles as well as short communications are invited. For planned papers, a title and short abstract (about 250 words) can be sent to the Editorial Office for assessment.

Submitted manuscripts should not have been published previously, nor be under consideration for publication elsewhere (except conference proceedings papers). All manuscripts are thoroughly refereed through a single-blind peer-review process. A guide for authors and other relevant information for submission of manuscripts is available on the Instructions for Authors page. Information is an international peer-reviewed open access monthly journal published by MDPI.

Please visit the Instructions for Authors page before submitting a manuscript. The Article Processing Charge (APC) for publication in this open access journal is 1800 CHF (Swiss Francs). Submitted papers should be well formatted and use good English. Authors may use MDPI's English editing service prior to publication or during author revisions.

Keywords

image processing
computer vision
pattern analysis
machine intelligence
pattern recognition
image understanding

Benefits of Publishing in a Special Issue

Ease of navigation: Grouping papers by topic helps scholars navigate broad scope journals more efficiently.
Greater discoverability: Special Issues support the reach and impact of scientific research. Articles in Special Issues are more discoverable and cited more frequently.
Expansion of research network: Special Issues facilitate connections among authors, fostering scientific collaborations.
External promotion: Articles in Special Issues are often promoted through the journal's social media, increasing their visibility.
Reprint: MDPI Books provides the opportunity to republish successful Special Issues in book format, both online and in print.

Further information on MDPI's Special Issue policies can be found here.

Published Papers (10 papers)

Research

19 pages, 1415 KB

Open Access Article

LFRE-YOLO: Lightweight Edge Computing Algorithm for Detecting External-Damage Objects on Transmission Lines

by Min Liu, Benhui Wu and Ming Chen

Information 2025, 16(12), 1035; https://doi.org/10.3390/info16121035 - 27 Nov 2025

Viewed by 324

Abstract

Transmission lines in complex outdoor environments often suffer external damage in construction areas, severely affecting the stability of power systems. Traditional manual detection methods have problems of low efficiency and poor real-time performance. In deep learning-based detection methods, standard convolution has a large parameter count and computational complexity, making it difficult to deploy on edge devices; while lightweight depthwise separable convolution offers low computational cost, it suffers from insufficient feature extraction capability. This limitation stems from its independent processing of each channel’s information, making it unable to simultaneously meet the practical requirements for both lightweight design and high detection accuracy in transmission line monitoring applications. To address the above problems, this study proposes LFRE-YOLO, a lightweight external damage detection algorithm for transmission lines based on YOLOv10n. First, we design a lightweight feature reuse and enhancement convolution (LFREConv) that overcomes the limitations of traditional depthwise separable convolution through a cascaded dual depthwise convolution structure and residual connection mechanisms, significantly expanding the effective receptive field with minimal parameter increment and compensating for information loss caused by independent channel processing in depthwise convolution through feature reuse strategies. Second, based on LFREConv, we propose an efficient lightweight feature extraction module (LFREBlock) that achieves cross-channel information interaction enhancement and channel importance modeling.
Additionally, we propose a lightweight feature reuse and enhancement detection head (LFRE-Head) that applies LFREConv to the regression branch, achieving comprehensive lightweight design of the detection head while maintaining spatial localization accuracy. Finally, we employ layer-adaptive magnitude-based pruning (LAMP) to prune the trained model, further optimizing the network structure through layer-wise adaptive pruning. Experimental results demonstrate significant improvements over YOLOv10n baseline: mAP50 increased from 92.0% to 94.1%, mAP50-95 improved from 66.2% to 70.2%, while reducing parameters from 2.27 M to 0.99 M, computational complexity from 6.5 G to 3.1 G, and achieving 86.9 FPS inference speed, making it suitable for resource-constrained edge computing environments. Full article

(This article belongs to the Special Issue AI-Based Image Processing and Computer Vision)
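
The parameter-count gap between standard and depthwise separable convolution that the abstract above refers to can be made concrete with a small calculation. The sketch below is illustrative only and is not taken from the LFRE-YOLO paper: the function names and the layer size (64 input channels, 128 output channels, 3 x 3 kernel) are assumptions chosen for the example, and bias terms are ignored.

```python
def standard_conv_params(c_in, c_out, k):
    """Weights in a k x k standard convolution (bias ignored)."""
    return k * k * c_in * c_out


def depthwise_separable_params(c_in, c_out, k):
    """Weights in a k x k depthwise conv followed by a 1 x 1 pointwise conv (bias ignored)."""
    depthwise = k * k * c_in   # one k x k filter per input channel
    pointwise = c_in * c_out   # 1 x 1 conv that mixes channels
    return depthwise + pointwise


# Hypothetical layer size, chosen only for illustration.
c_in, c_out, k = 64, 128, 3
std = standard_conv_params(c_in, c_out, k)
dws = depthwise_separable_params(c_in, c_out, k)
print(std, dws, round(dws / std, 3))  # prints: 73728 8768 0.119
```

For this hypothetical layer, the depthwise separable variant needs roughly 12% of the weights of the standard convolution, which matches the general ratio 1/c_out + 1/k^2. The saving comes from filtering each channel independently in the depthwise step, which is also the source of the limited cross-channel feature extraction the abstract mentions.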

12 pages, 4847 KB

Open Access Article

Surformer v1: Transformer-Based Surface Classification Using Tactile and Vision Features

by Manish Kansana, Elias Hossain, Shahram Rahimi and Noorbakhsh Amiri Golilarz

Information 2025, 16(10), 839; https://doi.org/10.3390/info16100839 - 27 Sep 2025

Viewed by 682

Abstract

Surface material recognition is a key component in robotic perception and physical interaction, particularly when leveraging both tactile and visual sensory inputs. In this work, we propose Surformer v1, a transformer-based architecture designed for surface classification using structured tactile features and Principal Component Analysis (PCA)-reduced visual embeddings extracted via ResNet-50. The model integrates modality-specific encoders with cross-modal attention layers, enabling rich interactions between vision and touch. Currently, state-of-the-art deep learning models for vision tasks have achieved remarkable performance. With this in mind, our first set of experiments focused exclusively on tactile-only surface classification. Using feature engineering, we trained and evaluated multiple machine learning models, assessing their accuracy and inference time. We then implemented an encoder-only Transformer model tailored for tactile features. This model not only achieved the highest accuracy, but also demonstrated significantly faster inference time compared to other evaluated models, highlighting its potential for real-time applications. To extend this investigation, we introduced a multimodal fusion setup by combining vision and tactile inputs. We trained both Surformer v1 (using structured features) and a Multimodal CNN (using raw images) to examine the impact of feature-based versus image-based multimodal learning on classification accuracy and computational efficiency. The results showed that Surformer v1 achieved 99.4% accuracy with an inference time of 0.7271 ms, while the Multimodal CNN achieved slightly higher accuracy but required significantly more inference time. These findings suggest that Surformer v1 offers a compelling balance between accuracy, efficiency, and computational cost for surface material recognition.
The results also underscore the effectiveness of integrating feature learning, cross-modal attention and transformer-based fusion in capturing the complementary strengths of tactile and visual modalities. Full article

(This article belongs to the Special Issue AI-Based Image Processing and Computer Vision)

19 pages, 2646 KB

Open Access Article

A Comprehensive Study of MCS-TCL: Multi-Functional Sampling for Trustworthy Compressive Learning

by Fuma Kimishima, Jian Yang and Jinjia Zhou

Information 2025, 16(9), 777; https://doi.org/10.3390/info16090777 - 7 Sep 2025

Viewed by 541

Abstract

Compressive Learning (CL) is an emerging paradigm that allows machine learning models to perform inference directly from compressed measurements, significantly reducing sensing and computational costs. While existing CL approaches have achieved competitive accuracy compared to traditional image-domain methods, they typically rely on reconstruction to address information loss and often neglect uncertainty arising from ambiguous or insufficient data. In this work, we propose MCS-TCL, a novel and trustworthy CL framework based on Multi-functional Compressive Sensing Sampling. Our approach unifies sampling, compression, and feature extraction into a single operation by leveraging the compatibility between compressive sensing and convolutional feature learning. This joint design enables efficient signal acquisition while preserving discriminative information, leading to feature representations that remain robust across varying sampling ratios. To enhance the model’s reliability, we incorporate evidential deep learning (EDL) during training. EDL estimates the distribution of evidence over output classes, enabling the model to quantify predictive uncertainty and assign higher confidence to well-supported predictions. Extensive experiments on image classification tasks show that MCS-TCL outperforms existing CL methods, achieving state-of-the-art accuracy at a low sampling rate of 6%. Additionally, our framework reduces model size by 85.76% while providing meaningful uncertainty estimates, demonstrating its effectiveness in resource-constrained learning scenarios. Full article

(This article belongs to the Special Issue AI-Based Image Processing and Computer Vision)
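
How evidence over output classes turns into an uncertainty estimate can be sketched as follows. This uses the common subjective-logic formulation of evidential deep learning (Dirichlet parameters equal to evidence plus one), which is an assumption here; the abstract above does not state the exact formulation used in MCS-TCL. The function name and the evidence values are hypothetical.

```python
def edl_uncertainty(evidence):
    """Map non-negative per-class evidence to beliefs, expected
    class probabilities, and a scalar uncertainty mass.

    Assumes the usual Dirichlet parameterization alpha_k = e_k + 1.
    """
    num_classes = len(evidence)
    alpha = [e + 1.0 for e in evidence]           # Dirichlet parameters
    strength = sum(alpha)                         # Dirichlet strength S
    belief = [e / strength for e in evidence]     # belief mass per class
    prob = [a / strength for a in alpha]          # expected class probability
    uncertainty = num_classes / strength          # mass left unassigned
    return belief, prob, uncertainty


# Hypothetical evidence vectors for a 3-class problem.
_, _, u_confident = edl_uncertainty([40.0, 1.0, 1.0])
_, _, u_ambiguous = edl_uncertainty([0.5, 0.5, 0.5])
_, _, u_no_evidence = edl_uncertainty([0.0, 0.0, 0.0])
print(u_confident, u_ambiguous, u_no_evidence)
```

Under this formulation the belief masses and the uncertainty always sum to one, so a prediction backed by little total evidence is reported with high uncertainty rather than with an overconfident probability.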

15 pages, 2123 KB

Open Access Article

Multi-Class Visual Cyberbullying Detection Using Deep Neural Networks and the CVID Dataset

by Muhammad Asad Arshed, Zunera Samreen, Arslan Ahmad, Laiba Amjad, Hasnain Muavia, Christine Dewi and Muhammad Kabir

Information 2025, 16(8), 630; https://doi.org/10.3390/info16080630 - 24 Jul 2025

Viewed by 3006

Abstract

In an era where online interactions increasingly shape social dynamics, the pervasive issue of cyberbullying poses a significant threat to the well-being of individuals, particularly among vulnerable groups. Despite extensive research on text-based cyberbullying detection, the rise of visual content on social media platforms necessitates new approaches to address cyberbullying using images. This domain has been largely overlooked. In this paper, we present a novel dataset specifically designed for the detection of visual cyberbullying, encompassing four distinct classes: abuse, curse, discourage, and threat. The initial prepared dataset (cyberbullying visual indicators dataset (CVID)) comprised 664 samples for training and validation, expanded through data augmentation techniques to ensure balanced and accurate results across all classes. We analyzed this dataset using several advanced deep learning models, including VGG16, VGG19, MobileNetV2, and Vision Transformer. The proposed model, based on DenseNet201, achieved the highest test accuracy of 99%, demonstrating its efficacy in identifying the visual cues associated with cyberbullying. To prove the proposed model’s generalizability, the 5-fold stratified K-fold was also considered, and the model achieved an average test accuracy of 99%. This work introduces a dataset and highlights the potential of leveraging deep learning models to address the multifaceted challenges of detecting cyberbullying in visual content. Full article

(This article belongs to the Special Issue AI-Based Image Processing and Computer Vision)

24 pages, 10115 KB

Open Access Article

iSight: A Smart Clothing Management System to Empower Blind and Visually Impaired Individuals

by Daniel Rocha, Celina P. Leão, Filomena Soares and Vítor Carvalho

Information 2025, 16(5), 383; https://doi.org/10.3390/info16050383 - 3 May 2025

Viewed by 2432

Abstract

Clothing management is a major challenge for blind and visually impaired individuals to perform independently. This research developed and validated the iSight, a mechatronic smart wardrobe prototype, integrating computer vision and artificial intelligence to identify clothing types, colours, and alterations. Tested with 15 participants, iSight achieved high user satisfaction, with 60% rating it as very accurate in clothing identification, 80% in colour detection, and 86.7% in near-field communication tag recognition. Statistical analyses confirmed its positive impact on confidence, independence, and well-being. Despite the fact that improvements in menu complexity and fabric information were suggested, iSight proves to be a robust, user-friendly assistive tool with the potential to enhance the daily living of blind and visually impaired individuals. Full article

(This article belongs to the Special Issue AI-Based Image Processing and Computer Vision)

15 pages, 5939 KB

Open Access Article

Center-Guided Network with Dynamic Attention for Transmission Tower Detection

by Xiaobin Li, Zhuwei Liang, Jingbin Yang, Chuanlong Lyu and Yuge Xu

Information 2025, 16(4), 331; https://doi.org/10.3390/info16040331 - 21 Apr 2025

Viewed by 687

Abstract

Transmission tower detection in aerial images is the critical step for the inspection of power transmission equipment, which is essential for the stable operation of the power system. However, transmission towers in aerial images pose numerous challenges for object detection due to their multi-scale elongated shapes, large aspect ratios, and visually similar backgrounds. To address these problems, we propose the Center-Guided network with Dynamic Attention (CGDA) for detecting TTs from aerial images. Specifically, we apply ResNet and FPN as the feature extractor to extract high-quality and multi-scale features. To obtain more discriminative information, the dynamic attention mechanism is employed to dynamically fuse multi-scale feature maps and place more attention on the object regions. In addition, a two-stage detection head is proposed to employ a two-stage detection process to perform more accurate detection. Extensive experiments are conducted on a subset of the public TTPLA dataset. The results show that CGDA achieves competitive performance in detecting TTs, demonstrating the effectiveness of the proposed approach. Full article

(This article belongs to the Special Issue AI-Based Image Processing and Computer Vision)

21 pages, 5371 KB

Open Access Article

From Pixels to Diagnosis: Implementing and Evaluating a CNN Model for Tomato Leaf Disease Detection

by Zamir Osmenaj, Evgenia-Maria Tseliki, Sofia H. Kapellaki, George Tselikis and Nikolaos D. Tselikas

Information 2025, 16(3), 231; https://doi.org/10.3390/info16030231 - 16 Mar 2025

Cited by 4 | Viewed by 3659

Abstract

The frequent emergence of multiple diseases in tomato plants poses a significant challenge to agriculture, requiring innovative solutions to deal with this problem. The paper explores the application of machine learning (ML) technologies to develop a model capable of identifying and classifying diseases in tomato leaves. Our work involved the implementation of a custom convolutional neural network (CNN) trained on a diverse dataset of tomato leaf images. The performance of the proposed CNN model was evaluated and compared against the performance of existing pre-trained CNN models, i.e., the VGG16 and VGG19 models, which are extensively used for image classification tasks. The proposed CNN model was further tested with images of tomato leaves captured from a real-world garden setting in Greece. The captured images were carefully preprocessed and an in-depth study was conducted on how either each image preprocessing step or a different—not supported by the dataset used—strain of tomato affects the accuracy and confidence in detecting tomato leaf diseases. Full article

(This article belongs to the Special Issue AI-Based Image Processing and Computer Vision)

15 pages, 3085 KB

Open Access Article

Early Detection of Skin Diseases Across Diverse Skin Tones Using Hybrid Machine Learning and Deep Learning Models

by Akasha Aquil, Faisal Saeed, Souad Baowidan, Abdullah Marish Ali and Nouh Sabri Elmitwally

Information 2025, 16(2), 152; https://doi.org/10.3390/info16020152 - 19 Feb 2025

Cited by 2 | Viewed by 5459

Abstract

Skin diseases in melanin-rich skin often present diagnostic challenges due to the unique characteristics of darker skin tones, which can lead to misdiagnosis or delayed treatment. This disparity impacts millions within diverse communities, highlighting the need for accurate, AI-based diagnostic tools. In this paper, we investigated the performance of three machine learning methods, namely Support Vector Machines (SVMs), Random Forest (RF), and Decision Trees (DTs), combined with state-of-the-art (SOTA) deep learning models (EfficientNet, MobileNetV2, and DenseNet121) for predicting skin conditions using dermoscopic images from the HAM10000 dataset. The features were extracted using the deep learning models, with the labels encoded numerically. To address the data imbalance, SMOTE and resampling techniques were applied. Additionally, Principal Component Analysis (PCA) was used for feature reduction, and fine-tuning was performed to optimize the models. The results demonstrated that RF with DenseNet121 achieved a superior accuracy of 98.32%, followed by SVM with MobileNetV2 at 98.08%, and Decision Tree with MobileNetV2 at 85.39%. The proposed methods outperform the SVM with the SOTA EfficientNet model, validating the robustness of the proposed approaches. Evaluation metrics such as accuracy, precision, recall, and F1-score were used to benchmark performance, showcasing the potential of these methods in advancing skin disease diagnostics for diverse populations. Full article

(This article belongs to the Special Issue AI-Based Image Processing and Computer Vision)

12 pages, 5319 KB

Open Access Article

A Method for Maintaining a Unique Kurume Kasuri Pattern of Woven Textile Classified by EfficientNet by Means of LightGBM-Based Prediction of Misalignments

by Kohei Arai, Jin Shimazoe and Mariko Oda

Information 2024, 15(8), 434; https://doi.org/10.3390/info15080434 - 26 Jul 2024

Cited by 1 | Viewed by 1357

Abstract

Methods for evaluating the fluctuation of texture patterns that are essentially regular have been proposed in the past, but the best method has not been determined. Here, as an attempt at this, we propose a method that applies AI technology (learning EfficientNet, which is widely used as a classification problem solving method) to determine when the fluctuation exceeds the tolerable limit and what the acceptable range is. We also apply this to clarify the tolerable limit of fluctuation in the “Kurume Kasuri” pattern, which is unique to the Chikugo region of Japan, and devise a method to evaluate the fluctuation in real time when weaving the Kasuri and keep it within the acceptable range. This study proposes a method for maintaining a unique faded pattern of woven textiles by utilizing EfficientNet for classification, fine-tuned with Optuna, and LightGBM for predicting subtle misalignments. Our experiments show that EfficientNet achieves high performance in classifying the quality of unique faded patterns in woven textiles. Additionally, LightGBM demonstrates near-perfect accuracy in predicting subtle misalignments within the acceptable range for high-quality faded patterns by controlling the weaving thread tension. Consequently, this method effectively maintains the quality of Kurume Kasuri patterns within the desired criteria. Full article

(This article belongs to the Special Issue AI-Based Image Processing and Computer Vision)

Review

29 pages, 2763 KB

Open Access Review

A Review of Computer Vision Technology for Football Videos

by Fucheng Zheng, Duaa Zuhair Al-Hamid, Peter Han Joo Chong, Cheng Yang and Xue Jun Li

Information 2025, 16(5), 355; https://doi.org/10.3390/info16050355 - 28 Apr 2025

Cited by 2 | Viewed by 4998

Abstract

In the era of digital advancement, the integration of Deep Learning (DL) algorithms is revolutionizing performance monitoring in football. Due to restrictions on monitoring devices during games to prevent unfair advantages, coaches are tasked to analyze players’ movements and performance visually. As a result, Computer Vision (CV) technology has emerged as a vital non-contact tool for performance analysis, offering numerous opportunities to enhance the clarity, accuracy, and intelligence of sports event observations. However, existing CV studies in football face critical challenges, including low-resolution imagery of distant players and balls, severe occlusion in crowded scenes, motion blur during rapid movements, and the lack of large-scale annotated datasets tailored for dynamic football scenarios. This review paper fills this gap by comprehensively analyzing advancements in CV, particularly in four key areas: player/ball detection and tracking, motion prediction, tactical analysis, and event detection in football. By exploring these areas, this review offers valuable insights for future research on using CV technology to improve sports performance. Future directions should prioritize super-resolution techniques to enhance video quality and improve small-object detection performance, collaborative efforts to build diverse and richly annotated datasets, and the integration of contextual game information (e.g., score differentials and time remaining) to improve predictive models. The in-depth analysis of current State-Of-The-Art (SOTA) CV techniques provides researchers with a detailed reference to further develop robust and intelligent CV systems in football. Full article

(This article belongs to the Special Issue AI-Based Image Processing and Computer Vision)
