Applications of Artificial Intelligence in Image and Video Processing

A special issue of Electronics (ISSN 2079-9292). This special issue belongs to the section "Artificial Intelligence".

Deadline for manuscript submissions: 1 November 2024

Special Issue Editors


Guest Editor
Department of Artificial Intelligence Convergence, Pukyong National University, 45, Yongso-ro, Nam-gu, Busan 48513, Republic of Korea
Interests: image/video processing; image/video compression; image/video watermarking; video communication; VLSI design for real-time video applications

Guest Editor
Department of Artificial Intelligence, Department of Brain and Cognitive Engineering, Korea University, Seoul 02841, Republic of Korea
Interests: machine learning; deep learning; brain-computer interfaces; predictive medical intelligence; neuroimaging analysis

Guest Editor
Department of Computer and Artificial Intelligence Engineering, Pukyong National University, Busan 48513, Republic of Korea
Interests: computer graphics

Special Issue Information

Dear Colleagues,

Deep learning has achieved remarkable results in computer vision and image processing. It is now widely applied in fields such as manufacturing, healthcare, and aerospace, where it supports tasks including object detection, image recognition, semantic segmentation, and video action recognition in image and video data. Nevertheless, further research is needed to advance these applications. This Special Issue therefore aims to facilitate the exchange of original and groundbreaking research among researchers and practitioners in this field, and to cover the latest advances, methodologies, and applications of deep learning in image and video analysis. The areas of interest for this Special Issue include:

  • Deep learning for object detection and recognition;
  • Deep learning for image segmentation;
  • Deep learning for image classification;
  • Deep learning for video action recognition and activity detection;
  • Real-time deep learning architectures;
  • Deep learning for super-resolution;
  • Deep learning for image enhancement;
  • Deep learning for automated visual inspection.

We look forward to receiving your contributions.

Prof. Dr. Jongnam Kim
Dr. Heung-Il Suk
Prof. Dr. Youngbong Kim
Guest Editors

Manuscript Submission Information

Manuscripts should be submitted online at www.mdpi.com by registering and logging in to this website. Once you are registered, go to the submission form. Manuscripts can be submitted until the deadline. All submissions that pass pre-check are peer-reviewed. Accepted papers will be published continuously in the journal (as soon as accepted) and will be listed together on the Special Issue website. Research articles, review articles, and short communications are invited. For planned papers, a title and short abstract (about 100 words) can be sent to the Editorial Office for announcement on this website.

Submitted manuscripts should not have been published previously, nor be under consideration for publication elsewhere (except conference proceedings papers). All manuscripts are thoroughly refereed through a single-blind peer-review process. A guide for authors and other relevant information for submission of manuscripts is available on the Instructions for Authors page. Electronics is an international peer-reviewed open access semimonthly journal published by MDPI.

Please visit the Instructions for Authors page before submitting a manuscript. The Article Processing Charge (APC) for publication in this open access journal is 2400 CHF (Swiss Francs). Submitted papers should be well formatted and use good English. Authors may use MDPI's English editing service prior to publication or during author revisions.

Keywords

  • image recognition and classification
  • image and video segmentation
  • object detection and tracking
  • image and video captioning
  • video analysis and action recognition
  • 3D image and video processing
  • real-time image and video processing
  • super-resolution
  • image enhancement
  • automated visual inspection
  • other topics related to image processing

Published Papers (4 papers)


Research

18 pages, 5104 KiB  
Article
Hierarchical Vector-Quantized Variational Autoencoder and Vector Credibility Mechanism for High-Quality Image Inpainting
by Cheng Li, Dan Xu and Kuai Chen
Electronics 2024, 13(10), 1852; https://doi.org/10.3390/electronics13101852 - 9 May 2024
Abstract
Image inpainting infers the missing areas of a corrupted image from the information in the undamaged part. With rapidly developing deep-learning technology, many existing image inpainting methods can generate plausible results from damaged images. However, they still suffer from over-smoothed textures or textural distortion when the textural details are complex or the damaged areas are large. To restore textures at a fine-grained level, we propose an image inpainting method based on a hierarchical VQ-VAE with a vector credibility mechanism. It first trains the hierarchical VQ-VAE on ground truth images to update two codebooks and to obtain two corresponding vector collections containing information on the ground truth images. The two vector collections are fed to a decoder to generate the corresponding high-fidelity outputs. An encoder is then trained on the corresponding damaged image; it generates vector collections approximating the ground truth with the help of the prior knowledge provided by the codebooks. After that, the two vector collections pass through the decoder of the hierarchical VQ-VAE to produce the inpainted results. In addition, we apply a vector credibility mechanism that encourages the vector collections from damaged images to approximate those from ground truth images. To further improve the inpainting result, we apply a refinement network, which uses residual blocks with different dilation rates to acquire both global information and local textural details. Extensive experiments conducted on several datasets demonstrate that our method outperforms state-of-the-art ones.
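
For readers unfamiliar with vector quantization, the sketch below shows the nearest-codebook lookup at the core of VQ-VAE-style models such as the one described above. It is a minimal PyTorch illustration, not the authors' implementation: the hierarchical two-codebook design, the vector credibility mechanism, and the refinement network are omitted, and all sizes are illustrative assumptions.

```python
import torch
import torch.nn as nn

class VectorQuantizer(nn.Module):
    """Nearest-neighbour codebook lookup used in VQ-VAE-style models.

    A minimal sketch; sizes are illustrative, not taken from the paper.
    """
    def __init__(self, num_codes=512, code_dim=64, beta=0.25):
        super().__init__()
        self.codebook = nn.Embedding(num_codes, code_dim)
        self.codebook.weight.data.uniform_(-1.0 / num_codes, 1.0 / num_codes)
        self.beta = beta  # weight of the commitment loss

    def forward(self, z):               # z: (B, C, H, W) encoder features
        B, C, H, W = z.shape
        flat = z.permute(0, 2, 3, 1).reshape(-1, C)          # (B*H*W, C)
        # Squared L2 distance from each feature vector to every code
        dist = (flat.pow(2).sum(1, keepdim=True)
                - 2 * flat @ self.codebook.weight.t()
                + self.codebook.weight.pow(2).sum(1))
        idx = dist.argmin(dim=1)                              # nearest code
        q = self.codebook(idx).reshape(B, H, W, C).permute(0, 3, 1, 2)
        # Codebook + commitment losses (van den Oord et al., 2017)
        loss = ((q - z.detach()).pow(2).mean()
                + self.beta * (q.detach() - z).pow(2).mean())
        q = z + (q - z).detach()        # straight-through estimator
        return q, idx, loss
```

The straight-through estimator on the last line lets gradients flow from the decoder back to the encoder despite the non-differentiable argmin, which is what makes codebook-based models like this trainable end to end.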

21 pages, 1091 KiB  
Article
Polymorphic Clustering and Approximate Masking Framework for Fine-Grained Insect Image Classification
by Hua Huo, Aokun Mei and Ningya Xu
Electronics 2024, 13(9), 1691; https://doi.org/10.3390/electronics13091691 - 27 Apr 2024
Abstract
Insect diversity monitoring is crucial for biological pest control in agriculture and forestry. Modern monitoring of insect species relies heavily on fine-grained image classification models. Fine-grained image classification faces challenges such as small inter-class differences and large intra-class variances, which are even more pronounced in insect scenes, where insect species often exhibit significant morphological differences across multiple life stages. To address these challenges, we introduce segmentation and clustering operations into the image classification task and design a novel network training framework for fine-grained classification of insect images using polymorphic clustering and approximate masking, named PCAM-Frame. In the first stage of the framework, the Polymorphic Clustering Module uses segmentation and clustering operations to distinguish the various morphologies of insects at different life stages, allowing the model to differentiate between samples at different life stages during training. The second stage consists of a feature extraction network, called Basenet, which can be any mainstream network that performs well in fine-grained image classification tasks and which provides pre-classification confidence for the next stage. In the third stage, we apply the Approximate Masking Module to mask the common attention regions of the most likely classes and continuously adjust the convergence direction of the model during training using a Deviation Loss function. We apply PCAM-Frame with multiple classification networks as the Basenet in the second stage and conduct extensive experiments on the Insecta dataset of iNaturalist 2017 and the IP102 dataset, achieving improvements of 2.2% and 1.4%, respectively. Generalization experiments on other fine-grained image classification datasets, such as CUB200-2011 and Stanford Dogs, also demonstrate positive effects. These experiments validate the pertinence and effectiveness of our PCAM-Frame framework for fine-grained image classification under complex conditions, particularly in insect scenes.
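
The approximate-masking idea in the third stage can be pictured as follows: given class activation maps (CAMs) for the top-ranked candidate classes, the regions they all attend to are suppressed so that training is pushed toward the remaining discriminative cues. The sketch below is an interpretation of the abstract, not the paper's code; the tensor layout, the top_k and thresh parameters, and the overlap rule are all assumptions.

```python
import torch
import torch.nn.functional as F

def approximate_mask(images, logits, cams, top_k=2, thresh=0.6):
    """Hide the attention regions shared by the top-k candidate classes.

    images: (B, 3, H, W) input batch
    logits: (B, num_classes) pre-classification scores
    cams:   (B, num_classes, h, w) class activation maps
    All conventions here are assumptions, not the paper's formulation.
    """
    B, _, H, W = images.shape
    top = logits.topk(top_k, dim=1).indices                  # (B, top_k)
    # Gather the CAMs of the most likely classes and find their overlap
    sel = torch.stack([cams[b, top[b]] for b in range(B)])   # (B, top_k, h, w)
    sel = sel / sel.amax(dim=(2, 3), keepdim=True).clamp_min(1e-6)
    common = sel.min(dim=1).values                           # shared attention
    mask = (common < thresh).float().unsqueeze(1)            # keep low-overlap
    mask = F.interpolate(mask, size=(H, W), mode="nearest")
    return images * mask                                     # masked re-input
```

Masking the regions that every candidate class already agrees on is a common trick in fine-grained recognition: it forces subsequent training passes to rely on the subtler, class-specific details.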

13 pages, 2776 KiB  
Article
Isolated Video-Based Sign Language Recognition Using a Hybrid CNN-LSTM Framework Based on Attention Mechanism
by Diksha Kumari and Radhey Shyam Anand
Electronics 2024, 13(7), 1229; https://doi.org/10.3390/electronics13071229 - 26 Mar 2024
Abstract
Sign language is a complex language that uses hand gestures, body movements, and facial expressions and is mainly used by the deaf community. Sign language recognition (SLR) is a popular research domain, as it provides an efficient and reliable solution to bridge the communication gap between people who are hard of hearing and those with typical hearing. Recognizing isolated sign language words from video is a challenging research area in computer vision. This paper proposes a hybrid SLR framework that combines a convolutional neural network (CNN) and an attention-based long short-term memory (LSTM) neural network. We used MobileNetV2 as a backbone model due to its lightweight structure, which reduces the complexity of the model architecture for deriving meaningful features from the video frame sequence. The spatial features are fed to an LSTM optimized with an attention mechanism that selects the significant gesture cues from the video frames and focuses on salient features in the sequential data. The proposed method is evaluated on the benchmark WLASL dataset with 100 classes, using precision, recall, F1-score, and 5-fold cross-validation. Our method achieved an average accuracy of 84.65%. The experimental results show that our model performs effectively and is computationally efficient compared with other state-of-the-art methods.
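
The hybrid architecture described above can be outlined as a per-frame MobileNetV2 feature extractor followed by an attention-weighted LSTM. The PyTorch sketch below follows that description but is not the authors' code; the hidden size, the additive attention head, and the pooling choices are illustrative assumptions.

```python
import torch
import torch.nn as nn
from torchvision import models

class CNNLSTMAttention(nn.Module):
    """Frame-level CNN features + attention-pooled LSTM, sketched after
    the paper's description; layer sizes are illustrative assumptions."""
    def __init__(self, num_classes=100, hidden=256):
        super().__init__()
        backbone = models.mobilenet_v2(weights="DEFAULT")
        self.cnn = backbone.features                 # -> (N, 1280, h, w)
        self.pool = nn.AdaptiveAvgPool2d(1)
        self.lstm = nn.LSTM(1280, hidden, batch_first=True)
        self.attn = nn.Linear(hidden, 1)              # per-frame attention score
        self.fc = nn.Linear(hidden, num_classes)

    def forward(self, clips):                         # clips: (B, T, 3, H, W)
        B, T = clips.shape[:2]
        feats = self.pool(self.cnn(clips.flatten(0, 1))).flatten(1)  # (B*T, 1280)
        out, _ = self.lstm(feats.view(B, T, -1))      # (B, T, hidden)
        weights = torch.softmax(self.attn(out).squeeze(-1), dim=1)   # (B, T)
        context = (weights.unsqueeze(-1) * out).sum(dim=1)           # (B, hidden)
        return self.fc(context)
```

The attention weights act as a learned temporal pooling: frames carrying salient gesture cues receive higher weight than transitional or idle frames before the final classification.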

12 pages, 4529 KiB  
Article
Classification Method of 3D Pattern Film Images Using MLP Based on the Optimal Widths of Histogram
by Jaeeun Lee, Hongseok Choi and Jongnam Kim
Electronics 2024, 13(6), 1114; https://doi.org/10.3390/electronics13061114 - 18 Mar 2024
Abstract
A 3D pattern film is a film that makes a 2D pattern appear three-dimensional depending on the amount and angle of light. However, because 3D pattern film was developed only recently, there is no established method for classifying and verifying defective products, and little research exists in this area, making it a necessary field of study. Additionally, 3D pattern film has blurred contours, which makes the outlines difficult to detect and the product challenging to classify. Recently, many machine learning methods have been published for analyzing product quality. However, when there is a small amount of data and most images are similar, using deep learning can easily lead to overfitting. To overcome these limitations, this study proposes a method that uses an MLP (multilayer perceptron) model to classify 3D pattern films into genuine and defective products. This approach inputs the widths of the image histogram, measured at the heights of specific points, into the MLP and then classifies the product as 'good' or 'bad' using optimal hyperparameters found through random search. Although the contours of the 3D pattern film are blurred, the proposed method can detect the characteristics of 'good' and 'bad' products by using the image histogram. Moreover, it reduces the likelihood of overfitting and achieves high accuracy, as it reflects the characteristics of a limited number of similar images and builds a simple model. In the experiment, the accuracy of the proposed method was 98.809%, demonstrating superior performance compared with other models.
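
The feature-extraction step can be pictured as follows: measure the width of the image's intensity histogram at several fractions of its peak height and feed those widths to a small MLP. The code below is a plain illustration of that idea, not the paper's implementation; the height fractions, bin count, and classifier settings are assumptions.

```python
import numpy as np
from sklearn.neural_network import MLPClassifier

def histogram_widths(gray, heights=(0.25, 0.5, 0.75), bins=256):
    """Width of the intensity histogram at given fractions of its peak.

    gray: 2D uint8 image array. The specific height fractions and
    preprocessing are assumptions, not taken from the paper.
    """
    hist, _ = np.histogram(gray, bins=bins, range=(0, 256))
    peak = hist.max()
    widths = []
    for h in heights:
        above = np.flatnonzero(hist >= h * peak)   # bins above this height
        widths.append(above[-1] - above[0] + 1 if above.size else 0)
    return widths

# Hypothetical usage: X stacks width features per image, y holds
# good/bad labels; hyperparameters would be tuned by random search.
# X = np.array([histogram_widths(img) for img in images])
# clf = MLPClassifier(hidden_layer_sizes=(32,), max_iter=2000).fit(X, y)
# pred = clf.predict(X_new)
```

Reducing each image to a handful of histogram widths keeps the model small, which is consistent with the paper's motivation of avoiding overfitting on a limited set of visually similar images.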
