Special Issue "AI-Based Image Processing: 2nd Edition"

A special issue of Applied Sciences (ISSN 2076-3417). This special issue belongs to the section "Computing and Artificial Intelligence".

Deadline for manuscript submissions: 30 April 2024 | Viewed by 4144

Special Issue Editors

Dr. Jianping Luo
Guest Editor
Guangdong Key Laboratory of Intelligent Information Processing, College of Electronics and Information Engineering, Shenzhen University, Shenzhen 518118, China
Interests: multi-objective evolutionary algorithms; evolutionary multiobjective optimization; Pareto fronts

Dr. Hongying Meng
Guest Editor
Department of Electronic and Electrical Engineering, Brunel University London, London UB8 3PH, UK
Interests: artificial intelligence; machine learning

Special Issue Information

Dear Colleagues,

With the rapid development of artificial intelligence, a wide range of AI algorithms have been proposed, and intelligent applications play an increasingly important role in industrial production and social life. Nevertheless, applying AI to medical image processing, image processing for intelligent transportation systems, satellite image processing, face recognition, object recognition, and related areas remains challenging.

We are therefore organizing a Special Issue on “AI-Based Image Processing: 2nd Edition”, focused on tackling the most pressing problems in this field, and more specifically on the topics listed below:

  • Image processing algorithms;
  • Image analytics;
  • Medical image processing;
  • Biomedical image analysis;
  • Image generation;
  • Image restoration and enhancement;
  • Image compression;
  • Edge detection;
  • Image segmentation;
  • Semantic segmentation;
  • Image classification;
  • Image inpainting;
  • Image captioning;
  • Feature detection and extraction;
  • Content-based image retrieval;
  • Optical character recognition;
  • Face recognition;
  • Emotion recognition;
  • Gesture recognition;
  • Object recognition and tracking.

Dr. Jianping Luo
Dr. Hongying Meng
Guest Editors

Manuscript Submission Information

Manuscripts should be submitted online at www.mdpi.com by registering and logging in to this website. Once registered, proceed to the submission form. Manuscripts can be submitted until the deadline. All submissions that pass pre-check are peer-reviewed. Accepted papers will be published continuously in the journal (as soon as accepted) and will be listed together on the Special Issue website. Research articles, review articles, and short communications are invited. For planned papers, a title and short abstract (about 100 words) can be sent to the Editorial Office for announcement on this website.

Submitted manuscripts should not have been published previously, nor be under consideration for publication elsewhere (except conference proceedings papers). All manuscripts are thoroughly refereed through a single-blind peer-review process. A guide for authors and other relevant information for submission of manuscripts is available on the Instructions for Authors page. Applied Sciences is an international peer-reviewed open access semimonthly journal published by MDPI.

Please visit the Instructions for Authors page before submitting a manuscript. The Article Processing Charge (APC) for publication in this open access journal is 2300 CHF (Swiss Francs). Submitted papers should be well formatted and use good English. Authors may use MDPI's English editing service prior to publication or during author revisions.

Keywords

  • image processing
  • feature detection and extraction
  • object recognition and tracking

Published Papers (6 papers)


Research

20 pages, 24378 KiB  
Article
Diffusion-Denoising Process with Gated U-Net for High-Quality Document Binarization
Appl. Sci. 2023, 13(20), 11141; https://doi.org/10.3390/app132011141 - 10 Oct 2023
Viewed by 640
Abstract
The binarization of degraded documents represents a crucial preprocessing task for various document analyses, including optical character recognition and historical document analysis. Various convolutional neural network models and generative models have been used for document binarization. However, these models often struggle to deliver generalized performance on noise types the model has not encountered during training and may have difficulty extracting intricate text strokes. We herein propose a novel approach to address these challenges by introducing the use of the latent diffusion model, a well-known high-quality image-generation model, into the realm of document binarization for the first time. By leveraging an iterative diffusion-denoising process within the latent space, our approach excels at producing high-quality, clean, binarized images and demonstrates excellent generalization using both data distribution and time steps during training. Furthermore, we enhance our model’s ability to preserve text strokes by incorporating a gated U-Net into the backbone network. The gated convolution mechanism allows the model to focus on the text region by combining gating values and features, facilitating the extraction of intricate text strokes. To maximize the effectiveness of our proposed model, we use a combination of the latent diffusion model loss and pixel-level loss, which aligns with the model’s structure. The experimental results on the Handwritten Document Image Binarization Contest and Document Image Binarization Contest benchmark datasets showcase the superior performance of our proposed model compared to existing methods.
(This article belongs to the Special Issue AI-Based Image Processing: 2nd Edition)
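The gated convolution mechanism described in this abstract — candidate features modulated by learned gating values so the network can focus on text regions — can be sketched in a few lines. This is a minimal single-channel NumPy illustration under assumed names (`conv2d`, `w_feat`, `w_gate`), not the authors' implementation:

```python
import numpy as np

def conv2d(x, w):
    """Valid 2D cross-correlation of a single-channel image x with kernel w."""
    kh, kw = w.shape
    H, W = x.shape
    out = np.zeros((H - kh + 1, W - kw + 1))
    for i in range(out.shape[0]):
        for j in range(out.shape[1]):
            out[i, j] = np.sum(x[i:i + kh, j:j + kw] * w)
    return out

def sigmoid(z):
    return 1.0 / (1.0 + np.exp(-z))

def gated_conv2d(x, w_feat, w_gate):
    """Gated convolution: features are multiplied by a learned soft mask
    in (0, 1), letting the network emphasise (e.g.) text-stroke regions."""
    features = np.tanh(conv2d(x, w_feat))   # candidate features in (-1, 1)
    gate = sigmoid(conv2d(x, w_gate))       # per-pixel gating values in (0, 1)
    return features * gate

rng = np.random.default_rng(0)
x = rng.standard_normal((8, 8))
w_feat = rng.standard_normal((3, 3))
w_gate = rng.standard_normal((3, 3))
y = gated_conv2d(x, w_feat, w_gate)
```

In a real network both kernels are learned jointly, and the gate tends toward zero over background pixels, suppressing their features.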

21 pages, 5457 KiB  
Article
Research on Students’ Action Behavior Recognition Method Based on Classroom Time-Series Images
Appl. Sci. 2023, 13(18), 10426; https://doi.org/10.3390/app131810426 - 18 Sep 2023
Viewed by 554
Abstract
Students’ action behavior performance is an important part of classroom teaching evaluation. This paper proposes a method for recognizing students’ action behaviors based on classroom time-series images, which detects the action behavior of students in classroom teaching videos and, based on the detection results, obtains and analyzes the action behavior sequences of individual students during the teaching time of knowledge points. First, we propose an improved Asynchronous Interaction Aggregation (AIA) network for student action behavior detection. By adding a Multi-scale Temporal Attention (MsTA) module and a Multi-scale Channel Spatial Attention (MsCSA) module to the fast pathway and slow pathway, respectively, of SlowFast, the backbone network of the improved AIA network, the accuracy of student action behavior recognition is improved. Second, the Equalized Focal Loss function is introduced to mitigate the category imbalance in the student action behavior dataset. Experimental results on the student action behavior dataset show that the proposed method can detect different action behaviors of students in the classroom and offers better detection performance than the original AIA network. Finally, based on the results of action behavior recognition, the seat number is used as the index to obtain the action behavior sequence of individual students during the teaching time of knowledge points, and the performance of students in this period is analyzed.
(This article belongs to the Special Issue AI-Based Image Processing: 2nd Edition)
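The Equalized Focal Loss mentioned in this abstract extends the standard focal loss with per-category re-weighting. A minimal sketch of the underlying binary focal loss, which down-weights easy examples so that training focuses on hard and rare cases, might look as follows; the variable names and toy data are illustrative only, not taken from the paper:

```python
import numpy as np

def focal_loss(p, y, gamma=2.0, alpha=0.25):
    """Binary focal loss (Lin et al.): the (1 - pt)^gamma factor shrinks
    the loss of well-classified examples.  p: predicted probabilities,
    y: binary ground-truth labels."""
    p = np.clip(p, 1e-7, 1 - 1e-7)
    pt = np.where(y == 1, p, 1 - p)          # probability of the true class
    at = np.where(y == 1, alpha, 1 - alpha)  # class-balancing weight
    return -np.mean(at * (1 - pt) ** gamma * np.log(pt))

y = np.array([1, 0, 1, 0])
easy_loss = focal_loss(np.array([0.9, 0.1, 0.8, 0.3]), y)  # confident predictions
hard_loss = focal_loss(np.array([0.6, 0.4, 0.5, 0.6]), y)  # uncertain predictions
```

Confident (easy) predictions incur a much smaller loss than uncertain (hard) ones, which is the property the equalized variant builds on with category-level modulation.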

18 pages, 5975 KiB  
Article
Intelligent Simulation Technology Based on RCS Imaging
Appl. Sci. 2023, 13(18), 10119; https://doi.org/10.3390/app131810119 - 8 Sep 2023
Viewed by 360
Abstract
The target simulation of airplanes is an important research topic, and finding the right balance between high performance and low cost is particularly important. To reconcile realistic target simulation with controllable cost, the scientific formulation of the performance parameters of the target simulation is key to achieving high performance. This paper proposes an intelligent simulation technology based on RCS imaging that combines a 60° variation corner reflector with a Luneberg lens reflector. It is designed to simulate several important RCS characteristics of an aircraft. The different RCS images are automatically shifted to the corresponding gear position to achieve the simulation, at low cost and with good performance. The system can be used for radar target-search training.
(This article belongs to the Special Issue AI-Based Image Processing: 2nd Edition)
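For context, the peak RCS of the two reflector types combined in this paper is commonly approximated with standard textbook formulas: σ = 4πa⁴/(3λ²) for a triangular trihedral corner reflector of edge length a, and σ = 4π³r⁴/λ² for a Luneberg lens reflector of radius r (which behaves like a flat plate of the same aperture). The sketch below uses these approximations; the dimensions and wavelength are illustrative, not taken from the paper:

```python
import numpy as np

def trihedral_corner_rcs(a, lam):
    """Peak RCS of a triangular trihedral corner reflector with edge
    length a (textbook approximation): sigma = 4*pi*a^4 / (3*lambda^2)."""
    return 4.0 * np.pi * a**4 / (3.0 * lam**2)

def luneburg_lens_rcs(r, lam):
    """Peak RCS of a Luneberg lens reflector of radius r with a reflective
    cap (flat-plate equivalent): sigma = 4*pi^3*r^4 / lambda^2."""
    return 4.0 * np.pi**3 * r**4 / lam**2

lam = 0.03                                   # ~10 GHz (X-band), illustrative
rcs_corner = trihedral_corner_rcs(0.15, lam) # 15 cm corner reflector
rcs_lens = luneburg_lens_rcs(0.10, lam)      # 10 cm Luneburg lens
```

Even a small lens reflector can return a larger peak RCS than a corner reflector of comparable size, which is one reason the two are combined to shape a target signature.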

22 pages, 4219 KiB  
Article
MixerNet-SAGA: A Novel Deep Learning Architecture for Superior Road Extraction in High-Resolution Remote Sensing Imagery
Appl. Sci. 2023, 13(18), 10067; https://doi.org/10.3390/app131810067 - 6 Sep 2023
Viewed by 541
Abstract
In this study, we address the limitations of current deep learning models in road extraction tasks from remote sensing imagery. We introduce MixerNet-SAGA, a novel deep learning model that incorporates the strengths of U-Net, integrates a ConvMixer block for enhanced feature extraction, and includes a Scaled Attention Gate (SAG) for augmented spatial attention. Experimental validation on the Massachusetts road dataset and the DeepGlobe road dataset demonstrates that MixerNet-SAGA achieves a 10% improvement in precision, 8% in recall, and 12% in IoU compared to leading models such as U-Net, ResNet, and SDUNet. Furthermore, our model excels in computational efficiency, being 20% faster, and has a smaller model size. Notably, MixerNet-SAGA shows exceptional robustness against challenges such as same-spectrum–different-object and different-spectrum–same-object phenomena. Ablation studies further reveal the critical roles of the ConvMixer block and SAG. Despite its strengths, the model’s scalability to extremely large datasets remains an area for future investigation. Collectively, MixerNet-SAGA offers an efficient and accurate solution for road extraction in remote sensing imagery and presents significant potential for broader applications.
(This article belongs to the Special Issue AI-Based Image Processing: 2nd Edition)
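The precision, recall, and IoU figures reported in this abstract are computed pixel-wise over binary road masks. A minimal sketch of these metrics, with a toy ground truth and prediction (all names illustrative):

```python
import numpy as np

def seg_metrics(pred, gt):
    """Pixel-wise precision, recall, and IoU for binary segmentation masks."""
    pred, gt = pred.astype(bool), gt.astype(bool)
    tp = np.sum(pred & gt)    # road pixels correctly predicted
    fp = np.sum(pred & ~gt)   # background predicted as road
    fn = np.sum(~pred & gt)   # road pixels missed
    precision = tp / (tp + fp)
    recall = tp / (tp + fn)
    iou = tp / (tp + fp + fn)
    return precision, recall, iou

gt = np.zeros((8, 8), dtype=int)
gt[3:5, :] = 1                 # a horizontal two-pixel-wide "road"
pred = np.zeros((8, 8), dtype=int)
pred[3:5, 1:] = 1              # prediction misses the leftmost column
precision, recall, iou = seg_metrics(pred, gt)
```

Here the prediction contains no false positives (precision 1.0) but misses 2 of 16 road pixels, so recall and IoU both come out at 0.875.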

17 pages, 5445 KiB  
Article
RepNet: A Lightweight Human Pose Regression Network Based on Re-Parameterization
Appl. Sci. 2023, 13(16), 9475; https://doi.org/10.3390/app13169475 - 21 Aug 2023
Viewed by 544
Abstract
Human pose estimation, as the basis of advanced computer vision, has a wide application perspective. In existing studies, the high-capacity model based on the heatmap method can achieve accurate recognition results, but it encounters many difficulties when used in real-world scenarios. To solve this problem, we propose a lightweight pose regression algorithm (RepNet) that introduces a multi-parameter network structure, fuses multi-level features, and combines the idea of residual likelihood estimation. A well-designed convolutional architecture is used for training. By reconstructing the parameters of each level, the network model is simplified, and the computation time and efficiency of the detection task are optimized. The prediction performance is also improved by the output of the maximum likelihood model and the reversible transformation of the underlying distribution learned by the flow generation model. RepNet achieves a recognition accuracy of 66.1 AP on the COCO dataset, at a computational speed of 15 ms on GPU and 40 ms on CPU. This resolves the contradiction between prediction accuracy and computational complexity and contributes to research in lightweight pose estimation.
(This article belongs to the Special Issue AI-Based Image Processing: 2nd Edition)
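The re-parameterization idea in the title — training with parallel branches, then folding them into a single convolution for fast inference — can be illustrated in one dimension. This is a generic RepVGG-style sketch under the assumption of purely linear branches, not the authors' network:

```python
import numpy as np

def conv1d_same(x, w):
    """1D cross-correlation with zero padding so output length equals input length."""
    pad = len(w) // 2
    xp = np.pad(x, pad)
    return np.array([np.dot(xp[i:i + len(w)], w) for i in range(len(x))])

rng = np.random.default_rng(1)
x = rng.standard_normal(16)
w3 = rng.standard_normal(3)   # 3-tap branch
w1 = rng.standard_normal(1)   # 1-tap branch

# Training-time output: three parallel branches (3-tap, 1-tap, identity) summed.
y_train = conv1d_same(x, w3) + conv1d_same(x, w1) + x

# Inference-time re-parameterization: fold the 1-tap and identity branches
# into the centre tap of a single 3-tap kernel (identity == centre weight 1).
w_fused = w3.copy()
w_fused[1] += w1[0] + 1.0
y_fused = conv1d_same(x, w_fused)
```

The fused single-branch kernel reproduces the multi-branch output exactly, so the richer training-time topology costs nothing at inference — the property lightweight regression networks of this kind exploit.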

19 pages, 35842 KiB  
Article
Real-Time Intelligent Detection System for Illegal Wearing of On-Site Power Construction Worker Based on Edge-YOLO and Low-Cost Edge Devices
Appl. Sci. 2023, 13(14), 8287; https://doi.org/10.3390/app13148287 - 18 Jul 2023
Viewed by 598
Abstract
Ensuring personal safety and preventing accidents are critical aspects of power construction safety supervision. However, current monitoring methods are inefficient and unreliable, as most of them rely on manual monitoring and transmission, which results in slow detection and delayed warnings regarding violations. To overcome these challenges, we propose an intelligent detection system that can accurately identify instances of illegal wearing by power construction workers in real time. Firstly, we integrate the squeeze-and-excitation (SE) module into our convolutional neural network to enhance detection accuracy. This module effectively prioritizes informative features while suppressing less relevant ones, resulting in improved overall performance. Secondly, we present an embedded real-time detection system that utilizes Jetson Xavier NX and Edge-YOLO. This system promptly detects and alerts power construction workers of instances of illegal wearing behavior. To ensure a lightweight implementation, we design appropriate detection heads based on target size and distribution, reducing model parameters while enhancing detection speed and minimizing accuracy loss. Additionally, we employ data augmentation to enhance the system’s robustness. Our experimental results demonstrate that our improved Edge-YOLO model achieves high detection precision and recall rates of 0.964 and 0.966, respectively, with a frame rate of 35.36 frames per second when deployed on Jetson Xavier NX. Therefore, Edge-YOLO proves to be an ideal choice for intelligent real-time detection systems, providing superior accuracy and speed performance compared to the original YOLOv5s model and other models in the YOLO series for safety monitoring at construction sites.
(This article belongs to the Special Issue AI-Based Image Processing: 2nd Edition)
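The squeeze-and-excitation (SE) module integrated in this work can be sketched in a few lines: globally average-pool each channel (squeeze), pass the result through a small two-layer bottleneck (excitation), and rescale the channels by the resulting attention weights. A minimal NumPy sketch with bias terms omitted and illustrative shapes, not the paper's exact configuration:

```python
import numpy as np

def sigmoid(z):
    return 1.0 / (1.0 + np.exp(-z))

def se_block(x, w1, w2):
    """Squeeze-and-excitation over a (channels, H, W) feature map:
    informative channels are amplified, less relevant ones suppressed."""
    s = x.mean(axis=(1, 2))          # squeeze: one scalar per channel, shape (C,)
    z = np.maximum(w1 @ s, 0.0)      # excitation: bottleneck layer + ReLU
    attn = sigmoid(w2 @ z)           # per-channel attention weights in (0, 1)
    return x * attn[:, None, None], attn

rng = np.random.default_rng(2)
x = rng.standard_normal((8, 5, 5))   # 8 channels, 5x5 spatial map
w1 = rng.standard_normal((2, 8))     # reduction ratio 4: 8 -> 2
w2 = rng.standard_normal((8, 2))     # expansion back: 2 -> 8
y, attn = se_block(x, w1, w2)
```

The bottleneck keeps the added parameter count tiny, which is why SE blocks are a popular accuracy boost for edge-deployed detectors like the one described above.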
