Computer Vision, Image Processing Technologies and Artificial Intelligence, 2nd Edition

A special issue of Mathematics (ISSN 2227-7390). This special issue belongs to the section "E1: Mathematics and Computer Science".

Deadline for manuscript submissions: 20 July 2025 | Viewed by 9306

Special Issue Editors


Guest Editor
Institute of Computing Technology, University of Chinese Academy of Sciences, Beijing 100049, China
Interests: video coding; computer vision; deep learning

Guest Editor
School of Computer Science and Technology, University of Chinese Academy of Sciences, Beijing 100049, China
Interests: image processing; signal processing; artificial intelligence

Guest Editor
School of Computer Science, Beijing Information Science and Technology University, Beijing 100101, China
Interests: neural networks; machine learning; computer vision and developmental robotics

Guest Editor
Faculty of Information Technology, Beijing University of Technology, Beijing 100124, China
Interests: artificial intelligence; information security

Special Issue Information

Dear Colleagues,

Computer vision has expanded into a wide range of research fields in which information is extracted from visual data, including images and video. Computer vision technology is now pervasive in modern life, with billions of people using applications built on techniques such as image recognition, image processing, and object detection, underscoring both the necessity and the potential of research in computer vision and its applications. Advances in artificial intelligence have enabled these techniques to surpass human performance on many tasks. Nevertheless, many valuable open problems remain in the research and application of computer vision, image processing technology, and artificial intelligence.

This Special Issue on “Computer Vision, Image Processing Technologies and Artificial Intelligence” aims to gather original articles that advance theoretical and practical research in the domains of computer vision, image processing, and artificial intelligence, including but not limited to the following aspects and tasks:

  • Image augmentation;
  • Image restoration;
  • Image encoding;
  • Image segmentation;
  • Image recognition;
  • Image classification;
  • Image and video retrieval;
  • Image and video synthesis;
  • Object detection;
  • Image depiction;
  • Image-to-image translation;
  • Image forensics;
  • Artificial intelligence applied in information security;
  • Large-scale models for computer vision.

Prof. Dr. Honggang Qi
Dr. Yan Liu
Dr. Jun Miao
Prof. Dr. Lijuan Duan
Guest Editors

Manuscript Submission Information

Manuscripts should be submitted online at www.mdpi.com by registering and logging in to this website. Once you are registered, click here to go to the submission form. Manuscripts can be submitted until the deadline. All submissions that pass pre-check are peer-reviewed. Accepted papers will be published continuously in the journal (as soon as accepted) and will be listed together on the special issue website. Research articles, review articles as well as short communications are invited. For planned papers, a title and short abstract (about 100 words) can be sent to the Editorial Office for announcement on this website.

Submitted manuscripts should not have been published previously, nor be under consideration for publication elsewhere (except conference proceedings papers). All manuscripts are thoroughly refereed through a single-blind peer-review process. A guide for authors and other relevant information for submission of manuscripts is available on the Instructions for Authors page. Mathematics is an international peer-reviewed open access semimonthly journal published by MDPI.

Please visit the Instructions for Authors page before submitting a manuscript. The Article Processing Charge (APC) for publication in this open access journal is 2600 CHF (Swiss Francs). Submitted papers should be well formatted and use good English. Authors may use MDPI's English editing service prior to publication or during author revisions.

Keywords

  • computer vision
  • artificial intelligence
  • deep learning
  • machine learning
  • neural networks
  • image processing
  • vision information
  • large-scale models for computer vision

Benefits of Publishing in a Special Issue

  • Ease of navigation: Grouping papers by topic helps scholars navigate broad scope journals more efficiently.
  • Greater discoverability: Special Issues support the reach and impact of scientific research. Articles in Special Issues are more discoverable and cited more frequently.
  • Expansion of research network: Special Issues facilitate connections among authors, fostering scientific collaborations.
  • External promotion: Articles in Special Issues are often promoted through the journal's social media, increasing their visibility.
  • e-Book format: Special Issues with more than 10 articles can be published as dedicated e-books, ensuring wide and rapid dissemination.

Further information on MDPI's Special Issue policies can be found here.

Published Papers (5 papers)

Research

16 pages, 958 KiB  
Article
DGYOLOv8: An Enhanced Model for Steel Surface Defect Detection Based on YOLOv8
by Guanlin Zhu, Honggang Qi and Ke Lv
Mathematics 2025, 13(5), 831; https://doi.org/10.3390/math13050831 - 2 Mar 2025
Viewed by 879
Abstract
The application of deep learning-based defect detection models significantly reduces the workload of workers and enhances the efficiency of inspections. In this paper, an enhanced YOLOv8 model (DCNv4_C2f + GAM + InnerMPDIoU + YOLOv8, hereafter referred to as DGYOLOv8) is developed to tackle the challenges of object detection in steel surface defect detection tasks. DGYOLOv8 incorporates a deformable convolution C2f (DCNv4_C2f) module into the backbone network to allow adaptive adjustment of the receptive field. Additionally, it integrates a Gate Attention Module (GAM) within the spatial and channel attention mechanisms, enhancing feature selection through a gating mechanism that strengthens key features, thereby improving the model’s generalization and interpretability. The InnerMPDIoU, which incorporates the latest Inner concepts, enhances detection accuracy and the ability to handle detailed aspects effectively. This model helps to address the limitations of current networks. Experimental results show improvements in precision (P), recall (R), and mean average precision (mAP) compared to existing models.
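
To make the gating idea concrete, the following is a minimal, self-contained PyTorch sketch of a gated channel-and-spatial attention block in the spirit of the GAM described in the abstract; the class name, reduction ratio, and learnable gate are illustrative assumptions, not the authors' exact design.

```python
import torch
import torch.nn as nn

class GatedChannelSpatialAttention(nn.Module):
    """Illustrative gated channel + spatial attention block (hypothetical, not the paper's exact GAM)."""
    def __init__(self, channels: int, reduction: int = 4):
        super().__init__()
        # Channel attention: squeeze spatial dims, produce per-channel weights.
        self.channel_mlp = nn.Sequential(
            nn.AdaptiveAvgPool2d(1),
            nn.Conv2d(channels, channels // reduction, kernel_size=1),
            nn.ReLU(inplace=True),
            nn.Conv2d(channels // reduction, channels, kernel_size=1),
            nn.Sigmoid(),
        )
        # Spatial attention: 7x7 convolution over the channel-refined feature map.
        self.spatial_conv = nn.Sequential(
            nn.Conv2d(channels, 1, kernel_size=7, padding=3),
            nn.Sigmoid(),
        )
        # Learnable gate deciding how much attended signal is mixed back in.
        self.gate = nn.Parameter(torch.zeros(1))

    def forward(self, x: torch.Tensor) -> torch.Tensor:
        x_c = x * self.channel_mlp(x)               # channel-wise reweighting
        x_s = x_c * self.spatial_conv(x_c)          # spatial reweighting
        return x + torch.sigmoid(self.gate) * x_s   # gated residual mix
```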

30 pages, 26891 KiB  
Article
Multiexposed Image-Fusion Strategy Using Mutual Image Translation Learning with Multiscale Surround Switching Maps
by Young-Ho Go, Seung-Hwan Lee and Sung-Hak Lee
Mathematics 2024, 12(20), 3244; https://doi.org/10.3390/math12203244 - 16 Oct 2024
Cited by 1 | Viewed by 1263
Abstract
The dynamic range of an image represents the difference between its darkest and brightest areas, a crucial concept in digital image processing and computer vision. Despite display technology advancements, replicating the broad dynamic range of the human visual system remains challenging, necessitating high [...] Read more.
The dynamic range of an image represents the difference between its darkest and brightest areas, a crucial concept in digital image processing and computer vision. Despite display technology advancements, replicating the broad dynamic range of the human visual system remains challenging, necessitating high dynamic range (HDR) synthesis, combining multiple low dynamic range images captured at contrasting exposure levels to generate a single HDR image that integrates the optimal exposure regions. Recent deep learning advancements have introduced innovative approaches to HDR generation, with the cycle-consistent generative adversarial network (CycleGAN) gaining attention due to its robustness against domain shifts and ability to preserve content style while enhancing image quality. However, traditional CycleGAN methods often rely on unpaired datasets, limiting their capacity for detail preservation. This study proposes an improved model by incorporating a switching map (SMap) as an additional channel in the CycleGAN generator using paired datasets. The SMap focuses on essential regions, guiding weighted learning to minimize the loss of detail during synthesis. Using translated images to estimate the middle exposure integrates these images into HDR synthesis, reducing unnatural transitions and halo artifacts that could occur at boundaries between various exposures. The multilayered application of the retinex algorithm captures exposure variations, achieving natural and detailed tone mapping. The proposed mutual image translation module extends CycleGAN, demonstrating superior performance in multiexposure fusion and image translation, significantly enhancing HDR image quality. The image quality evaluation indices used are CPBDM, JNBM, LPC-SI, S3, JPEG_2000, and SSEQ, and the proposed model exhibits superior performance compared to existing methods, recording average scores of 0.6196, 15.4142, 0.9642, 0.2838, 80.239, and 25.054, respectively. Therefore, based on qualitative and quantitative results, this study demonstrates the superiority of the proposed model. Full article
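
For readers unfamiliar with conditioning a generator on an auxiliary map, the toy PyTorch sketch below shows how a one-channel switching map can be stacked onto an RGB input as a fourth channel; the network, its layer sizes, and the name SMapGenerator are hypothetical and greatly simplified relative to the paper's CycleGAN generator.

```python
import torch
import torch.nn as nn

class SMapGenerator(nn.Module):
    """Toy generator taking an RGB exposure plus a one-channel switching map (4 input channels)."""
    def __init__(self):
        super().__init__()
        self.net = nn.Sequential(
            nn.Conv2d(4, 64, kernel_size=7, padding=3), nn.ReLU(inplace=True),
            nn.Conv2d(64, 64, kernel_size=3, padding=1), nn.ReLU(inplace=True),
            nn.Conv2d(64, 3, kernel_size=7, padding=3), nn.Tanh(),
        )

    def forward(self, image: torch.Tensor, smap: torch.Tensor) -> torch.Tensor:
        # The switching map is concatenated as an extra channel to guide weighted learning.
        return self.net(torch.cat([image, smap], dim=1))

# Usage with random tensors standing in for an exposure image and its switching map.
generator = SMapGenerator()
image = torch.rand(1, 3, 128, 128)
smap = torch.rand(1, 1, 128, 128)
translated = generator(image, smap)
```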

23 pages, 1980 KiB  
Article
GaitSTAR: Spatial–Temporal Attention-Based Feature-Reweighting Architecture for Human Gait Recognition
by Muhammad Bilal, He Jianbiao, Husnain Mushtaq, Muhammad Asim, Gauhar Ali and Mohammed ElAffendi
Mathematics 2024, 12(16), 2458; https://doi.org/10.3390/math12162458 - 8 Aug 2024
Cited by 2 | Viewed by 1290
Abstract
Human gait recognition (HGR) leverages unique gait patterns to identify individuals, but the effectiveness of this technique can be hindered due to various factors such as carrying conditions, foot shadows, clothing variations, and changes in viewing angles. Traditional silhouette-based systems often neglect the critical role of instantaneous gait motion, which is essential for distinguishing individuals with similar features. We introduce the “Enhanced Gait Feature Extraction Framework (GaitSTAR)”, a novel method that incorporates dynamic feature weighting through the discriminant analysis of temporal and spatial features within a channel-wise architecture. Key innovations in GaitSTAR include dynamic stride flow representation (DSFR) to address silhouette distortion, a transformer-based feature set transformation (FST) for integrating image-level features into set-level features, and dynamic feature reweighting (DFR) for capturing long-range interactions. DFR enhances contextual understanding and improves detection accuracy by computing attention distributions across channel dimensions. Empirical evaluations show that GaitSTAR achieves impressive accuracies of 98.5%, 98.0%, and 92.7% under NM, BG, and CL conditions, respectively, with the CASIA-B dataset; 67.3% with the CASIA-C dataset; and 54.21% with the Gait3D dataset. Despite its complexity, GaitSTAR demonstrates a favorable balance between accuracy and computational efficiency, making it a powerful tool for biometric identification based on gait patterns.
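
The following minimal PyTorch sketch illustrates the general idea of reweighting set-level features with attention computed across the channel dimension; the module name, tensor shapes, and scoring layer are assumptions for illustration only and do not reproduce the paper's DFR module.

```python
import torch
import torch.nn as nn

class DynamicFeatureReweighting(nn.Module):
    """Illustrative channel-wise reweighting of set-level features (hypothetical simplification)."""
    def __init__(self, channels: int):
        super().__init__()
        self.score = nn.Linear(channels, channels)

    def forward(self, feats: torch.Tensor) -> torch.Tensor:
        # feats: (batch, parts, channels) set-level features pooled over a silhouette sequence.
        context = feats.mean(dim=1)                             # global context per sample
        weights = torch.softmax(self.score(context), dim=-1)    # attention over channel dimension
        return feats * weights.unsqueeze(1)                     # reweight every part's channels

# Usage with a random stand-in for pooled gait features.
dfr = DynamicFeatureReweighting(channels=256)
feats = torch.rand(2, 16, 256)
reweighted = dfr(feats)
```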

20 pages, 15016 KiB  
Article
Masked Feature Compression for Object Detection
by Chengjie Dai, Tiantian Song, Yuxuan Jin, Yixiang Ren, Bowei Yang and Guanghua Song
Mathematics 2024, 12(12), 1848; https://doi.org/10.3390/math12121848 - 14 Jun 2024
Viewed by 1375
Abstract
Deploying high-accuracy detection models on lightweight edge devices (e.g., drones) is challenging due to hardware constraints. To achieve satisfactory detection results, a common solution is to compress and transmit the images to a cloud server where powerful models can be used. However, the image compression process for transmission may lead to a reduction in detection accuracy. In this paper, we propose a feature compression method tailored for object detection tasks, and it can be easily integrated with existing learned image compression models. In the method, the encoding process consists of two steps. Firstly, we use a feature extractor to obtain the low-level feature, and then use a mask generator to obtain an object mask to select regions containing objects. Secondly, we use a neural network encoder to compress the masked feature. As for decoding, a neural network decoder is used to restore the compressed representation into the feature that can be directly inputted into the object detection model. The experimental results demonstrate that our method surpasses existing compression techniques. Specifically, when compared to one of the leading methods—TCM2023—our approach achieves a 25.3% reduction in compressed file size and a 6.9% increase in mAP0.5.
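
A toy PyTorch sketch of the masked-feature pipeline described above (extract a low-level feature, predict a soft object mask, compress only the masked feature, then restore a feature for the detector); all layer choices, channel counts, and names are illustrative assumptions, not the authors' architecture.

```python
import torch
import torch.nn as nn

class MaskedFeatureCodec(nn.Module):
    """Toy sketch of masked feature compression for detection (hypothetical layers)."""
    def __init__(self, feat_ch: int = 64, latent_ch: int = 16):
        super().__init__()
        self.extractor = nn.Sequential(nn.Conv2d(3, feat_ch, 3, stride=2, padding=1), nn.ReLU(inplace=True))
        self.mask_gen = nn.Sequential(nn.Conv2d(feat_ch, 1, kernel_size=1), nn.Sigmoid())  # soft object mask
        self.encoder = nn.Conv2d(feat_ch, latent_ch, 3, stride=2, padding=1)
        self.decoder = nn.Sequential(
            nn.ConvTranspose2d(latent_ch, feat_ch, 4, stride=2, padding=1), nn.ReLU(inplace=True)
        )

    def forward(self, image: torch.Tensor) -> torch.Tensor:
        feat = self.extractor(image)           # low-level feature
        mask = self.mask_gen(feat)             # regions likely to contain objects
        latent = self.encoder(feat * mask)     # compress only the masked feature
        return self.decoder(latent)            # restored feature handed to the detection model

# Usage on a random image tensor.
codec = MaskedFeatureCodec()
restored_feature = codec(torch.rand(1, 3, 256, 256))
```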

23 pages, 8070 KiB  
Article
Enhancing Emergency Vehicle Detection: A Deep Learning Approach with Multimodal Fusion
by Muhammad Zohaib, Muhammad Asim and Mohammed ELAffendi
Mathematics 2024, 12(10), 1514; https://doi.org/10.3390/math12101514 - 13 May 2024
Cited by 11 | Viewed by 3434
Abstract
Emergency vehicle detection plays a critical role in ensuring timely responses and reducing accidents in modern urban environments. However, traditional methods that rely solely on visual cues face challenges, particularly in adverse conditions. The objective of this research is to enhance emergency vehicle detection by leveraging the synergies between acoustic and visual information. By incorporating advanced deep learning techniques for both acoustic and visual data, our aim is to significantly improve the accuracy and response times. To achieve this goal, we developed an attention-based temporal spectrum network (ATSN) with an attention mechanism specifically designed for ambulance siren sound detection. In parallel, we enhanced visual detection tasks by implementing a Multi-Level Spatial Fusion YOLO (MLSF-YOLO) architecture. To combine the acoustic and visual information effectively, we employed a stacking ensemble learning technique, creating a robust framework for emergency vehicle detection. This approach capitalizes on the strengths of both modalities, allowing for a comprehensive analysis that surpasses existing methods. Through our research, we achieved remarkable results, including a misdetection rate of only 3.81% and an accuracy of 96.19% when applied to visual data containing emergency vehicles. These findings represent significant progress in real-world applications, demonstrating the effectiveness of our approach in improving emergency vehicle detection systems.
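
As a simple illustration of stacking ensemble fusion, the sketch below trains a logistic-regression meta-learner on per-sample confidence scores from an acoustic detector and a visual detector; the scores and labels are toy values made up for demonstration and do not come from the paper.

```python
import numpy as np
from sklearn.linear_model import LogisticRegression

# Hypothetical confidence scores from an audio siren detector and a visual detector.
audio_scores = np.array([[0.91], [0.12], [0.75], [0.05]])
visual_scores = np.array([[0.80], [0.30], [0.20], [0.10]])
labels = np.array([1, 0, 1, 0])  # 1 = emergency vehicle present

# Stacking: a meta-learner is trained on the stacked outputs of the base models.
meta_features = np.hstack([audio_scores, visual_scores])
meta_model = LogisticRegression().fit(meta_features, labels)

# Fused probability of an emergency vehicle for each sample.
print(meta_model.predict_proba(meta_features)[:, 1])
```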
