Computer Vision, Image Processing Technologies and Artificial Intelligence, 2nd Edition

A special issue of Mathematics (ISSN 2227-7390). This special issue belongs to the section "E1: Mathematics and Computer Science".

Deadline for manuscript submissions: 20 July 2025 | Viewed by 9306

Special Issue Editors


Guest Editor
Institute of Computing Technology, University of Chinese Academy of Sciences, Beijing 100049, China
Interests: video coding; computer vision; deep learning

Guest Editor
School of Computer Science and Technology, University of Chinese Academy of Sciences, Beijing 100049, China
Interests: image processing; signal processing; artificial intelligence

Guest Editor
School of Computer Science, Beijing Information Science and Technology University, Beijing 100101, China
Interests: neural networks; machine learning; computer vision and developmental robotics

Guest Editor
Faculty of Information Technology, Beijing University of Technology, Beijing 100124, China
Interests: artificial intelligence; information security

Special Issue Information

Dear Colleagues,

Computer vision has expanded into a wide range of research fields in which information is extracted from visual data, including images and video. Computer vision technology is now pervasive in modern life, with billions of people using applications built on techniques such as image recognition, image processing, and object detection, underscoring both the necessity and the potential of research in computer vision and its applications. Advances in artificial intelligence have enabled these techniques to surpass human performance on many tasks. Nevertheless, many valuable open problems remain in the research and application of computer vision, image processing technology, and artificial intelligence.

This Special Issue on “Computer Vision, Image Processing Technologies and Artificial Intelligence” aims to gather original articles that advance theoretical and practical research in the domains of computer vision, image processing, and artificial intelligence, including but not limited to the following aspects and tasks:

  • Image augmentation;
  • Image restoration;
  • Image encoding;
  • Image segmentation;
  • Image recognition;
  • Image classification;
  • Image and video retrieval;
  • Image and video synthesis;
  • Object detection;
  • Image depiction;
  • Image-to-image translation;
  • Image forensics;
  • Artificial intelligence applied in information security;
  • Large-scale models for computer vision.

Prof. Dr. Honggang Qi
Dr. Yan Liu
Dr. Jun Miao
Prof. Dr. Lijuan Duan
Guest Editors

Manuscript Submission Information

Manuscripts should be submitted online at www.mdpi.com by registering and logging in to this website. Once you are registered, click here to go to the submission form. Manuscripts can be submitted until the deadline. All submissions that pass pre-check are peer-reviewed. Accepted papers will be published continuously in the journal (as soon as accepted) and will be listed together on the special issue website. Research articles, review articles as well as short communications are invited. For planned papers, a title and short abstract (about 100 words) can be sent to the Editorial Office for announcement on this website.

Submitted manuscripts should not have been published previously, nor be under consideration for publication elsewhere (except conference proceedings papers). All manuscripts are thoroughly refereed through a single-blind peer-review process. A guide for authors and other relevant information for submission of manuscripts is available on the Instructions for Authors page. Mathematics is an international peer-reviewed open access semimonthly journal published by MDPI.

Please visit the Instructions for Authors page before submitting a manuscript. The Article Processing Charge (APC) for publication in this open access journal is 2600 CHF (Swiss Francs). Submitted papers should be well formatted and use good English. Authors may use MDPI's English editing service prior to publication or during author revisions.

Keywords

  • computer vision
  • artificial intelligence
  • deep learning
  • machine learning
  • neural networks
  • image processing
  • vision information
  • large-scale models for computer vision

Benefits of Publishing in a Special Issue

  • Ease of navigation: Grouping papers by topic helps scholars navigate broad scope journals more efficiently.
  • Greater discoverability: Special Issues support the reach and impact of scientific research. Articles in Special Issues are more discoverable and cited more frequently.
  • Expansion of research network: Special Issues facilitate connections among authors, fostering scientific collaborations.
  • External promotion: Articles in Special Issues are often promoted through the journal's social media, increasing their visibility.
  • e-Book format: Special Issues with more than 10 articles can be published as dedicated e-books, ensuring wide and rapid dissemination.

Further information on MDPI's Special Issue policies can be found here.

Published Papers (5 papers)

Research

16 pages, 958 KiB  
Article
DGYOLOv8: An Enhanced Model for Steel Surface Defect Detection Based on YOLOv8
by Guanlin Zhu, Honggang Qi and Ke Lv
Mathematics 2025, 13(5), 831; https://doi.org/10.3390/math13050831 - 2 Mar 2025
Viewed by 879
Abstract
The application of deep learning-based defect detection models significantly reduces the workload of workers and enhances the efficiency of inspections. In this paper, an enhanced YOLOv8 model (DCNv4_C2f + GAM + InnerMPDIoU + YOLOv8, hereafter referred to as DGYOLOv8) is developed to tackle the challenges of object detection in steel surface defect detection tasks. DGYOLOv8 incorporates a deformable convolution C2f (DCNv4_C2f) module into the backbone network to allow adaptive adjustment of the receptive field. Additionally, it integrates a Gate Attention Module (GAM) within the spatial and channel attention mechanisms, enhancing feature selection through a gating mechanism that strengthens key features, thereby improving the model’s generalization and interpretability. The InnerMPDIoU, which incorporates the latest Inner concepts, enhances detection accuracy and the ability to handle detailed aspects effectively. This model helps to address the limitations of current networks. Experimental results show improvements in precision (P), recall (R), and mean average precision (mAP) compared to existing models.
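
To make the gating idea concrete, the following is a minimal, self-contained PyTorch sketch of a gated channel-and-spatial attention block in the spirit of the GAM described in the abstract; the class name, reduction ratio, and learnable gate are illustrative assumptions, not the authors' exact design.

```python
import torch
import torch.nn as nn

class GatedChannelSpatialAttention(nn.Module):
    """Illustrative gated channel + spatial attention block (hypothetical, not the paper's exact GAM)."""
    def __init__(self, channels: int, reduction: int = 4):
        super().__init__()
        # Channel attention: squeeze spatial dims, produce per-channel weights.
        self.channel_mlp = nn.Sequential(
            nn.AdaptiveAvgPool2d(1),
            nn.Conv2d(channels, channels // reduction, kernel_size=1),
            nn.ReLU(inplace=True),
            nn.Conv2d(channels // reduction, channels, kernel_size=1),
            nn.Sigmoid(),
        )
        # Spatial attention: 7x7 convolution over the channel-refined feature map.
        self.spatial_conv = nn.Sequential(
            nn.Conv2d(channels, 1, kernel_size=7, padding=3),
            nn.Sigmoid(),
        )
        # Learnable gate deciding how much attended signal is mixed back in.
        self.gate = nn.Parameter(torch.zeros(1))

    def forward(self, x: torch.Tensor) -> torch.Tensor:
        x_c = x * self.channel_mlp(x)               # channel-wise reweighting
        x_s = x_c * self.spatial_conv(x_c)          # spatial reweighting
        return x + torch.sigmoid(self.gate) * x_s   # gated residual mix
```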

30 pages, 26891 KiB  
Article
Multiexposed Image-Fusion Strategy Using Mutual Image Translation Learning with Multiscale Surround Switching Maps
by Young-Ho Go, Seung-Hwan Lee and Sung-Hak Lee
Mathematics 2024, 12(20), 3244; https://doi.org/10.3390/math12203244 - 16 Oct 2024
Cited by 1 | Viewed by 1263
Abstract
The dynamic range of an image represents the difference between its darkest and brightest areas, a crucial concept in digital image processing and computer vision. Despite display technology advancements, replicating the broad dynamic range of the human visual system remains challenging, necessitating high [...] Read more.
The dynamic range of an image represents the difference between its darkest and brightest areas, a crucial concept in digital image processing and computer vision. Despite display technology advancements, replicating the broad dynamic range of the human visual system remains challenging, necessitating high dynamic range (HDR) synthesis, combining multiple low dynamic range images captured at contrasting exposure levels to generate a single HDR image that integrates the optimal exposure regions. Recent deep learning advancements have introduced innovative approaches to HDR generation, with the cycle-consistent generative adversarial network (CycleGAN) gaining attention due to its robustness against domain shifts and ability to preserve content style while enhancing image quality. However, traditional CycleGAN methods often rely on unpaired datasets, limiting their capacity for detail preservation. This study proposes an improved model by incorporating a switching map (SMap) as an additional channel in the CycleGAN generator using paired datasets. The SMap focuses on essential regions, guiding weighted learning to minimize the loss of detail during synthesis. Using translated images to estimate the middle exposure integrates these images into HDR synthesis, reducing unnatural transitions and halo artifacts that could occur at boundaries between various exposures. The multilayered application of the retinex algorithm captures exposure variations, achieving natural and detailed tone mapping. The proposed mutual image translation module extends CycleGAN, demonstrating superior performance in multiexposure fusion and image translation, significantly enhancing HDR image quality. The image quality evaluation indices used are CPBDM, JNBM, LPC-SI, S3, JPEG_2000, and SSEQ, and the proposed model exhibits superior performance compared to existing methods, recording average scores of 0.6196, 15.4142, 0.9642, 0.2838, 80.239, and 25.054, respectively. Therefore, based on qualitative and quantitative results, this study demonstrates the superiority of the proposed model. Full article
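
For readers unfamiliar with conditioning a generator on an auxiliary map, the toy PyTorch sketch below shows how a one-channel switching map can be stacked onto an RGB input as a fourth channel; the network, its layer sizes, and the name SMapGenerator are hypothetical and greatly simplified relative to the paper's CycleGAN generator.

```python
import torch
import torch.nn as nn

class SMapGenerator(nn.Module):
    """Toy generator taking an RGB exposure plus a one-channel switching map (4 input channels)."""
    def __init__(self):
        super().__init__()
        self.net = nn.Sequential(
            nn.Conv2d(4, 64, kernel_size=7, padding=3), nn.ReLU(inplace=True),
            nn.Conv2d(64, 64, kernel_size=3, padding=1), nn.ReLU(inplace=True),
            nn.Conv2d(64, 3, kernel_size=7, padding=3), nn.Tanh(),
        )

    def forward(self, image: torch.Tensor, smap: torch.Tensor) -> torch.Tensor:
        # The switching map is concatenated as an extra channel to guide weighted learning.
        return self.net(torch.cat([image, smap], dim=1))

# Usage with random tensors standing in for an exposure image and its switching map.
generator = SMapGenerator()
image = torch.rand(1, 3, 128, 128)
smap = torch.rand(1, 1, 128, 128)
translated = generator(image, smap)
```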

23 pages, 1980 KiB  
Article
GaitSTAR: Spatial–Temporal Attention-Based Feature-Reweighting Architecture for Human Gait Recognition
by Muhammad Bilal, He Jianbiao, Husnain Mushtaq, Muhammad Asim, Gauhar Ali and Mohammed ElAffendi
Mathematics 2024, 12(16), 2458; https://doi.org/10.3390/math12162458 - 8 Aug 2024
Cited by 2 | Viewed by 1290
Abstract
Human gait recognition (HGR) leverages unique gait patterns to identify individuals, but the effectiveness of this technique can be hindered due to various factors such as carrying conditions, foot shadows, clothing variations, and changes in viewing angles. Traditional silhouette-based systems often neglect the critical role of instantaneous gait motion, which is essential for distinguishing individuals with similar features. We introduce the “Enhanced Gait Feature Extraction Framework (GaitSTAR)”, a novel method that incorporates dynamic feature weighting through the discriminant analysis of temporal and spatial features within a channel-wise architecture. Key innovations in GaitSTAR include dynamic stride flow representation (DSFR) to address silhouette distortion, a transformer-based feature set transformation (FST) for integrating image-level features into set-level features, and dynamic feature reweighting (DFR) for capturing long-range interactions. DFR enhances contextual understanding and improves detection accuracy by computing attention distributions across channel dimensions. Empirical evaluations show that GaitSTAR achieves impressive accuracies of 98.5%, 98.0%, and 92.7% under NM, BG, and CL conditions, respectively, with the CASIA-B dataset; 67.3% with the CASIA-C dataset; and 54.21% with the Gait3D dataset. Despite its complexity, GaitSTAR demonstrates a favorable balance between accuracy and computational efficiency, making it a powerful tool for biometric identification based on gait patterns.
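
The following minimal PyTorch sketch illustrates the general idea of reweighting set-level features with attention computed across the channel dimension; the module name, tensor shapes, and scoring layer are assumptions for illustration only and do not reproduce the paper's DFR module.

```python
import torch
import torch.nn as nn

class DynamicFeatureReweighting(nn.Module):
    """Illustrative channel-wise reweighting of set-level features (hypothetical simplification)."""
    def __init__(self, channels: int):
        super().__init__()
        self.score = nn.Linear(channels, channels)

    def forward(self, feats: torch.Tensor) -> torch.Tensor:
        # feats: (batch, parts, channels) set-level features pooled over a silhouette sequence.
        context = feats.mean(dim=1)                             # global context per sample
        weights = torch.softmax(self.score(context), dim=-1)    # attention over channel dimension
        return feats * weights.unsqueeze(1)                     # reweight every part's channels

# Usage with a random stand-in for pooled gait features.
dfr = DynamicFeatureReweighting(channels=256)
feats = torch.rand(2, 16, 256)
reweighted = dfr(feats)
```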

20 pages, 15016 KiB  
Article
Masked Feature Compression for Object Detection
by Chengjie Dai, Tiantian Song, Yuxuan Jin, Yixiang Ren, Bowei Yang and Guanghua Song
Mathematics 2024, 12(12), 1848; https://doi.org/10.3390/math12121848 - 14 Jun 2024
Viewed by 1375
Abstract
Deploying high-accuracy detection models on lightweight edge devices (e.g., drones) is challenging due to hardware constraints. To achieve satisfactory detection results, a common solution is to compress and transmit the images to a cloud server where powerful models can be used. However, the image compression process for transmission may lead to a reduction in detection accuracy. In this paper, we propose a feature compression method tailored for object detection tasks, and it can be easily integrated with existing learned image compression models. In the method, the encoding process consists of two steps. Firstly, we use a feature extractor to obtain the low-level feature, and then use a mask generator to obtain an object mask to select regions containing objects. Secondly, we use a neural network encoder to compress the masked feature. As for decoding, a neural network decoder is used to restore the compressed representation into the feature that can be directly inputted into the object detection model. The experimental results demonstrate that our method surpasses existing compression techniques. Specifically, when compared to one of the leading methods—TCM2023—our approach achieves a 25.3% reduction in compressed file size and a 6.9% increase in mAP0.5.
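
A toy PyTorch sketch of the masked-feature pipeline described above (extract a low-level feature, predict a soft object mask, compress only the masked feature, then restore a feature for the detector); all layer choices, channel counts, and names are illustrative assumptions, not the authors' architecture.

```python
import torch
import torch.nn as nn

class MaskedFeatureCodec(nn.Module):
    """Toy sketch of masked feature compression for detection (hypothetical layers)."""
    def __init__(self, feat_ch: int = 64, latent_ch: int = 16):
        super().__init__()
        self.extractor = nn.Sequential(nn.Conv2d(3, feat_ch, 3, stride=2, padding=1), nn.ReLU(inplace=True))
        self.mask_gen = nn.Sequential(nn.Conv2d(feat_ch, 1, kernel_size=1), nn.Sigmoid())  # soft object mask
        self.encoder = nn.Conv2d(feat_ch, latent_ch, 3, stride=2, padding=1)
        self.decoder = nn.Sequential(
            nn.ConvTranspose2d(latent_ch, feat_ch, 4, stride=2, padding=1), nn.ReLU(inplace=True)
        )

    def forward(self, image: torch.Tensor) -> torch.Tensor:
        feat = self.extractor(image)           # low-level feature
        mask = self.mask_gen(feat)             # regions likely to contain objects
        latent = self.encoder(feat * mask)     # compress only the masked feature
        return self.decoder(latent)            # restored feature handed to the detection model

# Usage on a random image tensor.
codec = MaskedFeatureCodec()
restored_feature = codec(torch.rand(1, 3, 256, 256))
```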

23 pages, 8070 KiB  
Article
Enhancing Emergency Vehicle Detection: A Deep Learning Approach with Multimodal Fusion
by Muhammad Zohaib, Muhammad Asim and Mohammed ELAffendi
Mathematics 2024, 12(10), 1514; https://doi.org/10.3390/math12101514 - 13 May 2024
Cited by 11 | Viewed by 3434
Abstract
Emergency vehicle detection plays a critical role in ensuring timely responses and reducing accidents in modern urban environments. However, traditional methods that rely solely on visual cues face challenges, particularly in adverse conditions. The objective of this research is to enhance emergency vehicle detection by leveraging the synergies between acoustic and visual information. By incorporating advanced deep learning techniques for both acoustic and visual data, our aim is to significantly improve the accuracy and response times. To achieve this goal, we developed an attention-based temporal spectrum network (ATSN) with an attention mechanism specifically designed for ambulance siren sound detection. In parallel, we enhanced visual detection tasks by implementing a Multi-Level Spatial Fusion YOLO (MLSF-YOLO) architecture. To combine the acoustic and visual information effectively, we employed a stacking ensemble learning technique, creating a robust framework for emergency vehicle detection. This approach capitalizes on the strengths of both modalities, allowing for a comprehensive analysis that surpasses existing methods. Through our research, we achieved remarkable results, including a misdetection rate of only 3.81% and an accuracy of 96.19% when applied to visual data containing emergency vehicles. These findings represent significant progress in real-world applications, demonstrating the effectiveness of our approach in improving emergency vehicle detection systems.
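
As a simple illustration of stacking ensemble fusion, the sketch below trains a logistic-regression meta-learner on per-sample confidence scores from an acoustic detector and a visual detector; the scores and labels are toy values made up for demonstration and do not come from the paper.

```python
import numpy as np
from sklearn.linear_model import LogisticRegression

# Hypothetical confidence scores from an audio siren detector and a visual detector.
audio_scores = np.array([[0.91], [0.12], [0.75], [0.05]])
visual_scores = np.array([[0.80], [0.30], [0.20], [0.10]])
labels = np.array([1, 0, 1, 0])  # 1 = emergency vehicle present

# Stacking: a meta-learner is trained on the stacked outputs of the base models.
meta_features = np.hstack([audio_scores, visual_scores])
meta_model = LogisticRegression().fit(meta_features, labels)

# Fused probability of an emergency vehicle for each sample.
print(meta_model.predict_proba(meta_features)[:, 1])
```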
