Applications in Computer Vision and Image Processing

A special issue of Applied Sciences (ISSN 2076-3417). This special issue belongs to the section "Computing and Artificial Intelligence".

Deadline for manuscript submissions: 30 October 2025 | Viewed by 6098

Special Issue Editors

Guest Editor
Dr. Cai Meng
Image Processing Center, Beijing University of Aeronautics and Astronautics, Beijing 100191, China
Interests: image guidance; surgical robotics; computer vision

Guest Editor
Dr. Hongsheng He
Department of Computer Science, The University of Alabama, Tuscaloosa, AL, USA
Interests: autonomy; intelligent robotics; machine learning

Special Issue Information

Dear Colleagues,

In recent years, the fields of computer vision and image processing have undergone transformative developments, fueled by advances in machine learning, deep learning architectures, and computational power. This Special Issue, titled "Applications in Computer Vision and Image Processing", aims to explore the dynamic landscape of these fields, showcasing breakthrough research and innovative applications. We invite contributions that highlight novel methodologies, algorithms, and real-world applications that push the boundaries of what is achievable in analyzing, understanding, and visualizing image data. Papers focusing on areas such as object detection and recognition, image enhancement, 3D reconstruction, medical image analysis, and augmented reality are particularly welcome. This Special Issue endeavors to serve as a repository of knowledge that bridges the gap between theoretical research and practical implementation, making a significant impact across industries including medicine, automotive, robotics, and more.

We intend this Special Issue to be an exciting exchange of ideas between academia and industry, fostering innovation and highlighting the latest advances in the field. Whether you are a researcher, practitioner, or enthusiast, this Special Issue will provide comprehensive insight into the ever-evolving world of computer vision and image processing. We look forward to your valuable contributions and are excited to share this knowledge with our readers worldwide. Thank you for being a part of this journey towards advancing technological frontiers.

Dr. Cai Meng
Dr. Hongsheng He
Guest Editors

Manuscript Submission Information

Manuscripts should be submitted online at www.mdpi.com by registering and logging in to this website. Once you are registered, click here to go to the submission form. Manuscripts can be submitted until the deadline. All submissions that pass pre-check are peer-reviewed. Accepted papers will be published continuously in the journal (as soon as accepted) and will be listed together on the special issue website. Research articles, review articles, and short communications are invited. For planned papers, a title and short abstract (about 100 words) can be sent to the Editorial Office for announcement on this website.

Submitted manuscripts should not have been published previously, nor be under consideration for publication elsewhere (except conference proceedings papers). All manuscripts are thoroughly refereed through a single-blind peer-review process. A guide for authors and other relevant information for submission of manuscripts is available on the Instructions for Authors page. Applied Sciences is an international peer-reviewed open access semimonthly journal published by MDPI.

Please visit the Instructions for Authors page before submitting a manuscript. The Article Processing Charge (APC) for publication in this open access journal is 2400 CHF (Swiss Francs). Submitted papers should be well formatted and use good English. Authors may use MDPI's English editing service prior to publication or during author revisions.

Keywords

  • big data and computer vision
  • deep learning in computer vision
  • computer vision in robotics
  • computer vision in industry
  • computer vision in autonomous driving
  • visual SLAM
  • computer vision in remote sensing
  • target detection, recognition, and tracking
  • text detection and recognition
  • 3D reconstruction
  • human–computer interaction
  • pattern recognition and analysis
  • image segmentation
  • image restoration or image inpainting
  • other topics related to applications of computer vision and image processing

Benefits of Publishing in a Special Issue

  • Ease of navigation: Grouping papers by topic helps scholars navigate broad scope journals more efficiently.
  • Greater discoverability: Special Issues support the reach and impact of scientific research. Articles in Special Issues are more discoverable and cited more frequently.
  • Expansion of research network: Special Issues facilitate connections among authors, fostering scientific collaborations.
  • External promotion: Articles in Special Issues are often promoted through the journal's social media, increasing their visibility.
  • e-Book format: Special Issues with more than 10 articles can be published as dedicated e-books, ensuring wide and rapid dissemination.

Further information on MDPI's Special Issue policies can be found here.

Published Papers (5 papers)

Research

20 pages, 8769 KiB  
Article
Mamba-DQN: Adaptively Tunes Visual SLAM Parameters Based on Historical Observation DQN
by Xubo Ma, Chuhua Huang, Xin Huang and Wangping Wu
Appl. Sci. 2025, 15(6), 2950; https://doi.org/10.3390/app15062950 - 9 Mar 2025
Viewed by 629
Abstract
The parameter configuration of traditional visual SLAM algorithms usually relies on expert experience and extensive experiments, and the parameters must be reconfigured as the scene changes, which is a complex and tedious process. To achieve parameter adaptation in visual SLAM, we propose the Mamba-DQN method, which transforms complex parameter adjustment tasks into policy learning assignments for the agent. In this paper, we select the key parameters of visual SLAM to construct the agent's action space. The reward function is constructed from the absolute trajectory error (ATE), and a Mamba history observer is built within the agent to learn from the observation trajectory, aiming to improve the quality of the agent's decisions. Finally, the proposed method was evaluated on the EuRoC MAV and TUM-VI datasets. The experimental results show that Mamba-DQN not only enhances the positioning accuracy of visual SLAM with good real-time performance but also avoids the tedious parameter adjustment process.
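
The abstract describes the method only at a high level. As a rough illustration, the following minimal Python sketch shows the general pattern of DQN-based parameter tuning with an ATE-derived reward; the parameter presets, state features, and run_slam() function are hypothetical stand-ins (a real system would run a SLAM front end such as ORB-SLAM), and the paper's Mamba history observer is replaced by a plain feed-forward Q-network for brevity.

```python
# A minimal sketch, not the paper's implementation: the action presets,
# state features, and run_slam() are hypothetical stand-ins.
import random
import torch
import torch.nn as nn

# Hypothetical action space: each action is a preset of key SLAM parameters.
ACTIONS = [
    {"n_features": 800,  "scale_factor": 1.2},
    {"n_features": 1200, "scale_factor": 1.2},
    {"n_features": 1600, "scale_factor": 1.1},
]
STATE_DIM = 4  # assumed summary statistics of recent tracking quality

q_net = nn.Sequential(nn.Linear(STATE_DIM, 64), nn.ReLU(),
                      nn.Linear(64, len(ACTIONS)))
opt = torch.optim.Adam(q_net.parameters(), lr=1e-3)

def run_slam(params):
    """Stand-in for running the SLAM front end; returns a simulated ATE."""
    return random.uniform(0.01, 0.5)

def dqn_step(state, gamma=0.99, eps=0.1):
    # Epsilon-greedy selection over parameter presets.
    if random.random() < eps:
        action = random.randrange(len(ACTIONS))
    else:
        with torch.no_grad():
            action = int(q_net(state).argmax())
    reward = -run_slam(ACTIONS[action])   # reward derived from ATE
    next_state = torch.randn(STATE_DIM)   # placeholder next observation
    # One-step TD update (no replay buffer or Mamba observer in this sketch).
    target = reward + gamma * q_net(next_state).max().detach()
    loss = (q_net(state)[action] - target) ** 2
    opt.zero_grad()
    loss.backward()
    opt.step()
    return next_state

state = torch.randn(STATE_DIM)
for _ in range(10):
    state = dqn_step(state)
```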

18 pages, 36707 KiB  
Article
High-Precision Image Editing via Dual Attention Control in Diffusion Models Without Fine-Tuning
by Zhiqiang Pan, Yingchun Kuang, Jianmei Lan and Lizhuo Zhang
Appl. Sci. 2025, 15(3), 1079; https://doi.org/10.3390/app15031079 - 22 Jan 2025
Viewed by 1626
Abstract
Existing diffusion models outperform generative models such as Generative Adversarial Networks in image synthesis and editing. However, they struggle to perform high-precision image editing while preserving image details and the accuracy of editing instructions. To address these challenges, we propose a dual attention control method for high-precision image editing. Our approach includes two key attention control modules: (1) a cross-attention control module, which combines the cross-attention maps of the original and edited images through weighted parameters, ensuring that the synthesized edited image retains the structure of the input image; and (2) a self-attention control module, which varies with the editing task and is applied at "coarse" and "fine" layers, since the coarse layers help maintain input image details while the fine layers are better suited to style transformations. Experimental evaluations demonstrate that our approach achieves excellent results in detail preservation, content consistency, visual realism, and semantic understanding, making it especially suitable for tasks requiring high-precision editing. Specifically, compared with editing under no control conditions, introducing dual attention control increases CLIP scores by 6.19%, reduces LPIPS by 29.3%, and decreases FID by 24.7%. These significant improvements not only validate the effectiveness of the dual attention control but also attest to the method's flexibility and adaptability across different scenarios. Notably, our approach is a zero-shot solution, requiring no user optimization or fine-tuning, which facilitates real-world applications.
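
For readers unfamiliar with attention control, the following minimal sketch illustrates the two operations named in the abstract: blending cross-attention maps with a weighting parameter, and selecting self-attention layers by task. The tensor shapes, layer split, and function names are illustrative assumptions, not the paper's implementation.

```python
# A minimal sketch of dual attention control; shapes and the layer
# split are assumptions, not the paper's actual UNet hooks.
import torch

def blend_cross_attention(attn_orig, attn_edit, w=0.7):
    """Weighted combination so the edit keeps the original structure."""
    return w * attn_orig + (1.0 - w) * attn_edit

def select_self_attention_layers(attn_layers, task="detail"):
    """Coarse layers preserve detail; fine layers suit style edits."""
    mid = len(attn_layers) // 2
    coarse, fine = attn_layers[:mid], attn_layers[mid:]
    return coarse if task == "detail" else fine

# Toy usage with random maps standing in for real UNet attention tensors.
a_orig = torch.rand(8, 64, 77)   # (heads, image tokens, text tokens)
a_edit = torch.rand(8, 64, 77)
blended = blend_cross_attention(a_orig, a_edit, w=0.6)
layers = select_self_attention_layers(list(range(16)), task="style")
```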

17 pages, 6879 KiB  
Article
Machine Learning Models for Artist Classification of Cultural Heritage Sketches
by Gianina Chirosca, Roxana Rădvan, Silviu Mușat, Matei Pop and Alecsandru Chirosca
Appl. Sci. 2025, 15(1), 212; https://doi.org/10.3390/app15010212 - 30 Dec 2024
Viewed by 937
Abstract
Modern computer vision algorithms allow researchers and art historians to extract artist-characteristic contours from sketches, thus providing accurate input for artwork analysis, for possible attributions and classifications, and for the identification of specific stylistic features. We approach this challenging task with three machine learning algorithms and evaluate their performance on a small collection of images from five distinct artists. These algorithms aim to find the most likely artist for a sketch (or a contour of a sketch), with promising results at a high confidence level (around 92%). The models build on common Faster R-CNN architectures, reinforcement learning, and vector extraction tools. The proposed tool provides a basis for future improvements toward a system that aids artwork evaluators.
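
As a rough sketch of contour-based artist classification, the snippet below scores a grayscale contour image against five artist classes with a small CNN. The architecture and input size are assumptions for illustration; the paper's models (Faster R-CNN variants, reinforcement learning, and vector extraction tools) are considerably more elaborate.

```python
# A minimal sketch assuming 64x64 grayscale contour images and five
# artist classes; not the paper's architecture.
import torch
import torch.nn as nn

NUM_ARTISTS = 5

classifier = nn.Sequential(
    nn.Conv2d(1, 16, 3, padding=1), nn.ReLU(), nn.MaxPool2d(2),
    nn.Conv2d(16, 32, 3, padding=1), nn.ReLU(), nn.MaxPool2d(2),
    nn.Flatten(),
    nn.Linear(32 * 16 * 16, NUM_ARTISTS),  # 64x64 input -> 16x16 after pooling
)

def predict_artist(contour_image):
    """Return (artist_index, confidence) for a 1x64x64 contour tensor."""
    with torch.no_grad():
        probs = classifier(contour_image.unsqueeze(0)).softmax(dim=1)
        conf, idx = probs.max(dim=1)
    return int(idx), float(conf)

idx, conf = predict_artist(torch.rand(1, 64, 64))
```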

23 pages, 106560 KiB  
Article
RLUNet: Overexposure-Content-Recovery-Based Single HDR Image Reconstruction with the Imaging Pipeline Principle
by Yiru Zheng, Wei Wang, Xiao Wang and Xin Yuan
Appl. Sci. 2024, 14(23), 11289; https://doi.org/10.3390/app142311289 - 3 Dec 2024
Viewed by 1300
Abstract
With the growing popularity of High Dynamic Range (HDR) display technology, consumer demand for HDR images is increasing. Since HDR cameras are expensive, reconstructing HDR images from traditional Low Dynamic Range (LDR) images is crucial. However, existing HDR image reconstruction algorithms often fail to recover fine details and do not adequately address the fundamental principles of the LDR imaging pipeline. To overcome these limitations, we propose the Reversing Lossy UNet (RLUNet), which aims to balance dynamic range expansion and the recovery of overexposed areas through a deeper understanding of the LDR imaging pipeline. The RLUNet model comprises the Reverse Lossy Network, which is designed according to the LDR–HDR framework and reconstructs HDR images by recovering overexposed regions, dequantizing, linearizing the mapping, and suppressing compression artifacts. This framework, grounded in the principles of the LDR imaging pipeline, is designed to reverse the pipeline's lossy operations. Furthermore, integrating the Texture Filling Module (TFM) with the Recovery of Overexposed Regions (ROR) module enhances the visual quality and detail texture of the overexposed areas in the reconstructed HDR image. Experiments demonstrate that the proposed RLUNet outperforms various state-of-the-art methods on different test sets.
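
The "reverse the LDR pipeline" idea can be sketched in a few lines: dequantize, invert the tone/gamma mapping, and recover overexposed pixels. In the sketch below, the gamma value, overexposure threshold, and naive recovery step are all assumptions; RLUNet learns each of these stages with dedicated network modules (including ROR and TFM).

```python
# A minimal sketch of reversing the LDR pipeline; gamma, the threshold,
# and the recovery step are assumptions, not RLUNet's learned modules.
import numpy as np

def reverse_ldr_pipeline(ldr_uint8, gamma=2.2, over_thresh=0.98):
    x = ldr_uint8.astype(np.float32) / 255.0   # dequantize 8-bit values
    linear = np.power(x, gamma)                # invert the gamma mapping
    over = linear.max(axis=-1) > over_thresh   # mask of overexposed pixels
    hdr = linear.copy()
    hdr[over] *= 4.0   # naive stand-in for learned overexposure recovery
    return hdr, over

ldr = (np.random.rand(32, 32, 3) * 255).astype(np.uint8)
hdr, mask = reverse_ldr_pipeline(ldr)
```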

19 pages, 1669 KiB  
Article
FETrack: Feature-Enhanced Transformer Network for Visual Object Tracking
by Hang Liu, Detian Huang and Mingxin Lin
Appl. Sci. 2024, 14(22), 10589; https://doi.org/10.3390/app142210589 - 17 Nov 2024
Viewed by 1050
Abstract
Visual object tracking is a fundamental task in computer vision, with applications ranging from video surveillance to autonomous driving. Despite recent advances in transformer-based one-stream trackers, unrestricted feature interactions between the template and the search region often introduce background noise into the template, degrading tracking performance. To address this issue, we propose FETrack, a feature-enhanced transformer-based network for visual object tracking. Specifically, we incorporate an independent template stream in the encoder of the one-stream tracker to acquire high-quality template features while effectively suppressing harmful background noise. We then employ a sequence-learning-based causal transformer in the decoder to generate the bounding box autoregressively, simplifying the prediction-head network. Further, we present a dynamic threshold-based online template-updating strategy and a template-filtering approach to boost tracking robustness and reduce redundant computation. Extensive experiments demonstrate that FETrack achieves superior performance over state-of-the-art trackers, reaching 75.1% AO on GOT-10k, 81.2% AUC on LaSOT, and 89.3% Pnorm on TrackingNet.
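
The online template-updating strategy mentioned in the abstract can be illustrated with a simple confidence-gated update: new features enter the template only on confident frames, which filters out background-contaminated updates. The fixed threshold, momentum blending, and placeholder features below are assumptions for illustration rather than FETrack's actual design (which uses a dynamic threshold).

```python
# A minimal sketch of confidence-gated template updating; the fixed
# threshold, momentum blending, and random features are assumptions.
import torch

class TemplateUpdater:
    def __init__(self, threshold=0.8, momentum=0.9):
        self.threshold = threshold  # dynamic in FETrack; fixed here
        self.momentum = momentum
        self.template = None

    def update(self, features, confidence):
        """Blend new features into the template only on confident frames."""
        if self.template is None:
            self.template = features
        elif confidence >= self.threshold:
            self.template = (self.momentum * self.template
                             + (1 - self.momentum) * features)
        return self.template

updater = TemplateUpdater()
for _ in range(5):
    feats = torch.randn(256)        # placeholder template features
    conf = float(torch.rand(1))     # placeholder tracker confidence
    updater.update(feats, conf)
```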
