New Advances in Image Processing and Computer Vision

A special issue of Mathematics (ISSN 2227-7390). This special issue belongs to the section "E1: Mathematics and Computer Science".

Deadline for manuscript submissions: 31 July 2026 | Viewed by 16902

Special Issue Editors


Guest Editor
Facultad de Ingeniería, Universidad Panamericana, Augusto Rodin 498, Ciudad de México 03920, Mexico
Interests: image processing; non-contact vital sign monitoring; artificial intelligence; machine learning; biomedical imaging

Guest Editor
Facultad de Ingeniería, Universidad Panamericana, Augusto Rodin 498, Ciudad de México 03920, Mexico
Interests: image and signal processing; computer vision; machine learning; robotics navigation; biomedical engineering; cryptography; watermarking; motion estimation; motion magnification; fuzzy logic; contactless vital signs estimation

Special Issue Information

Dear Colleagues,

New mathematical models and methods in image processing and computer vision are key to describing and solving many real-world problems in which digital images are the main source of information. Combined with edge computing power and machine learning approaches, these models and methods can solve complex imaging challenges.

This Special Issue focuses on the latest advances in the fields of image processing and computer vision. In addition, it provides a multidisciplinary platform for researchers and developers to share original, innovative, and state-of-the-art image processing and analysis algorithms and methods.

Papers on the following topics are welcome: image classification and segmentation, image enhancement, image restoration, image encryption, watermarking, texture analysis, document image processing, biomedical decision support systems, 3D vision, visual servoing, and video processing. Papers on machine learning models are also encouraged.

Prof. Dr. Jorge Brieva
Prof. Dr. Ernesto Moya-Albor
Guest Editors

Manuscript Submission Information

Manuscripts should be submitted online at www.mdpi.com by registering and logging in to this website. Once you are registered, click here to go to the submission form. Manuscripts can be submitted until the deadline. All submissions that pass pre-check are peer-reviewed. Accepted papers will be published continuously in the journal (as soon as accepted) and will be listed together on the special issue website. Research articles, review articles as well as short communications are invited. For planned papers, a title and short abstract (about 250 words) can be sent to the Editorial Office for assessment.

Submitted manuscripts should not have been published previously, nor be under consideration for publication elsewhere (except conference proceedings papers). All manuscripts are thoroughly refereed through a single-blind peer-review process. A guide for authors and other relevant information for submission of manuscripts is available on the Instructions for Authors page. Mathematics is an international peer-reviewed open access semimonthly journal published by MDPI.

Please visit the Instructions for Authors page before submitting a manuscript. The Article Processing Charge (APC) for publication in this open access journal is 2600 CHF (Swiss Francs). Submitted papers should be well formatted and use good English. Authors may use MDPI's English editing service prior to publication or during author revisions.

Keywords

  • image analysis and understanding
  • image enhancement
  • image segmentation
  • motion estimation and analysis
  • biomedical image processing applications
  • steganography and image watermarking
  • 3D vision
  • pattern recognition
  • object detection
  • machine learning applications

Benefits of Publishing in a Special Issue

  • Ease of navigation: Grouping papers by topic helps scholars navigate broad scope journals more efficiently.
  • Greater discoverability: Special Issues support the reach and impact of scientific research. Articles in Special Issues are more discoverable and cited more frequently.
  • Expansion of research network: Special Issues facilitate connections among authors, fostering scientific collaborations.
  • External promotion: Articles in Special Issues are often promoted through the journal's social media, increasing their visibility.
  • Reprint: MDPI Books provides the opportunity to republish successful Special Issues in book format, both online and in print.

Further information on MDPI's Special Issue policies can be found here.

Published Papers (7 papers)


Research

24 pages, 12624 KB  
Article
Semantic-Preserving Multi-Object Coexistence: A Backdoor Attack on Text-to-Image Diffusion Models
by Zhoufan Yang, Honghui Ning, Yimeng Pan, Junguo Liao and Shaobo Zhang
Mathematics 2026, 14(11), 1874; https://doi.org/10.3390/math14111874 - 28 May 2026
Viewed by 231
Abstract
Text-to-image (T2I) diffusion models have become popular in computer vision, but they remain vulnerable to backdoor attacks. Existing methods typically trigger a fixed image regardless of user input, causing severe semantic inconsistency between the generated image and the original prompt. This makes the attack easily detectable by machines as it would lack visual stealth. To overcome this challenge, we propose MultiAttack, a novel semantic-preserving multi-object coexistence backdoor attack for T2I diffusion models, which retains prompt-described objects while injecting malicious targets. First, we propose a semantic-preserving data poisoning strategy to build a latent mapping, which maps the trigger into a composite semantic space while retaining the original prompt context. Second, we design a backdoor enhancement mechanism to embed the spatial orthogonality between malicious and benign objects into model weights as a conditional response, which strengthens the model’s ability to generate stable malicious outputs without requiring additional inference. Results on Stable Diffusion show that compared to state-of-the-art baselines, MultiAttack increases attack success rate by 13.1% and visual stealth (defined as the success rate of co-generating both prompt-described and target objects) by 12.6%, with an FID increase of less than 1.2 and a CLIP score decrease of under 1 compared to clean models.
(This article belongs to the Special Issue New Advances in Image Processing and Computer Vision)

18 pages, 860 KB  
Article
Knowledge Graph-Driven Reinforcement Learning for Zero-Shot Vision-Language Navigation
by Ye Zhang, Yandong Zhao, He Liu, Tengfei Shi, Weitao Jia and Shenghong Li
Mathematics 2026, 14(9), 1485; https://doi.org/10.3390/math14091485 - 28 Apr 2026
Viewed by 368
Abstract
To address the limitations of zero-shot generalization in Vision-Language Navigation (VLN), this paper proposes a novel knowledge graph-driven reinforcement learning approach. Our method constructs a hierarchical, dynamically updated knowledge graph online during the agent’s real-time interaction with the environment, seamlessly aligning external semantic priors with continuous visual perception. By leveraging a Chain-of-Thought (CoT) prompting mechanism, the agent performs multi-hop reasoning to precisely locate target objects. Furthermore, we design an end-to-end optimized reinforcement learning framework that fuses multi-modal features and employs a task-oriented composite reward function. Extensive experiments in the AI2-THOR simulation environment demonstrate that the proposed method significantly improves navigation success rates in zero-shot settings. The results validate its robust generalization capabilities, particularly for unseen object categories and complex scene layouts.

25 pages, 3298 KB  
Article
FDE-YOLO: An Improved Algorithm for Small Target Detection in UAV Images
by Jialiang Li, Xu Guo, Xu Zhao and Jie Jin
Mathematics 2026, 14(4), 663; https://doi.org/10.3390/math14040663 - 13 Feb 2026
Viewed by 1064
Abstract
Accurate small object detection in unmanned aerial vehicle (UAV) imagery is fundamental to numerous safety-critical applications, including intelligent transportation, urban surveillance, and disaster assessment. However, extreme scale compression, dense object distributions, and complex backgrounds severely constrain the feature representation capability of existing detectors, leading to degraded reliability in real-world deployments. To overcome these limitations, we propose FDE-YOLO, a lightweight yet high-performance detection framework built upon YOLOv11 with three complementary architectural innovations. The Fine-Grained Detection Pyramid (FGDP) integrates space-to-depth convolution with a CSP-MFE module that fuses multi-granularity features through parallel local, context, and global branches, capturing comprehensive small target information while avoiding computational overhead from layer stacking. The Dynamic Detection Fusion Head (DDFHead) unifies scale-aware, spatial-aware, and task-aware attention mechanisms via sequential refinement with DCNv4 and FReLU activation, adaptively enhancing discriminative capability for densely clustered targets in complex scenes. The EdgeSpaceNet module explicitly fuses Sobel-extracted boundary features with spatial convolution outputs through residual connections, recovering edge details typically lost in standard operations while reducing parameter count via depthwise separable convolutions. Extensive experiments on the VisDrone2019 dataset demonstrate that FDE-YOLO achieves 53.6% precision, 42.5% recall, 43.3% mAP50, and 26.3% mAP50:95, surpassing YOLOv11s by 2.8%, 4.4%, 4.1%, and 2.8% respectively, with only 10.25 M parameters. The proposed approach outperforms UAV-specialized methods including Drone-YOLO and MASF-YOLO while using significantly fewer parameters (37.5% and 29.8% reductions respectively), demonstrating superior efficiency. Cross-dataset evaluations on UAV-DT and NWPU VHR-10 further confirm strong generalization capability with 1.6% and 1.5% mAP50 improvements respectively, validating FDE-YOLO as an effective and efficient solution for reliable UAV-based small object detection in real-world scenarios.
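As a rough illustration of the space-to-depth step named in the abstract above, the following minimal sketch shows only the generic rearrangement: each non-overlapping spatial block is folded into the channel dimension, so resolution drops without discarding pixel values. It is not the FGDP module or any code from the paper; the function name `space_to_depth`, the nested-list image layout (height x width x channels), and the default block size of 2 are assumptions made for this example. In a detector, a convolution would normally follow this step.

```python
def space_to_depth(image, block=2):
    """Fold each block x block spatial patch into the channel axis.

    `image` is a nested list shaped H x W x C (hypothetical layout chosen
    for this sketch). The result is (H/block) x (W/block) x (C*block*block).
    """
    height, width = len(image), len(image[0])
    if height % block or width % block:
        raise ValueError("height and width must be divisible by block")
    return [
        [
            # Concatenate the channel vectors of every pixel in the patch,
            # scanning the patch row by row.
            [
                value
                for di in range(block)
                for dj in range(block)
                for value in image[i + di][j + dj]
            ]
            for j in range(0, width, block)
        ]
        for i in range(0, height, block)
    ]


# A 4 x 4 single-channel image whose pixel values are 0..15.
image = [[[row * 4 + col] for col in range(4)] for row in range(4)]
folded = space_to_depth(image)
```

Here the 4 x 4 x 1 input becomes a 2 x 2 x 4 output, and every input value is still present, which is why this kind of downsampling is often preferred over strided convolution or pooling when small objects occupy only a few pixels.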

15 pages, 2120 KB  
Article
MSA-Net: A Multi-Scale Attention Network with Contrastive Learning for Robust Intervertebral Disc Labeling in MRI
by Mohammad D. Alahmadi, Abdulrahman Gharawi and Tariq Alsahfi
Mathematics 2025, 13(23), 3811; https://doi.org/10.3390/math13233811 - 27 Nov 2025
Viewed by 615
Abstract
Accurate labeling of intervertebral discs (IVDs) in MRI scans is crucial for diagnosing spinal-related diseases such as osteoporosis, vertebral fractures, and IVD herniation. However, automatic IVD labeling remains challenging. The main issues include visual similarity to surrounding bone, anatomical variation across individuals, and inconsistencies between MRI scans. Traditional post-detection disc labeling methods often struggle when localization algorithms miss discs or generate false positives. To address these challenges, we propose MSA-Net, a novel multi-scale attention network designed for semantic IVD labeling, emphasizing the use of prior geometric data. MSA-Net efficiently extracts multi-scale features and models intricate spatial dependencies throughout the spinal structure. We also integrate contrastive learning to enforce feature consistency. This helps the network distinguish IVDs from surrounding tissues. Extensive experiments on multi-center spine datasets demonstrate that MSA-Net consistently outperforms previous methods across MRI T1w and T2w modalities. These improvements demonstrate MSA-Net’s ability to handle variability in disc geometry, tissue contrast, and missed detections that challenge prior methods.

20 pages, 6290 KB  
Article
ReceiptQA: A Question-Answering Dataset for Receipt Understanding
by Mahmoud Abdalla, Mahmoud SalahEldin Kasem, Mohamed Mahmoud, Bilel Yagoub, Mostafa Farouk Senussi, Abdelrahman Abdallah, Seung Hun Kang and Hyun Soo Kang
Mathematics 2025, 13(11), 1760; https://doi.org/10.3390/math13111760 - 26 May 2025
Cited by 5 | Viewed by 5170
Abstract
Understanding information extracted from receipts is a critical task for real-world applications such as financial tracking, auditing, and enterprise resource management. In this paper, we introduce ReceiptQA, a novel large-scale dataset designed for receipt understanding through question-answering (QA). ReceiptQA contains 171,000 question–answer pairs derived from 3500 receipt images, constructed via two complementary methodologies: (1) LLM-Generated Dataset: 70,000 synthetically generated QA pairs, where each receipt is paired with 20 unique, context-specific questions. These questions are produced using a state-of-the-art large language model (LLM) and validated through human annotation to ensure accuracy, relevance, and diversity. (2) Human-Created Dataset: 101,000 manually crafted questions spanning answerable and unanswerable queries. This subset includes carefully designed templates of varying difficulty (easy/hard) to comprehensively evaluate QA systems across diverse receipt domains. To benchmark performance, we evaluate leading vision–language models (VLMs) and language models (LMs), including GPT-4o, Phi-3B, Phi-3.5B, LLaVA-7B, InternVL2 (4B/8B), LLaMA-3.2, and Gemini. We further fine-tune a LLaMA-3.2 11B model on ReceiptQA, achieving significant improvements over baseline models on validation and test sets. Our analysis uncovers critical strengths and limitations of existing models in handling receipt-based QA tasks, establishing a robust benchmark for future research.

19 pages, 8853 KB  
Article
Automatic Neural Architecture Search Based on an Estimation of Distribution Algorithm for Binary Classification of Image Databases
by Erick Franco-Gaona, Maria Susana Avila-Garcia and Ivan Cruz-Aceves
Mathematics 2025, 13(4), 605; https://doi.org/10.3390/math13040605 - 12 Feb 2025
Cited by 2 | Viewed by 2537
Abstract
Convolutional neural networks (CNNs) are widely used for image classification; however, setting the appropriate hyperparameters before training is subjective and time consuming, and the search space is not properly explored. This paper presents a novel method for the automatic neural architecture search based on an estimation of distribution algorithm (EDA) for binary classification problems. The hyperparameters were coded in binary form due to the nature of the metaheuristics used in the automatic search stage of CNN architectures which was performed using the Boltzmann Univariate Marginal Distribution algorithm (BUMDA) chosen by statistical comparison between four metaheuristics to explore the search space, whose computational complexity is O(2^29). Moreover, the proposed method is compared with multiple state-of-the-art methods on five databases, testing its efficiency in terms of accuracy and F1-score. In the experimental results, the proposed method achieved an F1-score of 97.2%, 98.73%, 97.23%, 98.36%, and 98.7% in its best evaluation, better results than the literature. Finally, the computational time of the proposed method for the test set was ≈0.6 s, 1 s, 0.7 s, 0.5 s, and 0.1 s, respectively.
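To make the idea of an estimation of distribution algorithm over binary-coded hyperparameters more concrete, here is a minimal sketch of a plain univariate marginal distribution algorithm (UMDA). It is a simpler relative of the BUMDA named in the abstract, not the Boltzmann variant and not the authors' implementation. The function name `umda_binary`, all parameter values, the clamping bounds, and the toy fitness (counting ones) are assumptions for illustration; in an architecture search the fitness would instead decode the bit string into CNN hyperparameters and return a validation score.

```python
import random


def umda_binary(fitness, n_bits, pop_size=50, n_select=25, generations=30, seed=0):
    """Plain UMDA over bit strings (illustrative sketch, not BUMDA).

    Each generation samples a population from independent per-bit
    probabilities, keeps the fittest individuals, and re-estimates the
    probabilities from them.
    """
    rng = random.Random(seed)
    probs = [0.5] * n_bits  # start with no preference for 0 or 1
    best, best_fit = None, float("-inf")
    for _ in range(generations):
        population = [
            [1 if rng.random() < p else 0 for p in probs]
            for _ in range(pop_size)
        ]
        ranked = sorted(population, key=fitness, reverse=True)
        top_fit = fitness(ranked[0])
        if top_fit > best_fit:
            best, best_fit = ranked[0], top_fit
        selected = ranked[:n_select]
        # Re-estimate each marginal from the selected individuals; the
        # clamp (an assumption of this sketch) avoids premature fixation.
        probs = [
            min(0.95, max(0.05, sum(ind[i] for ind in selected) / n_select))
            for i in range(n_bits)
        ]
    return best, best_fit


# Toy run: maximise the number of ones in a 16-bit string.
best_bits, best_score = umda_binary(sum, n_bits=16)
```

The bit-string length of 16 is an arbitrary choice for the toy run. The design point the sketch is meant to show is that an EDA replaces crossover and mutation with an explicit probability model that is sampled and re-fitted each generation.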

20 pages, 2900 KB  
Article
HTTD: A Hierarchical Transformer for Accurate Table Detection in Document Images
by Mahmoud SalahEldin Kasem, Mohamed Mahmoud, Bilel Yagoub, Mostafa Farouk Senussi, Mahmoud Abdalla and Hyun-Soo Kang
Mathematics 2025, 13(2), 266; https://doi.org/10.3390/math13020266 - 15 Jan 2025
Cited by 7 | Viewed by 5562
Abstract
Table detection in document images is a challenging problem due to diverse layouts, irregular structures, and embedded graphical elements. In this study, we present HTTD (Hierarchical Transformer for Table Detection), a cutting-edge model that combines a Swin-L Transformer backbone with advanced Transformer-based mechanisms to achieve superior performance. HTTD addresses three key challenges: handling diverse document layouts, including historical and modern structures; improving computational efficiency and training convergence; and demonstrating adaptability to non-standard tasks like medical imaging and receipt key detection. Evaluated on benchmark datasets, HTTD achieves state-of-the-art results, with precision rates of 96.98% on ICDAR-2019 cTDaR, 96.43% on TNCR, and 93.14% on TabRecSet. These results validate its effectiveness and efficiency, paving the way for advanced document analysis and data digitization tasks.