
Recent Advances and New Trends in Computer Vision and Image Processing

A special issue of Applied Sciences (ISSN 2076-3417). This special issue belongs to the section "Computing and Artificial Intelligence".

Deadline for manuscript submissions: 20 May 2026 | Viewed by 21409

Special Issue Editor


Dr. Pedro Couto
Guest Editor
CITAB—Centre for the Research and Technology of Agro-Environmental and Biological Sciences, UTAD—University of Trás-os-Montes e Alto Douro, Quinta de Prados, 5001-801 Vila Real, Portugal
Interests: computer vision; image processing; medical image processing; artificial intelligence

Special Issue Information

Dear Colleagues,

Grounded in visual data, computer vision aims to enable computers to see, understand, decide, and act. Over the years, computer vision has rapidly gained popularity in a wide range of areas, including industry, transportation, agriculture, and medicine. Nowadays, artificial intelligence-powered vision systems are driving the field to previously unseen levels of adoption by increasing the efficiency and accuracy of these applications.

The applications of such systems are thus expected to continue to grow alongside the artificial intelligence, machine learning, and deep learning algorithms being developed within these frameworks, which have recently achieved great success over conventional techniques. The community therefore has high expectations for where these new artificial intelligence-powered techniques will stand in the coming years, in terms of both discipline and practice. This Special Issue aims to provide insight into this future.

This Special Issue aims to present new technical approaches in computer vision research and development, with particular emphasis on the engineering and technological aspects of image processing and computer vision and their contributions to a wide range of application fields, including (but not limited to) the following:

  • Agriculture;
  • Healthcare;
  • Environmental monitoring;
  • Security and surveillance;
  • Automotive industry;
  • Entertainment;
  • Robotics.

Dr. Pedro Couto
Guest Editor

Manuscript Submission Information

Manuscripts should be submitted online at www.mdpi.com by registering and logging in to this website. Once you are registered, click here to go to the submission form. Manuscripts can be submitted until the deadline. All submissions that pass pre-check are peer-reviewed. Accepted papers will be published continuously in the journal (as soon as accepted) and will be listed together on the Special Issue website. Research articles, review articles, and short communications are invited. For planned papers, a title and short abstract (about 250 words) can be sent to the Editorial Office for assessment.

Submitted manuscripts should not have been published previously, nor be under consideration for publication elsewhere (except conference proceedings papers). All manuscripts are thoroughly refereed through a single-blind peer-review process. A guide for authors and other relevant information for submission of manuscripts is available on the Instructions for Authors page. Applied Sciences is an international peer-reviewed open access semimonthly journal published by MDPI.

Please visit the Instructions for Authors page before submitting a manuscript. The Article Processing Charge (APC) for publication in this open access journal is 2400 CHF (Swiss Francs). Submitted papers should be well formatted and use good English. Authors may use MDPI's English editing service prior to publication or during author revisions.

Keywords

  • computer vision
  • image processing
  • video processing
  • artificial intelligence
  • machine learning

Benefits of Publishing in a Special Issue

  • Ease of navigation: Grouping papers by topic helps scholars navigate broad scope journals more efficiently.
  • Greater discoverability: Special Issues support the reach and impact of scientific research. Articles in Special Issues are more discoverable and cited more frequently.
  • Expansion of research network: Special Issues facilitate connections among authors, fostering scientific collaborations.
  • External promotion: Articles in Special Issues are often promoted through the journal's social media, increasing their visibility.
  • Reprint: MDPI Books provides the opportunity to republish successful Special Issues in book format, both online and in print.

Further information on MDPI's Special Issue policies can be found here.

Published Papers (9 papers)


Research


47 pages, 9682 KB  
Article
Unsupervised Hierarchical Visual Taxonomy of Marble Natural Stone Using Cluster-Aware Self-Supervised Vision Transformers
by Margarida Figueiredo, Carlos M. A. Diogo, Gustavo Paneiro, Pedro Amaral and António Alves de Campos
Appl. Sci. 2026, 16(9), 4137; https://doi.org/10.3390/app16094137 - 23 Apr 2026
Viewed by 126
Abstract
The marble industry relies on proprietary commercial names rather than objective visual categories, creating market inefficiencies for stakeholders who select stones based on appearance. Supervised classification perpetuates this problem by replicating inconsistent commercial labels instead of discovering intrinsic visual structure. We propose an unsupervised pipeline combining a two-stage training strategy: pure self-supervised pretraining followed by cluster-aware fine-tuning of a DINO Vision Transformer, with empirically selected dimensionality reduction and agglomerative hierarchical clustering. Systematic ablation studies on 1480 marble images spanning 10 commercial varieties validate each design choice: cluster-aware training at k = 10 yields geometrically improved embeddings over the self-supervised baseline (mean Silhouette Score 0.693 ± 0.053 vs. 0.660 ± 0.030; mean Davies–Bouldin Index 0.386 ± 0.075 vs. 0.569 ± 0.012; N = 9 independent evaluations across 3 data partitions × 3 training initializations). The resulting taxonomy reveals three phenomena invisible to commercial classification: cross-category merging of visually indistinguishable stones carrying different market names, intra-category splitting of heterogeneous sub-populations within single varieties, and coherent grouping where commercial and visual boundaries coincide, with all three confirmed in every independent run. We further demonstrate that standard extrinsic metrics are misaligned with unsupervised taxonomy objectives when reference labels encode the inconsistencies the method aims to resolve. Validating this methodology across diverse stone types, larger datasets, and varied acquisition conditions represents a natural and necessary next step toward establishing its cross-domain generalizability. Full article
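The Davies–Bouldin Index reported in this abstract rewards compact, well-separated clusters (lower is better). A minimal pure-Python sketch of the metric on toy 2-D points; the data and names are illustrative stand-ins for the paper's ViT embeddings, not its actual pipeline:

```python
import math

def centroid(points):
    n = len(points)
    return [sum(p[d] for p in points) / n for d in range(len(points[0]))]

def dist(a, b):
    return math.sqrt(sum((x - y) ** 2 for x, y in zip(a, b)))

def davies_bouldin(clusters):
    """clusters: list of point lists, one list per cluster."""
    cents = [centroid(c) for c in clusters]
    # mean distance of each cluster's points to its centroid (scatter)
    scatter = [sum(dist(p, c) for p in pts) / len(pts)
               for pts, c in zip(clusters, cents)]
    k = len(clusters)
    total = 0.0
    for i in range(k):
        # worst-case (largest) similarity ratio against any other cluster
        total += max((scatter[i] + scatter[j]) / dist(cents[i], cents[j])
                     for j in range(k) if j != i)
    return total / k

tight = [[(0, 0), (0, 1)], [(10, 10), (10, 11)]]  # well-separated clusters
loose = [[(0, 0), (0, 4)], [(5, 5), (5, 9)]]      # overlapping-ish clusters
assert davies_bouldin(tight) < davies_bouldin(loose)
```

The same comparison logic underlies the paper's claim that cluster-aware fine-tuning lowered the index from 0.569 to 0.386.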
30 pages, 5315 KB  
Article
Dynamic Multi-Exposure HDR Reconstruction via Dual-Branch Base-Detail Collaboration
by Qin Zhou, Min Chen, Feifan Cai, Zihao Zhang and Youdong Ding
Appl. Sci. 2026, 16(9), 4119; https://doi.org/10.3390/app16094119 - 23 Apr 2026
Viewed by 157
Abstract
Dynamic multi-exposure high dynamic range (HDR) image reconstruction remains challenging because it must preserve globally consistent luminance and structure while recovering fine-grained local textures from low dynamic range (LDR) inputs corrupted by saturation, under-exposure, and motion-induced artifacts. Existing CNN-based methods are effective at local detail restoration but remain limited in global context modeling, whereas Transformer-based methods improve long-range interaction but can still weaken local-detail refinement. Current hybrid designs suggest that the two representation types are complementary, but they do not fully address branch specialization, cross-branch collaboration, and local-feature reliability control. To address this gap, we propose a dual-branch Transformer-CNN framework with a base branch built on Window-based Residual Transformer Blocks (WRTBs), a detail branch equipped with Detail-Aware Gating (DAG) for reliability-aware local refinement, and Bidirectional Cross-Branch Fusion (BCBF) for stage-wise collaboration between the two branches. Experiments on Kalantari17, the Tel benchmark, and Challenge123 show that the proposed design remains competitive on the standard benchmark, achieving the best HDR-VDP-2 score and tying for the best μ-SSIM on Kalantari17, while yielding clearer gains on the more challenging Tel and Challenge123 benchmarks. Full article
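The μ-SSIM metric cited above is SSIM computed after μ-law tonemapping, the convention used in Kalantari-style HDR evaluation with μ = 5000. A minimal sketch of the compressor, assuming linear HDR values normalized to [0, 1]:

```python
import math

MU = 5000.0  # compression constant commonly used in HDR evaluation

def mu_law(h, mu=MU):
    """Map a linear HDR intensity h in [0, 1] to the tonemapped domain."""
    return math.log(1.0 + mu * h) / math.log(1.0 + mu)

# The compressor is monotonic and strongly expands the dark range, so
# metrics computed in this domain weight shadow detail more perceptually.
assert mu_law(0.0) == 0.0
assert abs(mu_law(1.0) - 1.0) < 1e-9
assert mu_law(0.01) > 0.4  # dark values are lifted substantially
```

SSIM between two tonemapped images then compares them in this perceptually compressed domain rather than in raw linear radiance.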
23 pages, 384 KB  
Article
Cues for a Grammar of Potentials in Markov Field Models of Computer Vision
by Luigi Burigana
Appl. Sci. 2026, 16(8), 4030; https://doi.org/10.3390/app16084030 - 21 Apr 2026
Viewed by 141
Abstract
Several well-known models in present-day computer vision take the form of Markov random fields. Any model of this kind amounts to a network of soft constraints, which are called potentials. These are the subject of this study. First, three kinds of information that are involved in any computer vision inference task are identified, namely, evidence, target, and principled information, and the concept of a variable as applied in this context is discussed. The general meaning of a potential is then described, which is a local soft constraint that aims to promote a corresponding desired condition. Following this, the formal structure of a potential is highlighted, which includes a set of parameters and an analytic frame, with this being a hierarchy of operations by which the value of the potential can be computed. The possible presence of a core in the analytic frame is considered, and two salient kinds of cores are distinguished and illustrated using examples from the literature: one involving a distance function and the other given by a probabilistic conditional. In summary, this contribution highlights substantial aspects of the semantics and syntax of potentials in Markov field models of computer vision, and constructs a framework within which these aspects may be consistently arranged and explained. Full article
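To make the abstract's terminology concrete: a Markov field model scores a labeling by summing local potentials, some of whose analytic frames contain a distance-function core. A toy 1-D sketch with a truncated-quadratic pairwise core, a classic discontinuity-preserving choice; the specific functions and constants are illustrative, not taken from the paper:

```python
def unary(x, obs):
    return (x - obs) ** 2               # data-fidelity (evidence) potential

def pairwise(xi, xj, lam=1.0, tau=4.0):
    # truncated quadratic: a distance core |xi - xj|^2 capped at tau,
    # which promotes smoothness while tolerating genuine edges
    return lam * min((xi - xj) ** 2, tau)

def energy(labels, observations):
    e = sum(unary(x, o) for x, o in zip(labels, observations))
    e += sum(pairwise(labels[i], labels[i + 1])
             for i in range(len(labels) - 1))
    return e

obs = [0, 0, 5, 5]                       # noisy evidence with a step edge
edge_preserving = [0, 0, 5, 5]           # truncation caps the edge penalty
oversmoothed = [2, 2, 3, 3]
assert energy(edge_preserving, obs) < energy(oversmoothed, obs)
```

Inference then amounts to searching for the labeling with minimal total energy, i.e., the one that best satisfies the network of soft constraints.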

52 pages, 18820 KB  
Article
Multimodal Industrial Scene Characterisation for Pouring Process Monitoring Using a Mixture of Experts
by Javier Nieves, Javier Selva, Guillermo Elejoste-Rementeria, Jorge Angulo-Pines, Jon Leiñena, Xuban Barberena and Fátima A. Saiz
Appl. Sci. 2026, 16(7), 3430; https://doi.org/10.3390/app16073430 - 1 Apr 2026
Viewed by 396
Abstract
Industrial pouring processes operate under highly dynamic conditions where small deviations can lead to defects, scrap, and production losses. Although modern foundries are equipped with multiple sensors and visual inspection systems, most monitoring approaches remain fragmented, unimodal, and difficult to interpret. Furthermore, annotated anomalous samples in industrial settings are scarce, hindering the development of traditional methods. As a result, many critical pouring anomalies are detected too late or lack sufficient contextual information for effective decision making. In this work, we propose a multimodal framework for industrial scene characterisation that combines visual information and process signals through an explainable Mixture-of-Experts (MoE)-style expert-fusion strategy. First, we deploy an ensemble of specialised modules that collaborate to identify regions of interest, assess pouring quality, and contextualise events within the production process, thereby generating an interpretable description of pouring events. Second, we introduce a novel anomaly detection method for multimodal video data, combining a self-supervised transformer with an outlier-aware clustering algorithm. Our approach effectively identifies rare anomalies without requiring extensive manual labelling. The resulting information is structured into a digital twin-ready representation, supporting synchronisation between the physical system and its virtual counterpart. This solution provides a scalable, deployable pathway to transform heterogeneous industrial data into actionable knowledge, supporting advanced monitoring, anomaly detection, and quality control in real foundry environments. Full article
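The Mixture-of-Experts fusion pattern this abstract describes can be sketched generically: a gate produces softmax weights over expert outputs, and the fused prediction is their weighted combination. The gate logits and expert outputs below are toy numbers, not the paper's model:

```python
import math

def softmax(logits):
    m = max(logits)                       # shift for numerical stability
    exps = [math.exp(x - m) for x in logits]
    s = sum(exps)
    return [e / s for e in exps]

def moe_fuse(expert_outputs, gate_logits):
    """Blend scalar expert outputs with gate-derived softmax weights."""
    w = softmax(gate_logits)
    return sum(wi * oi for wi, oi in zip(w, expert_outputs)), w

fused, weights = moe_fuse([0.9, 0.2, 0.5], gate_logits=[2.0, -1.0, 0.0])
assert abs(sum(weights) - 1.0) < 1e-12
assert weights[0] > weights[1]           # gate favours the first expert
```

In the paper's setting, the "experts" are specialised vision and signal modules, and the per-expert weights also serve as an interpretable record of which modality drove each decision.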

12 pages, 2384 KB  
Article
Image Processing Technology Applied to Fluorescent Rapid Tests for Influenza A and B Viruses
by Yu-Lin Wu, Wei-Chien Weng, Wen-Fung Pan and Yu-Cheng Lin
Appl. Sci. 2025, 15(21), 11523; https://doi.org/10.3390/app152111523 - 28 Oct 2025
Viewed by 771
Abstract
This study establishes a detection method based on image recognition to interpret and quantitatively analyze fluorescent rapid test kits for influenza. The method operates in a dark chamber equipped with a UV-LED, where the fluorescence of the test kit is excited by the UV-LED and subsequently captured using a camera module. The captured images are processed by segmenting the regions of interest (ROI), converting them to grayscale images, and analyzing the grayscale value distributions to identify the control (C) and test (T) line regions. By comparing the values of the C and T lines, the concentration is determined to achieve quantitative analysis. In the linearity validation experiments, the concentrations of influenza A (H1N1) specimens are 2, 4, 6, 8, and 10 ng/mL, achieving a coefficient of determination (R²) of 0.9923. For influenza B (Yamagata) specimens, concentrations of 6, 8, 10, 12.5, and 25 ng/mL resulted in an R² of 0.9878. The established method enables the detection of both influenza A (H1N1) and influenza B (Yamagata), replacing visual qualitative interpretation with quantitative analysis. Currently, the detection method developed in this paper is designed for use exclusively in a dark chamber and is specifically applied to fluorescent rapid tests. It cannot be directly used with conventional colloidal gold-based rapid test reagents. In the future, the proposed detection approach could be integrated with neural networks to enable its application to non-fluorescent rapid test interpretation and to operate beyond the dark chamber environment, for example by utilizing smartphone imaging for result interpretation under normal lighting conditions. Full article
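The quantification step described above can be sketched as: average the grayscale profile around the known C and T line positions, form the T/C ratio, and map it to a concentration through a linear calibration. The line positions, window size, and calibration constants below are invented for illustration and are not the paper's values:

```python
def line_value(profile, center, half_width=2):
    """Mean grayscale value in a window around a known line position."""
    window = profile[center - half_width: center + half_width + 1]
    return sum(window) / len(window)

def t_over_c(profile, c_pos, t_pos):
    # normalising T by C compensates for run-to-run excitation variation
    return line_value(profile, t_pos) / line_value(profile, c_pos)

def concentration(ratio, slope=10.0, intercept=0.0):
    """Hypothetical linear calibration from T/C ratio to ng/mL."""
    return slope * ratio + intercept

# Synthetic 1-D profile: background ~10, C line near index 5,
# T line near index 15, with the T line dimmer than the C line.
profile = [10] * 20
profile[4:7] = [60, 80, 60]     # control line
profile[14:17] = [35, 45, 35]   # test line
ratio = t_over_c(profile, c_pos=5, t_pos=15)
assert 0 < ratio < 1            # T line weaker than C line
```

The paper's reported R² values come from fitting exactly such a calibration against specimens of known concentration.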

22 pages, 17900 KB  
Article
Custom Material Scanning System for PBR Texture Acquisition: Hardware Design and Digitisation Workflow
by Lunan Wu, Federico Morosi and Giandomenico Caruso
Appl. Sci. 2025, 15(20), 10911; https://doi.org/10.3390/app152010911 - 11 Oct 2025
Cited by 1 | Viewed by 1956
Abstract
Real-time rendering is increasingly used in augmented and virtual reality (AR/VR), interactive design, and product visualisation, where materials must prioritise efficiency and consistency rather than the extreme accuracy required in offline rendering. In parallel, the growing demand for personalised and customised products has created a need for digital materials that can be generated in-house without relying on expensive commercial systems. To address these requirements, this paper presents a low-cost digitisation workflow based on photometric stereo. The system integrates a custom-built scanner with cross-polarised illumination, automated multi-light image acquisition, a dual-stage colour calibration process, and a node-based reconstruction pipeline that produces albedo and normal maps. A reproducible evaluation methodology is also introduced, combining perceptual colour-difference analysis using the CIEDE2000 (ΔE00) metric with angular-error assessment of normal maps on known-geometry samples. By openly providing the workflow, bill of materials, and implementation details, this work delivers a practical and replicable solution for reliable material capture in real-time rendering and product customisation scenarios. Full article
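The angular-error assessment of normal maps mentioned above reduces to the per-pixel angle between reconstructed and reference unit normals, averaged over the map. A minimal sketch, with normals as (x, y, z) tuples and illustrative sample data:

```python
import math

def angular_error_deg(n1, n2):
    dot = sum(a * b for a, b in zip(n1, n2))
    dot = max(-1.0, min(1.0, dot))        # clamp for acos stability
    return math.degrees(math.acos(dot))

def mean_angular_error(recon, reference):
    errs = [angular_error_deg(a, b) for a, b in zip(recon, reference)]
    return sum(errs) / len(errs)

flat = [(0.0, 0.0, 1.0)] * 4              # reference: flat plane normals
tilt = math.radians(5)
tilted = [(math.sin(tilt), 0.0, math.cos(tilt))] * 4
assert abs(mean_angular_error(tilted, flat) - 5.0) < 1e-6
```

Running this on samples of known geometry, as the paper does, gives a direct physical interpretation of reconstruction quality in degrees.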

23 pages, 2351 KB  
Article
Ensemble of Efficient Vision Transformers for Insect Classification
by Marius Alexandru Dinca, Dan Popescu, Loretta Ichim and Nicoleta Angelescu
Appl. Sci. 2025, 15(13), 7610; https://doi.org/10.3390/app15137610 - 7 Jul 2025
Cited by 3 | Viewed by 2314
Abstract
Real-time identification of insect pests is an important research direction in modern agricultural management, directly influencing crop health and yield. Recent advances in computer vision and deep learning, especially vision transformer (ViT) architectures, have demonstrated great potential in addressing this challenge. The present study explores the possibility of combining several ViT models for the insect pest classification task to improve system performance and robustness. Two popular and widely known datasets, D0 and IP102, which consist of diverse digital images with complex contexts of insect pests, were used. The proposed methodology involved training several individual ViT models on the chosen datasets, finally creating an ensemble strategy to fuse their results. A new combination method was used, based on the F1 score of individual models and a meta-classifier structure, capitalizing on the strengths of each base model and effectively capturing complex features for the final prediction. The experimental results indicated that the proposed ensemble methodology significantly outperformed the individual ViT models, with notable improvements in classification accuracy for both datasets. Specifically, the ensemble model achieved a test accuracy of 99.87% and an F1 score of 99.82% for the D0 dataset, and an F1 score of 84.25% for IP102, demonstrating the method’s effectiveness for insect pest classification across different datasets. These results pave the way for implementing reliable and effective solutions in agricultural pest management. Full article
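One plausible reading of the F1-based combination described above is a weighted soft vote: each base model's class probabilities are scaled by its validation F1 score before summing. The paper's actual meta-classifier is more elaborate; the weights and probabilities below are toy values for illustration only:

```python
def f1_weighted_vote(model_probs, f1_scores):
    """model_probs: one class-probability list per base model."""
    n_classes = len(model_probs[0])
    total = sum(f1_scores)
    fused = [0.0] * n_classes
    for probs, f1 in zip(model_probs, f1_scores):
        for c in range(n_classes):
            # each model contributes in proportion to its F1 score
            fused[c] += (f1 / total) * probs[c]
    return fused.index(max(fused)), fused

# Two models disagree; the stronger model (higher F1) wins the vote.
probs = [[0.7, 0.3], [0.4, 0.6]]
f1s = [0.95, 0.60]
label, fused = f1_weighted_vote(probs, f1s)
assert label == 0
```

Normalising by the summed F1 scores keeps the fused vector a proper probability distribution, so the same thresholding logic as for a single model still applies.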

Review


17 pages, 43598 KB  
Review
Body Measurements for Digital Forensic Comparison of Individuals—An Overview of Current Developments
by Sabine Richter and Dirk Labudde
Appl. Sci. 2025, 15(23), 12518; https://doi.org/10.3390/app152312518 - 25 Nov 2025
Viewed by 1234
Abstract
Forensic identification of individuals faces significant challenges, particularly when conventional biometric features such as the face are hidden. This paper examines the historical development and revival of body patterns (the anthropometric rig) as a biometric comparison feature, from historical Bertillonage to modern, computer-assisted methods such as digital anthropometric rig matching and its connection to 3D human pose estimation (HPE). It highlights both the mathematical and methodological foundations of this revival and the potential and limitations of applying artificial intelligence (AI) in the context of digital anthropometric rig matching. The aim is to trace the potential of, and the challenges for, the forensic validity of the person-specific digital skeleton. The review clearly shows the time required for manual work, which underlines the need for automation; this time can be reduced by AI-based approaches. However, these methods often do not yet meet the requirements of a forensic context. Full article

Other


41 pages, 1927 KB  
Systematic Review
Advancements in Small-Object Detection (2023–2025): Approaches, Datasets, Benchmarks, Applications, and Practical Guidance
by Ali Aldubaikhi and Sarosh Patel
Appl. Sci. 2025, 15(22), 11882; https://doi.org/10.3390/app152211882 - 7 Nov 2025
Cited by 11 | Viewed by 13166
Abstract
Small-object detection (SOD) remains an important and growing challenge in computer vision and is the backbone of many applications, including autonomous vehicles, aerial surveillance, medical imaging, and industrial quality control. Small objects, spanning few pixels, lose discriminative features during deep neural network processing, making them difficult to disentangle from background noise and other artifacts. This survey presents a comprehensive and systematic review of SOD advancements between 2023 and 2025, a period marked by the maturation of transformer-based architectures and a return to efficient, realistic deployment. We applied the PRISMA methodology for this work, yielding 112 seminal works in the field to ensure the robustness of our foundation for this study. First, we present a critical taxonomy of the developments since 2023, arranged in five categories: (1) multiscale feature learning; (2) transformer-based architectures; (3) context-aware methods; (4) data augmentation enhancements; and (5) advancements to mainstream detectors (e.g., YOLO). Second, we describe and analyze the evolving SOD-centered datasets and benchmarks and establish the importance of evaluating models fairly. Third, we contribute a comparative assessment of state-of-the-art models, evaluating not only accuracy (e.g., the average precision for small objects (AP_S)) but also important efficiency metrics (FPS, latency, parameters, GFLOPS) across standardized hardware platforms, including edge devices. We further use data-driven case studies in the remote sensing, manufacturing, and healthcare domains to bridge academic benchmarks and real-world performance. Finally, we summarize practical guidance for practitioners: a model selection decision matrix, scenario-based playbooks, and a deployment checklist. The goal of this work is to synthesize recent progress, identify the primary limitations in SOD, and open research directions, including the potential future role of generative AI and foundational models, to address the long-standing data and feature representation challenges that have limited SOD. Full article
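The AP_S metric discussed above follows the COCO evaluation convention, under which an object counts as "small" when its pixel area is below 32 × 32, with the medium/large thresholds also fixed by COCO:

```python
def coco_size_bucket(width_px, height_px):
    """Classify an object by the COCO area thresholds used for AP_S/AP_M/AP_L."""
    area = width_px * height_px
    if area < 32 ** 2:        # small: area < 1024 px^2
        return "small"
    if area < 96 ** 2:        # medium: 1024 <= area < 9216 px^2
        return "medium"
    return "large"

assert coco_size_bucket(20, 20) == "small"    # 400 px^2
assert coco_size_bucket(50, 50) == "medium"   # 2500 px^2
assert coco_size_bucket(100, 100) == "large"  # 10000 px^2
```

AP_S is then the standard average precision restricted to ground-truth objects in the "small" bucket, which is why it isolates exactly the regime this survey targets.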
