Image Analysis and Processing

A special issue of Technologies (ISSN 2227-7080). This special issue belongs to the section "Information and Communication Technologies".

Deadline for manuscript submissions: closed (31 December 2025)

Special Issue Editors

Guest Editor
School of Electrical and Information Engineering, Wuhan Institute of Technology, Wuhan 430205, China
Interests: artificial intelligence and the Internet of Things

Guest Editor
School of Electrical and Information Engineering, Wuhan Institute of Technology, Wuhan 430205, China
Interests: robot theory and algorithms; multimodal perception and learning; Lie groups; Lie algebra

Guest Editor
School of Electrical and Information Engineering, Wuhan Institute of Technology, Wuhan 430205, China
Interests: computer vision; image analysis and processing; deep learning

Guest Editor
School of Electrical and Information Engineering, Wuhan Institute of Technology, Wuhan 430205, China
Interests: computer vision; image analysis and processing; deep learning

Special Issue Information

Dear Colleagues,

Image analysis and processing is an important branch of computer vision and artificial intelligence. It focuses on using computational techniques to process and analyze images efficiently, extracting useful information to meet the needs of specific applications, and encompasses image preprocessing, feature extraction, image segmentation, target recognition, and image reconstruction, among other areas. Drawing on methods from mathematics, physics, and statistics, image analysis and processing plays an important role in medical diagnosis, remote sensing, security surveillance, industrial automation, digital entertainment, and other fields, bringing convenience to everyday life while advancing related disciplines.

This Special Issue focuses on recent developments in computer vision, artificial intelligence, image preprocessing, feature extraction, image segmentation, target recognition, and image reconstruction.

Potential topics of this Special Issue include but are not limited to the following:

  • Multimodal image denoising and enhancement;
  • Multimodal image fusion;
  • Image classification and semantic segmentation;
  • Object detection and segmentation;
  • Robot dynamics and control;
  • Human–robot interaction;
  • Robot learning;
  • AI for teaching.

Dr. Xi Li
Dr. Zhongtao Fu
Dr. Yu Shi
Dr. Zhenghua Huang
Guest Editors

Manuscript Submission Information

Manuscripts should be submitted online at www.mdpi.com by registering and logging in to this website. Once registered, go to the submission form. Manuscripts can be submitted until the deadline. All submissions that pass pre-check are peer-reviewed. Accepted papers will be published continuously in the journal (as soon as accepted) and will be listed together on the Special Issue website. Research articles, review articles, and short communications are invited. For planned papers, a title and short abstract (about 250 words) can be sent to the Editorial Office for assessment.

Submitted manuscripts should not have been published previously, nor be under consideration for publication elsewhere (except conference proceedings papers). All manuscripts are thoroughly refereed through a single-blind peer-review process. A guide for authors and other relevant information for submission of manuscripts is available on the Instructions for Authors page. Technologies is an international peer-reviewed open access monthly journal published by MDPI.

Please visit the Instructions for Authors page before submitting a manuscript. The Article Processing Charge (APC) for publication in this open access journal is 1800 CHF (Swiss Francs). Submitted papers should be well formatted and written in good English. Authors may use MDPI's English editing service prior to publication or during author revisions.

Keywords

  • computer vision
  • artificial intelligence
  • image preprocessing
  • feature extraction
  • image segmentation
  • target recognition
  • image reconstruction

Benefits of Publishing in a Special Issue

  • Ease of navigation: Grouping papers by topic helps scholars navigate broad scope journals more efficiently.
  • Greater discoverability: Special Issues support the reach and impact of scientific research. Articles in Special Issues are more discoverable and cited more frequently.
  • Expansion of research network: Special Issues facilitate connections among authors, fostering scientific collaborations.
  • External promotion: Articles in Special Issues are often promoted through the journal's social media, increasing their visibility.
  • Reprint: MDPI Books provides the opportunity to republish successful Special Issues in book format, both online and in print.

Further information on MDPI's Special Issue policies can be found on the MDPI website.

Published Papers (14 papers)


Research


25 pages, 233246 KB  
Article
Seamlessly Natural: Image Stitching with Natural Appearance Preservation
by Gaetane Lorna N. Tchana, Damaris Belle M. Fotso, Antonio Hendricks and Christophe Bobda
Technologies 2026, 14(3), 186; https://doi.org/10.3390/technologies14030186 - 19 Mar 2026
Abstract
Conventional image stitching pipelines predominantly rely on homographic alignment, whose planar assumption often breaks down in dual-camera configurations capturing non-planar scenes, producing geometric warping, bulging, and structural distortion. To address these limitations, this paper presents SENA (Seamlessly Natural), a geometry-driven image stitching approach with three complementary contributions. First, we propose a hierarchical affine-based warping strategy that combines global affine initialization, local affine refinement, and a smooth free-form deformation field regulated by seamguard adaptive smoothing. This multi-scale design preserves local shape, parallelism, and aspect ratios, thereby reducing the hallucinated distortions commonly associated with homography-based models. Second, SENA incorporates a geometry-driven adequate zone detection mechanism that identifies regions with reduced parallax directly from the disparity consistency of correspondences filtered by RANSAC, without relying on semantic segmentation or depth estimation. Third, within this zone, anchor-based seamline cutting and segmentation enforce one-to-one geometric correspondence between image pairs, reducing ghosting and smearing artifacts. Extensive experiments demonstrate that SENA achieves 26.2 dB PSNR and 0.84 SSIM, obtains the lowest BRISQUE score (33.4) among compared methods, and reduces runtime by 79% on average across resolutions. These results confirm improved structural fidelity and computational efficiency while maintaining competitive alignment accuracy.
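
The global affine initialization that anchors SENA's warping hierarchy can be illustrated with standard tools. The sketch below is an assumption-laden illustration rather than the authors' implementation: it estimates a 6-DoF affine transform from ORB correspondences filtered by RANSAC, mirroring how an affine model preserves parallelism and aspect ratios where a full homography would not. Feature type, match strategy, and thresholds are illustrative choices.

```python
# Illustrative sketch (not the authors' code): global affine initialization
# from RANSAC-filtered feature correspondences, analogous to SENA's first stage.
import cv2
import numpy as np

def global_affine(img_ref, img_mov):
    """Estimate a global affine warp aligning img_mov to img_ref."""
    orb = cv2.ORB_create(4000)
    k1, d1 = orb.detectAndCompute(img_ref, None)
    k2, d2 = orb.detectAndCompute(img_mov, None)
    matcher = cv2.BFMatcher(cv2.NORM_HAMMING, crossCheck=True)
    matches = sorted(matcher.match(d2, d1), key=lambda m: m.distance)
    src = np.float32([k2[m.queryIdx].pt for m in matches])
    dst = np.float32([k1[m.trainIdx].pt for m in matches])
    # RANSAC rejects parallax-heavy outliers; an affine model (6 DoF) preserves
    # parallelism and aspect ratios, unlike a full homography (8 DoF).
    A, inliers = cv2.estimateAffine2D(src, dst, method=cv2.RANSAC,
                                      ransacReprojThreshold=3.0)
    h, w = img_ref.shape[:2]
    return cv2.warpAffine(img_mov, A, (w, h)), inliers
```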

28 pages, 11222 KB  
Article
Robustness Enhancement of Self-Localization for Drone-View Mixed Reality via Adaptive RGB-Thermal Integration
by Ryuto Fukuda and Tomohiro Fukuda
Technologies 2026, 14(1), 74; https://doi.org/10.3390/technologies14010074 - 22 Jan 2026
Abstract
Drone-view mixed reality (MR) in the Architecture, Engineering, and Construction (AEC) sector faces significant self-localization challenges in low-texture environments, such as bare concrete sites. This study proposes an adaptive sensor fusion framework integrating thermal and visible light (RGB) imagery to enhance tracking robustness for diverse site applications. We introduce the Effective Inlier Count (Neff) as a lightweight gating mechanism to evaluate the spatial quality of feature points and dynamically weight sensor modalities in real time. By employing a 20×16 grid-based spatial filtering algorithm, the system effectively suppresses the influence of geometric burstiness without significant computational overhead on server-side processing. Validation experiments across various real-world scenarios demonstrate that the proposed method maintains high geometric registration accuracy where traditional RGB-only methods fail. In texture-less and specular conditions, the system consistently maintained an average Intersection over Union (IoU) above 0.72, while the baseline suffered from complete tracking loss or significant drift. These results confirm that thermal–RGB integration ensures operational availability and improves long-term stability by mitigating modality-specific noise. This approach offers a reliable solution for various drone-based AEC tasks, particularly in GPS-denied or adverse environments.
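
The Effective Inlier Count can be read as an occupancy measure over the 20×16 grid: bursts of matches concentrated in one region should not inflate the tracking-quality score. The sketch below implements that reading; the occupancy rule and gating threshold are assumptions, not the authors' exact formulation.

```python
# Minimal sketch of an "effective inlier count": inlier keypoints are binned
# into a 20x16 grid and occupied cells are counted, so a burst of matches in
# one small region contributes no more than a single cell.
import numpy as np

def effective_inlier_count(points, img_w, img_h, cols=20, rows=16):
    """points: (N, 2) array of inlier (x, y) pixel coordinates."""
    pts = np.asarray(points, dtype=float)
    if pts.size == 0:
        return 0
    cx = np.clip((pts[:, 0] / img_w * cols).astype(int), 0, cols - 1)
    cy = np.clip((pts[:, 1] / img_h * rows).astype(int), 0, rows - 1)
    return len(set(zip(cx.tolist(), cy.tolist())))

# Gating example: fall back to the thermal stream when RGB tracking is weak.
n_eff_rgb = effective_inlier_count(np.random.rand(200, 2) * [640, 480], 640, 480)
use_thermal = n_eff_rgb < 40  # threshold is illustrative
```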

13 pages, 961 KB  
Communication
Impact of Background Removal on Cow Identification with Convolutional Neural Networks
by Gergana Balieva, Alexander Marazov, Dimitar Tanchev, Ivanka Lazarova and Ralitsa Rankova
Technologies 2026, 14(1), 50; https://doi.org/10.3390/technologies14010050 - 9 Jan 2026
Abstract
Individual animal identification is a cornerstone of animal welfare practices and is of crucial importance for food safety and the protection of humans from zoonotic diseases. It is also a key prerequisite for enabling automated processes in modern dairy farming. With newly emerging technologies, visual animal identification based on machine learning offers a more efficient and non-invasive method with high automation potential, accuracy, and practical applicability. However, a common challenge is the limited variability of training datasets, as images are typically captured in controlled environments with uniform backgrounds and fixed poses. This study investigates the impact of foreground segmentation and background removal on the performance of convolutional neural networks (CNNs) for cow identification. A dataset was created in which training images of dairy cows exhibited low variability in pose and background for each individual, whereas the test dataset introduced significant variation in both pose and environment. Both a fine-tuned CNN backbone and a model trained from scratch were evaluated using images with and without background information. The results demonstrate that although training on segmented foregrounds encourages the model to learn intrinsic biometric features, background cues carry more information for individual recognition.
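
The core manipulated variable in this study, presence or absence of background, comes down to a masking step before training. A minimal sketch, assuming a binary foreground mask is already available (e.g., from a pretrained segmentation model):

```python
# Illustrative preprocessing sketch: zero out the background with a binary
# foreground mask so the model can only use intrinsic (coat-pattern) cues.
# How the mask is obtained is outside this sketch.
import numpy as np

def remove_background(image, mask):
    """image: (H, W, 3) uint8; mask: (H, W) boolean foreground mask."""
    mask3 = np.repeat(np.asarray(mask, dtype=bool)[..., None], 3, axis=2)
    return np.where(mask3, image, 0)  # background pixels -> black

# Comparing a model trained on `image` against one trained on
# remove_background(image, mask) isolates how much identity information
# the background itself carries.
```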

27 pages, 4631 KB  
Article
Multimodal Minimal-Angular-Geometry Representation for Real-Time Dynamic Mexican Sign Language Recognition
by Gerardo Garcia-Gil, Gabriela del Carmen López-Armas and Yahir Emmanuel Ramirez-Pulido
Technologies 2026, 14(1), 48; https://doi.org/10.3390/technologies14010048 - 8 Jan 2026
Abstract
Current approaches to dynamic sign language recognition commonly rely on dense landmark representations, which impose high computational cost and hinder real-time deployment on resource-constrained devices. To address this limitation, this work proposes a computationally efficient framework for real-time dynamic Mexican Sign Language (MSL) recognition based on a multimodal minimal angular-geometry representation. Instead of processing complete landmark sets (e.g., MediaPipe Holistic with up to 468 keypoints), the proposed method encodes the relational geometry of the hands, face, and upper body into a compact set of 28 invariant internal angular descriptors. This representation substantially reduces feature dimensionality and computational complexity while preserving linguistically relevant manual and non-manual information required for grammatical and semantic discrimination in MSL. A real-time end-to-end pipeline is developed, comprising multimodal landmark extraction, angular feature computation, and temporal modeling using a Bidirectional Long Short-Term Memory (BiLSTM) network. The system is evaluated on a custom dataset of dynamic MSL gestures acquired under controlled real-time conditions. Experimental results demonstrate that the proposed approach achieves 99% accuracy and 99% macro F1-score, matching state-of-the-art performance while using dramatically fewer features. The compactness, interpretability, and efficiency of the minimal angular descriptor make the proposed system suitable for real-time deployment on low-cost devices, contributing toward more accessible and inclusive sign language recognition technologies.
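
Each of the 28 descriptors is an internal angle over a landmark triplet, which makes the representation invariant to translation and scale. A minimal sketch of one such descriptor follows; the specific triplets used in the paper are not reproduced here.

```python
# Sketch of one angular descriptor: the internal angle at landmark b formed
# by segments b->a and b->c. A set of such angles over hand/face/upper-body
# landmark triplets yields a compact, translation/scale-invariant feature
# vector for the downstream BiLSTM.
import numpy as np

def internal_angle(a, b, c):
    """a, b, c: (3,) landmark coordinates; returns the angle at b in degrees."""
    v1, v2 = np.asarray(a) - np.asarray(b), np.asarray(c) - np.asarray(b)
    cosang = np.dot(v1, v2) / (np.linalg.norm(v1) * np.linalg.norm(v2) + 1e-9)
    return np.degrees(np.arccos(np.clip(cosang, -1.0, 1.0)))

# Example: elbow flexion from shoulder/elbow/wrist landmark positions.
print(internal_angle([0, 0, 0], [1, 0, 0], [1, 1, 0]))  # 90.0
```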

16 pages, 1956 KB  
Article
Post Hoc Error Correction for Missing Classes in Deep Neural Networks
by Andrey A. Lebedev, Victor B. Kazantsev and Sergey V. Stasenko
Technologies 2026, 14(1), 8; https://doi.org/10.3390/technologies14010008 - 22 Dec 2025
Abstract
This paper presents a novel post hoc error correction method that enables deep neural networks to recognize classes that were completely excluded during training. Unlike traditional approaches requiring full model retraining, our method uses hidden layer representations from any pre-trained classifier to detect and correct errors on missing categories. We demonstrate the approach on facial emotion recognition using the RAF-DB dataset, systematically excluding each of the seven emotion classes from training. The results show correction gains of up to 0.811 for excluded classes while maintaining 99% retention on known classes in the best setup. The method provides a computationally efficient alternative to retraining when new categories emerge after deployment.
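
One way to picture the corrector, under loose assumptions since the paper's exact construction is not reproduced here, is as a lightweight detector on hidden-layer features that flags likely members of the excluded class and overrides the base prediction:

```python
# Conceptual sketch of a post hoc corrector: a detector trained on hidden-layer
# features flags inputs that likely belong to a class absent from the base
# model's training. Feature extractor, detector choice, and threshold are
# illustrative, not the paper's exact construction.
import numpy as np
from sklearn.linear_model import LogisticRegression

def fit_corrector(hidden_feats, is_missing_class):
    """hidden_feats: (N, D) penultimate activations; is_missing_class: (N,) bool."""
    det = LogisticRegression(max_iter=1000)
    det.fit(hidden_feats, is_missing_class.astype(int))
    return det

def corrected_predict(base_pred, hidden_feat, detector, missing_label, thr=0.5):
    """Override the base classifier when the detector fires."""
    p_missing = detector.predict_proba(hidden_feat.reshape(1, -1))[0, 1]
    return missing_label if p_missing >= thr else base_pred
```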

19 pages, 444 KB  
Article
Enhancing Cascade Object Detection Accuracy Using Correctors Based on High-Dimensional Feature Separation
by Andrey V. Kovalchuk, Andrey A. Lebedev, Olga V. Shemagina, Irina V. Nuidel, Vladimir G. Yakhno and Sergey V. Stasenko
Technologies 2025, 13(12), 593; https://doi.org/10.3390/technologies13120593 - 16 Dec 2025
Abstract
This study addresses the problem of correcting systematic errors in classical cascade object detectors under severe data scarcity and distribution shift. We focus on the widely used Viola–Jones framework enhanced with a modified Census transform and propose a modular “corrector” architecture that can be attached to an existing detector without retraining it. The key idea is to exploit the blessing of dimensionality: high-dimensional feature vectors constructed from multiple cascade stages are transformed by PCA and whitening into a space where simple linear Fisher discriminants can reliably separate rare error patterns from normal operation using only a few labeled examples. The approach involves image partitioning through a sliding window of fixed aspect ratio and a modified census transform in which pixel intensity is compared to the mean value within a rectangular neighborhood. Training samples for false negative and false positive correctors are selected using dual Intersection-over-Union (IoU) thresholds and probabilistic sampling of true positive and true negative fragments. Corrector models are trained based on the principles of high-dimensional separability within the paradigm of one- and few-shot learning, utilizing features derived from cascade stages of the detector. Decision boundaries are optimized using Fisher’s rule, with adaptive thresholding to guarantee zero false acceptance. Experimental results indicate that the proposed correction scheme enhances object detection accuracy by effectively compensating for classifier errors, particularly under conditions of scarce training data. On two railway image datasets with only about one thousand images each, the proposed correctors increase Precision from 0.36 to 0.65 on identifier detection while maintaining high Recall (0.98 → 0.94), and improve digit detection Recall from 0.94 to 0.98 with negligible loss in Precision (0.92 → 0.91). These results demonstrate that even under scarce training data, high-dimensional feature separation enables effective one-/few-shot error correction for cascade detectors with minimal computational overhead.
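
The modified census transform described here compares each pixel against the mean of its rectangular neighborhood rather than against the center pixel, yielding one binary code per location. A minimal sketch assuming a 3×3 window (9-bit codes):

```python
# Sketch of a modified census transform: each pixel in a 3x3 neighborhood is
# compared against the neighborhood mean, producing a 9-bit code per location.
# The window size is an assumption; the paper specifies only a rectangular
# neighborhood.
import numpy as np

def modified_census(img):
    """img: (H, W) float grayscale -> (H-2, W-2) 9-bit MCT codes."""
    H, W = img.shape
    # The nine shifted views cover every 3x3 neighborhood in the valid region.
    shifts = [img[dy:dy + H - 2, dx:dx + W - 2]
              for dy in range(3) for dx in range(3)]
    mean = sum(shifts) / 9.0
    codes = np.zeros((H - 2, W - 2), dtype=np.uint16)
    for patch in shifts:
        codes = (codes << 1) | (patch > mean).astype(np.uint16)
    return codes
```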

19 pages, 2027 KB  
Article
Novel End-to-End CNN Approach for Fault Diagnosis in Electromechanical Systems Based on Relevant Heating Areas in Thermography
by Gilberto Alvarado-Robles, Angel Perez-Cruz, Isac Andres Espinosa-Vizcaino, Arturo Yosimar Jaen-Cuellar and Juan Jose Saucedo-Dorantes
Technologies 2025, 13(12), 551; https://doi.org/10.3390/technologies13120551 - 26 Nov 2025
Abstract
The reliability of electromechanical systems is a critical factor in modern Industry 4.0, as unexpected failures in induction motors or gearboxes can cause costly downtime, productivity losses, and increased maintenance demands. Infrared thermography offers a non-invasive and real-time means of monitoring thermal behavior, yet its effective use for fault diagnosis remains challenging due to sensitivity to noise, environmental variability, and the need for robust feature extraction. This work proposes a novel end-to-end convolutional neural network (CNN) methodology for detecting and classifying faults in electromechanical systems through the processing of infrared thermography images. The method integrates an automatic preprocessing stage that isolates the Relevant Heating Areas (RHAs), preserving their geometric and thermal descriptors while filtering irrelevant background information. A tailored data augmentation strategy, including controlled noise injection, was designed to improve robustness under realistic acquisition conditions. The CNN architecture combines 3 × 3 and 5 × 5 kernels to capture both fine-grained and global heating patterns. Experimental validation is carried out under nine different faulty conditions, achieving 99.7% accuracy and demonstrating strong resilience against Gaussian blur and additive Gaussian noise. The results suggest that the method provides a scalable, interpretable, and efficient approach for fault diagnosis in electromechanical systems within Industry 4.0 environments.
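
The architectural idea of pairing 3 × 3 and 5 × 5 kernels can be sketched as a parallel-branch block whose outputs are concatenated along the channel axis. The PyTorch sketch below is illustrative only; channel counts, depth, and the classifier head are assumptions, not the paper's architecture.

```python
# Minimal sketch of a block combining 3x3 (fine) and 5x5 (coarse) kernels in
# parallel, in the spirit of the dual-kernel design described in the abstract.
import torch
import torch.nn as nn

class DualKernelBlock(nn.Module):
    def __init__(self, in_ch, out_ch):
        super().__init__()
        self.fine = nn.Conv2d(in_ch, out_ch // 2, kernel_size=3, padding=1)
        self.coarse = nn.Conv2d(in_ch, out_ch // 2, kernel_size=5, padding=2)
        self.act = nn.ReLU(inplace=True)
        self.pool = nn.MaxPool2d(2)

    def forward(self, x):
        # Concatenate fine (3x3) and coarse (5x5) responses along channels.
        y = torch.cat([self.fine(x), self.coarse(x)], dim=1)
        return self.pool(self.act(y))

model = nn.Sequential(
    DualKernelBlock(1, 32),   # single-channel thermal input assumed
    DualKernelBlock(32, 64),
    nn.AdaptiveAvgPool2d(1),
    nn.Flatten(),
    nn.Linear(64, 9),         # nine fault conditions, per the abstract
)
print(model(torch.randn(1, 1, 128, 128)).shape)  # torch.Size([1, 9])
```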

27 pages, 1220 KB  
Article
Robust Supervised Deep Discrete Hashing for Cross-Modal Retrieval
by Xiwei Dong, Fei Wu, Junqiu Zhai, Fei Ma, Guangxing Wang, Tao Liu, Xiaogang Dong and Xiao-Yuan Jing
Technologies 2025, 13(9), 383; https://doi.org/10.3390/technologies13090383 - 29 Aug 2025
Abstract
The exponential growth of multi-modal data in the real world poses significant challenges to efficient retrieval, and traditional single-modal methods are no longer suitable for such data. To address this issue, hashing retrieval methods play an important role in cross-modal retrieval tasks involving large amounts of multi-modal data. However, effectively embedding multi-modal data into a common low-dimensional Hamming space remains challenging. A critical issue is that feature redundancies in existing methods lead to suboptimal hash codes, severely degrading retrieval performance; yet, selecting optimal features remains an open problem in deep cross-modal hashing. In this paper, we propose an end-to-end approach, named Robust Supervised Deep Discrete Hashing (RSDDH), which can accomplish feature learning and hashing learning simultaneously. RSDDH has a hybrid deep architecture consisting of a convolutional neural network and a multilayer perceptron adaptively learning modality-specific representations. Moreover, it utilizes a non-redundant feature selection strategy to select optimal features for generating discriminative hash codes. Furthermore, it employs a direct discrete hashing scheme (SVDDH) to solve the binary constraint optimization problem without relaxation, fully preserving the intrinsic properties of hash codes. Additionally, RSDDH employs inter-modal and intra-modal consistency preservation strategies to reduce the gap between modalities and improve the discriminability of the learned Hamming space. Extensive experiments on four benchmark datasets demonstrate that RSDDH significantly outperforms state-of-the-art cross-modal hashing methods.
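
The retrieval side of any such method reduces to binarizing embeddings into the common Hamming space and ranking by Hamming distance. The sketch below shows only that generic step, not RSDDH's networks or its SVDDH discrete optimization:

```python
# Generic cross-modal hashing retrieval sketch: real-valued embeddings from
# each modality are binarized into codes in a shared Hamming space, and
# retrieval ranks database items by Hamming distance to the query.
import numpy as np

def to_hash(embeddings):
    """(N, B) real-valued embeddings -> (N, B) codes in {0, 1}."""
    return (embeddings > 0).astype(np.uint8)

def hamming_rank(query_code, db_codes):
    """Rank database items by Hamming distance to the query code."""
    dists = np.count_nonzero(db_codes != query_code, axis=1)
    return np.argsort(dists), dists

# Example: a 64-bit image-query code against text-item codes in the shared space.
rng = np.random.default_rng(0)
db = to_hash(rng.standard_normal((1000, 64)))
order, d = hamming_rank(to_hash(rng.standard_normal(64)), db)
print(order[:5], d[order[:5]])
```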

15 pages, 7157 KB  
Article
RADAR: Reasoning AI-Generated Image Detection for Semantic Fakes
by Haochen Wang, Xuhui Liu, Ziqian Lu, Cilin Yan, Xiaolong Jiang, Runqi Wang and Efstratios Gavves
Technologies 2025, 13(7), 280; https://doi.org/10.3390/technologies13070280 - 2 Jul 2025
Abstract
As modern generative models advance rapidly, AI-generated images exhibit higher resolution and lifelike details. However, the generated images may not adhere to world knowledge and common sense, as there is no such awareness and supervision in the generative models. For instance, the generated images could feature a penguin walking in the desert or a man with three arms, scenarios that are highly unlikely to occur in real life. Current AI-generated image detection methods mainly focus on low-level features, such as detailed texture patterns and frequency domain inconsistency, which are specific to certain generative models, making it challenging to identify the above-mentioned general semantic fakes. In this work, (1) we propose a new task, reasoning AI-generated image detection, which focuses on identifying semantic fakes in generative images that violate world knowledge and common sense. (2) To benchmark the new task, we collect a new dataset Spot the Semantic Fake (STSF). STSF contains 358 images with clear semantic fakes generated by three different modern diffusion models and provides bounding boxes as well as text annotations to locate the fakes. (3) We propose RADAR, a reasoning AI-generated image detection assistor, to locate semantic fakes in the generative images and output corresponding text explanations. Specifically, RADAR contains a specialized multimodal LLM to process given images and detect semantic fakes. To improve the generalization ability, we further incorporate ChatGPT as an assistor to detect unrealistic components in grounded text descriptions. The experiments on the STSF dataset show that RADAR effectively detects semantic fakes in modern generative images.

16 pages, 5373 KB  
Article
Design and Development of an Electronic Interface for Acquiring Signals from a Piezoelectric Sensor for Ultrasound Imaging Applications
by Elizabeth Espitia-Romero, Adriana Guzmán-López, Micael Gerardo Bravo-Sánchez, Juan José Martínez-Nolasco, José Alfredo Padilla Medina and Francisco Villaseñor-Ortega
Technologies 2025, 13(7), 270; https://doi.org/10.3390/technologies13070270 - 25 Jun 2025
Abstract
The increasing demand for accurate and accessible medical imaging has driven efforts to develop technologies that overcome limitations associated with conventional imaging techniques, such as MRI and CT scans. This study presents the design and implementation of an electronic interface for acquiring signals from a piezoelectric ultrasound sensor with the aim of improving image reconstruction quality by addressing electromagnetic interference and speckle noise, two major factors that degrade image fidelity. The proposed interface is installed between the ultrasound transducer and acquisition system, allowing real-time signal capture without altering the medical equipment’s operation. Using a printed circuit board with 110-pin connectors, signals from individual piezoelectric elements were analyzed using an oscilloscope. Results show that noise amplitudes occasionally exceed those of the acoustic echoes, potentially compromising image quality. By enabling direct observation of these signals, the interface facilitates the future development of analog filtering solutions to mitigate high-frequency noise before digital processing. This approach reduces reliance on computationally expensive digital filtering, offering a low-cost, real-time alternative. The findings underscore the potential of the interface to enhance diagnostic accuracy and support further innovation in medical imaging technologies.

19 pages, 16547 KB  
Article
A New Method for Camera Auto White Balance for Portrait
by Sicong Zhou, Kaida Xiao, Changjun Li, Peihua Lai, Hong Luo and Wenjun Sun
Technologies 2025, 13(6), 232; https://doi.org/10.3390/technologies13060232 - 5 Jun 2025
Abstract
Accurate skin color reproduction under varying correlated color temperature (CCT) remains a critical challenge in the graphic arts, impacting applications such as face recognition, portrait photography, and human–computer interaction. Traditional AWB methods like gray-world or max-RGB often rely on statistical assumptions, which limit their accuracy under complex or extreme lighting. We propose SCR-AWB, a novel algorithm that leverages real skin reflectance data to estimate the scene illuminant’s spectral power distribution (SPD) and CCT, enabling accurate skin tone reproduction. The method integrates prior knowledge of human skin reflectance, basis vectors, and camera sensitivity to perform pixel-wise spectral estimation. Experimental results on a difficult skin color reproduction task demonstrate that SCR-AWB significantly outperforms traditional AWB algorithms. It achieves lower reproduction angle errors and more accurate CCT predictions, with deviations below 300 K in most cases. These findings validate SCR-AWB as an effective and computationally efficient solution for robust skin color correction.
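
For reference, the gray-world baseline that SCR-AWB is compared against scales each channel so that the image's mean color becomes neutral. A minimal sketch of that baseline (not of SCR-AWB's spectral estimation):

```python
# Classical gray-world AWB baseline mentioned in the abstract: assume the
# average scene color is gray and scale each channel accordingly. SCR-AWB
# instead estimates the illuminant SPD/CCT from skin reflectance, which this
# sketch does not attempt.
import numpy as np

def gray_world(img):
    """img: (H, W, 3) float RGB in [0, 1] -> white-balanced image."""
    means = img.reshape(-1, 3).mean(axis=0)
    gains = means.mean() / (means + 1e-9)  # push each channel mean to gray
    return np.clip(img * gains, 0.0, 1.0)
```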

21 pages, 8188 KB  
Article
New Approach to Dominant and Prominent Color Extraction in Images with a Wide Range of Hues
by Yurii Kynash and Mariia Semeniv
Technologies 2025, 13(6), 230; https://doi.org/10.3390/technologies13060230 - 4 Jun 2025
Abstract
Dominant colors significantly influence visual image perception and are widely used in computer vision and design. Traditional extraction methods often neglect visually salient colors that occupy small areas yet possess high aesthetic relevance. This study introduces a method for detecting both dominant and visually prominent colors in images with a wide range of hues. We analyzed the color gamut of images in the CIE L*a*b* color space and concluded that it is difficult to identify the dominant and prominent colors due to high color variability. To address these challenges, the proposed approach transforms images into the orthogonal ICaS color space, integrating the properties of RGB and CMYK models, followed by K-means clustering. A spectral residual saliency map is applied to exclude background regions and emphasize perceptually significant objects. Experimental evaluation on an image database shows that the proposed method yields color palettes with broader gamut coverage, preserved luminance, and visually balanced combinations. A comparative analysis was conducted using the ΔE00 metric, which accounts not only for differences in lightness, chroma, and hue but also for the perceptual interactions between colors, based on their proximity in the color space. The results confirm that the proposed method exhibits greater color stability and aesthetic coherence than existing approaches. These findings highlight the effectiveness of the orthogonal saliency mean method for delivering a more perceptually accurate and visually consistent representation of the dominant colors in an image. This outcome validates the method’s applicability for image analysis and design.
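
The clustering stage is standard K-means over pixel colors; the paper's contribution lies in the ICaS color space and saliency masking applied around it, which this plain-RGB sketch deliberately omits:

```python
# Sketch of the K-means stage of dominant-color extraction, run on raw RGB
# pixels for illustration only (the paper clusters in its ICaS space after
# saliency-based background exclusion).
import numpy as np
from sklearn.cluster import KMeans

def dominant_colors(img, k=5):
    """img: (H, W, 3) uint8 RGB -> (k, 3) cluster centers sorted by share."""
    pixels = img.reshape(-1, 3).astype(float)
    km = KMeans(n_clusters=k, n_init=10, random_state=0).fit(pixels)
    counts = np.bincount(km.labels_, minlength=k)
    order = np.argsort(counts)[::-1]  # most frequent color first
    return km.cluster_centers_[order].astype(np.uint8), counts[order]
```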

23 pages, 5095 KB  
Article
Human-Machine Interaction: A Vision-Based Approach for Controlling a Robotic Hand Through Human Hand Movements
by Gerardo García-Gil, Gabriela del Carmen López-Armas and José de Jesús Navarro, Jr.
Technologies 2025, 13(5), 169; https://doi.org/10.3390/technologies13050169 - 23 Apr 2025
Abstract
An anthropomorphic robot is a mechanical device designed to perform human-like tasks, such as manipulating objects, and has been one of the significant contributions in robotics over the past 60 years. This paper presents an advanced system for controlling a robotic arm using user hand gestures and movements. It eliminates the need for traditional sensors or physical controls by implementing an intuitive approach based on MediaPipe and computer vision. The system recognizes the user’s hand movements and translates them into commands sent to a microcontroller, which operates a robotic hand equipped with six servomotors: five for the fingers and one for the wrist. The wrist stands out for its orthonormal design, which avoids occlusion problems in rotations of up to 180° and guarantees precise wrist control. Unlike conventional systems, this approach uses only a 2D camera to capture movements, simplifying design and reducing costs. The proposed system allows replicating the user’s activity with high precision, expanding the possibilities of human–robot interaction. Notably, the system has been able to replicate the user’s hand gestures with an accuracy of up to 95%.
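
The pipeline in this paper, camera to hand landmarks to servo commands, can be outlined with MediaPipe and a serial link. In the sketch below the landmark indices follow MediaPipe's hand model, but the flexion-to-servo mapping, port name, baud rate, and wire protocol are illustrative assumptions rather than the authors' design:

```python
# Hedged sketch of a vision-to-servo loop: MediaPipe hand landmarks from a 2D
# camera are mapped to a servo angle sent over serial (pyserial).
import cv2
import mediapipe as mp
import numpy as np
import serial

def flexion_angle(lm, a, b, c):
    """Angle (deg) at landmark b formed by landmarks a and c."""
    p = lambda i: np.array([lm[i].x, lm[i].y, lm[i].z])
    v1, v2 = p(a) - p(b), p(c) - p(b)
    cosang = v1 @ v2 / (np.linalg.norm(v1) * np.linalg.norm(v2) + 1e-9)
    return float(np.degrees(np.arccos(np.clip(cosang, -1, 1))))

port = serial.Serial("/dev/ttyUSB0", 115200)  # assumed port and baud rate
hands = mp.solutions.hands.Hands(max_num_hands=1)
cap = cv2.VideoCapture(0)
while cap.isOpened():
    ok, frame = cap.read()
    if not ok:
        break  # loop ends when the camera stops delivering frames
    res = hands.process(cv2.cvtColor(frame, cv2.COLOR_BGR2RGB))
    if res.multi_hand_landmarks:
        lm = res.multi_hand_landmarks[0].landmark
        # Index-finger flexion (MCP=5, PIP=6, TIP=8) mapped to a 0-180 servo.
        servo = int(np.clip(flexion_angle(lm, 5, 6, 8), 0, 180))
        port.write(f"S1:{servo}\n".encode())  # illustrative wire protocol
```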

Other


22 pages, 574 KB  
Systematic Review
Measurement Error of Markerless Motion Capture Systems Applied to Tracking Movements in Human–Object Interaction Tasks: A Systematic Review with Best Evidence Synthesis
by Nicole Unsihuay, Rene F. Clavo and Luiz H. Palucci Vieira
Technologies 2026, 14(1), 28; https://doi.org/10.3390/technologies14010028 - 1 Jan 2026
Abstract
This systematic review focused on the validity of markerless motion capture (MMC) systems used for human movement assessment during tasks that involve physical interaction with objects. Five electronic databases were searched until May 2025. Eligible studies (i) assessed the validity of an MMC system, (ii) required human participants to perform tasks that involved physical interaction with objects (e.g., lifts, carrying, gait with loads), (iii) employed a marker-based reference system, and (iv) reported at least one kinematic metric. Risk of bias was assessed using the SURE checklist. A best-evidence synthesis was conducted to classify the level of evidence across included studies. Fifteen studies met eligibility (median = 21 participants per study). In general, MMC systems performed well in capturing movement waveforms (i.e., high associations with reference systems), but their precision (i.e., the magnitude of differences from the reference systems) still requires improvement for tasks involving human–object interactions. Most tasks analyzed were lifts, gait with load, squatting, reaching/manipulation, and technical gestures. There was strong evidence for the validity of MMC during lifting tasks. In summary, MMC systems exhibit promising evidence of validity for some human–object interaction tasks, especially lifting, for which strong evidence was observed across studies. In contrast, evidence for tasks including gait under load, squatting, reaching, or touchscreen interaction is limited, moderate, or conflicting. Notwithstanding these limitations, most studies were observed to have moderate- to high-quality methodology. Additional research is required to optimize protocols for studying the measurement error of MMC in human–object interaction within real-world environments.
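
The review's distinction between waveform association and magnitude of differences maps onto two familiar statistics, correlation and RMSE, computed between time-synchronized markerless and marker-based signals. A minimal sketch of that comparison (the review's actual synthesis procedure is more involved):

```python
# Sketch of a typical waveform-level agreement analysis: correlation captures
# how well a markerless system tracks the movement's shape, while RMSE
# captures the magnitude of its error against a marker-based reference.
import numpy as np

def agreement(markerless, marker_based):
    """Both: (T,) joint-angle waveforms sampled at matching instants."""
    x, y = np.asarray(markerless, float), np.asarray(marker_based, float)
    r = np.corrcoef(x, y)[0, 1]                   # waveform association
    rmse = float(np.sqrt(np.mean((x - y) ** 2)))  # magnitude of differences
    return r, rmse

t = np.linspace(0, 1, 100)
ref = 40 * np.sin(2 * np.pi * t)                  # synthetic lifting cycle
print(agreement(ref + np.random.normal(0, 3, 100), ref))
```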
