Search Results (103)

Search Parameters:
Keywords = SIFT descriptor

60 pages, 7000 KB  
Article
Biometric Embedded Non-Blind Color Image Watermarking with Geometric Tamper Resistance via SIFT-ORB Keypoint Matching
by Swapnaneel Dhar, Riyanka Manna, Khaldi Amine and Aditya Kumar Sahu
Computers 2026, 15(5), 264; https://doi.org/10.3390/computers15050264 - 22 Apr 2026
Abstract
This work introduces a non-blind watermarking framework for color images to address tamper detection, particularly under geometric transformations. The proposed scheme fuses two watermarks, a personal signature and a biometric fingerprint, into a unified composite watermark embedded into the chrominance component of the cover image using a multi-level transform-domain approach that combines discrete wavelet transforms (DWTs), discrete cosine transforms (DCTs), and singular value decomposition (SVD). By leveraging the rotation-invariant properties of scale-invariant feature transform (SIFT) and oriented FAST and rotated BRIEF (ORB) descriptors, the framework ensures robust tamper detection without requiring alignment, thus mitigating the limitations of conventional detection techniques vulnerable to transformation-induced tamper obfuscation (TITO). Extensive experimentation demonstrates that the method maintains high perceptual fidelity, achieving PSNR values ranging from 50 to 55 dB for embedding strength factor μ (0.01–0.04) and SSIM indices near 1 across multiple benchmark images. Furthermore, the scheme exhibits notable resilience to a range of image processing attacks and geometric distortions. Comparative evaluation reveals its superiority over existing grayscale, color, SIFT-based and DWT-DCT-SVD-based watermarking techniques, affirming its applicability in scenarios demanding secure, imperceptible, and transformation-invariant image watermarking. Full article
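The SVD stage of such a non-blind pipeline can be sketched in a few lines (a minimal NumPy sketch covering only the singular-value embedding step; the DWT/DCT levels, the chrominance handling, and the SIFT-ORB alignment are omitted, and `mu` follows the paper's stated embedding-strength range):

```python
import numpy as np

def embed_svd(cover_block, watermark_block, mu=0.02):
    # Embed the watermark's singular values into the cover's:
    # S_marked = S_cover + mu * S_watermark.
    U, S, Vt = np.linalg.svd(cover_block, full_matrices=False)
    _, Sw, _ = np.linalg.svd(watermark_block, full_matrices=False)
    return U @ np.diag(S + mu * Sw) @ Vt

def extract_svd(marked_block, cover_block, mu=0.02):
    # Non-blind extraction: the original cover is available, so the
    # watermark singular values are recovered by inverting the embedding.
    _, S_m, _ = np.linalg.svd(marked_block, full_matrices=False)
    _, S_c, _ = np.linalg.svd(cover_block, full_matrices=False)
    return (S_m - S_c) / mu
```

Because both singular-value sequences are nonnegative and decreasing, their weighted sum is again a valid singular-value sequence, so extraction is exact in the absence of attacks.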
23 pages, 13639 KB  
Article
Making Animal Re-Identification Accessible: A Web-Based Giraffe ID System for Zoos
by Nipuna Lakshitha Saputhanthrige Don, Mitchell Rogers, Junhong Zhao, Bing Xue and Mengjie Zhang
Information 2026, 17(3), 266; https://doi.org/10.3390/info17030266 - 6 Mar 2026
Viewed by 504
Abstract
Computer vision and machine learning have accelerated the automation of animal re-identification pipelines used in conservation programs worldwide. For species with distinctive markings, such as the spot patterns of giraffes, these automated methods are crucial for research and population monitoring purposes. However, many tools are designed for experts, and their implementation requires substantial technical expertise. Research teams often use specialist software and workflows that are not accessible to the general public. In a zoo setting, visitors lack a simple way to identify an individual animal, and unique features are easily missed by untrained visitors. This study presents a three-part solution: a web interface for zoo visitors to upload photos, a deep learning model for giraffe torso detection, and a fast re-identification method for matching observations to a gallery of known individuals using server-side processing. We compare several re-identification methods (RootSIFT, MiewID, and MegaDescriptor) using a consistent evaluation protocol and report both identification performance and system latency for this closed-set zoo setting. Taken together, this study presents a visitor-facing web system that integrates existing re-identification models into a modular, real-time pipeline for zoo deployment, lowering the barrier to visitor participation and making state-of-the-art re-identification methods more accessible to the general public. Full article
(This article belongs to the Special Issue Advances in Computer Graphics and Visual Computing)
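The RootSIFT variant compared above is a one-step transformation of standard SIFT descriptors; a minimal sketch:

```python
import numpy as np

def root_sift(descriptors, eps=1e-7):
    # Hellinger-kernel mapping: L1-normalize each (nonnegative) SIFT
    # descriptor, then take the element-wise square root. Euclidean
    # distance on the result approximates Hellinger distance on the
    # originals, which typically improves matching.
    d = descriptors / (descriptors.sum(axis=1, keepdims=True) + eps)
    return np.sqrt(d)
```

After the mapping, each descriptor has (approximately) unit L2 norm, so existing Euclidean-distance matchers can be reused unchanged.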

28 pages, 11762 KB  
Article
A Coarse-to-Fine Optical-SAR Image Registration Algorithm for UAV-Based Multi-Sensor Systems Using Geographic Information Constraints and Cross-Modal Feature Consistency Mapping
by Xiaoyong Sun, Zhen Zuo, Xiaojun Guo, Xuan Li, Peida Zhou, Runze Guo and Shaojing Su
Remote Sens. 2026, 18(5), 683; https://doi.org/10.3390/rs18050683 - 25 Feb 2026
Viewed by 444
Abstract
Optical and synthetic aperture radar (SAR) image registration faces challenges from nonlinear radiometric distortions and geometric deformations caused by different imaging mechanisms. This paper proposes a coarse-to-fine registration algorithm integrating geographic information constraints with cross-modal feature consistency mapping. The coarse stage employs imaging geometry-based coordinate transformation with airborne navigation data to eliminate scale and rotation differences. The fine stage constructs a multi-scale phase congruency-based feature response aggregation model combined with rotation-invariant descriptors and global-to-local search for sub-pixel alignment. Experiments on integrated airborne optical/SAR datasets demonstrate superior performance with an average RMSE of 2.00 pixels, outperforming both traditional handcrafted methods (3MRS, OS-SIFT, POS-GIFT, GLS-MIFT) and state-of-the-art deep learning approaches (SuperGlue, LoFTR, ReDFeat, SAROptNet) while reducing execution time by 37.0% compared with the best-performing baseline. The proposed coarse registration also serves as an effective preprocessing module that improves SuperGlue’s matching rate by 167% and LoFTR’s by 109%, with a hybrid refinement strategy achieving 1.95 pixels RMSE. The method demonstrates robust performance under challenging conditions, enabling real-time UAV-based multi-sensor fusion applications. Full article
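The coarse stage's idea, using navigation data to cancel known scale and rotation before fine matching, can be illustrated with a plain similarity transform (a sketch only; the paper's coordinate transformation uses the full imaging geometry, not a 2D similarity):

```python
import numpy as np

def similarity_h(scale, yaw_rad, tx=0.0, ty=0.0):
    # 3x3 homogeneous similarity transform (scale, rotation, translation).
    c, s = np.cos(yaw_rad), np.sin(yaw_rad)
    return np.array([[scale * c, -scale * s, tx],
                     [scale * s,  scale * c, ty],
                     [0.0, 0.0, 1.0]])

def apply_h(H, pts):
    # Apply a homogeneous 3x3 transform to an (N, 2) point array.
    p = np.hstack([pts, np.ones((len(pts), 1))]) @ H.T
    return p[:, :2] / p[:, 2:]

def coarse_align(sar_pts, scale, yaw_rad, tx=0.0, ty=0.0):
    # Undo the known scale/rotation/translation so only small residual
    # offsets remain for the fine (sub-pixel) stage to resolve.
    return apply_h(np.linalg.inv(similarity_h(scale, yaw_rad, tx, ty)), sar_pts)
```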

19 pages, 2617 KB  
Article
Topic-Modeling Guided Semantic Clustering for Enhancing CNN-Based Image Classification Using Scale-Invariant Feature Transform and Block Gabor Filtering
by Natthaphong Suthamno and Jessada Tanthanuch
J. Imaging 2026, 12(2), 70; https://doi.org/10.3390/jimaging12020070 - 9 Feb 2026
Viewed by 428
Abstract
This study proposes a topic-modeling guided framework that enhances image classification by introducing semantic clustering prior to CNN training. Images are processed through two key-point extraction pipelines: Scale-Invariant Feature Transform (SIFT) with Sobel edge detection and Block Gabor Filtering (BGF), to obtain local feature descriptors. These descriptors are clustered using K-means to build a visual vocabulary. Bag of Words histograms then represent each image as a visual document. Latent Dirichlet Allocation is applied to uncover latent semantic topics, generating coherent image clusters. Cluster-specific CNN models, including AlexNet, GoogLeNet, and several ResNet variants, are trained under identical conditions to identify the most suitable architecture for each cluster. Two topic guided integration strategies, the Maximum Proportion Topic (MPT) and the Weight Proportion Topic (WPT), are then used to assign test images to the corresponding specialized model. Experimental results show that both the SIFT-based and BGF-based pipelines outperform non-clustered CNN models and a baseline method using Incremental PCA, K-means, Same-Cluster Prediction, and unweighted Ensemble Voting. The SIFT pipeline achieves the highest accuracy of 95.24% with the MPT strategy, while the BGF pipeline achieves 93.76% with the WPT strategy. These findings confirm that semantic structure introduced through topic modeling substantially improves CNN classification performance. Full article
(This article belongs to the Topic Machine Learning and Deep Learning in Medical Imaging)
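The vocabulary step of this pipeline, quantizing local descriptors against a K-means codebook into a Bag-of-Words histogram ("visual document"), can be sketched as:

```python
import numpy as np

def bow_histogram(descriptors, codebook):
    # Assign each local descriptor to its nearest visual word (codebook
    # row), then count word occurrences to form the image's histogram.
    d2 = ((descriptors[:, None, :] - codebook[None, :, :]) ** 2).sum(axis=2)
    words = d2.argmin(axis=1)
    return np.bincount(words, minlength=len(codebook))
```

The resulting histograms are what Latent Dirichlet Allocation then treats as word counts when uncovering latent topics.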

22 pages, 7096 KB  
Article
An Improved ORB-KNN-Ratio Test Algorithm for Robust Underwater Image Stitching on Low-Cost Robotic Platforms
by Guanhua Yi, Tianxiang Zhang, Yunfei Chen and Dapeng Yu
J. Mar. Sci. Eng. 2026, 14(2), 218; https://doi.org/10.3390/jmse14020218 - 21 Jan 2026
Viewed by 510
Abstract
Underwater optical images often exhibit severe color distortion, weak texture, and uneven illumination due to light absorption and scattering in water. These issues result in unstable feature detection and inaccurate image registration. To address these challenges, this paper proposes an underwater image stitching method that integrates ORB (Oriented FAST and Rotated BRIEF) feature extraction with a fixed-ratio constraint matching strategy. First, lightweight color and contrast enhancement techniques are employed to restore color balance and improve local texture visibility. Then, ORB descriptors are extracted and matched via a KNN (K-Nearest Neighbors) search, and Lowe’s ratio test is applied to eliminate false matches caused by weak texture similarity. Finally, the geometric transformation between image frames is estimated by incorporating robust optimization, ensuring stable homography computation. Experimental results on real underwater datasets show that the proposed method significantly improves stitching continuity and structural consistency, achieving 40–120% improvements in SSIM (Structural Similarity Index) and PSNR (peak signal-to-noise ratio) over conventional Harris–ORB + KNN, SIFT (scale-invariant feature transform) + BF (brute force), SIFT + KNN, and AKAZE (accelerated KAZE) + BF methods while maintaining processing times within one second. These results indicate that the proposed method is well-suited for real-time underwater environment perception and panoramic mapping on low-cost, micro-sized underwater robotic platforms. Full article
(This article belongs to the Section Ocean Engineering)
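Lowe's ratio test used here for outlier rejection reduces to one comparison per keypoint; a minimal sketch operating on precomputed distance pairs:

```python
def ratio_test(knn_pairs, ratio=0.75):
    # knn_pairs: (best_distance, second_best_distance) per query keypoint,
    # as returned by a 2-NN search. A match is kept only when the best
    # candidate is clearly closer than the runner-up; 0.75 is Lowe's
    # commonly used threshold.
    return [i for i, (d1, d2) in enumerate(knn_pairs) if d1 < ratio * d2]
```

In weakly textured underwater scenes, many descriptors have several near-identical neighbors; the test discards exactly those ambiguous matches.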

36 pages, 4464 KB  
Article
Efficient Image-Based Memory Forensics for Fileless Malware Detection Using Texture Descriptors and LIME-Guided Deep Learning
by Qussai M. Yaseen, Esraa Oudat, Monther Aldwairi and Salam Fraihat
Computers 2025, 14(11), 467; https://doi.org/10.3390/computers14110467 - 1 Nov 2025
Cited by 1 | Viewed by 2067
Abstract
Memory forensics is an essential cybersecurity tool that comprehensively examines volatile memory to detect the malicious activity of fileless malware that can bypass disk analysis. Image-based detection techniques provide a promising solution by converting memory data into images that can be analyzed with image processing tools and machine learning methods. However, image-based detection and classification require high computational effort. This paper investigates the efficacy of texture-based methods in detecting and classifying memory-resident or fileless malware using different image resolutions, identifying the best feature descriptors, classifiers, and resolutions that accurately classify malware into specific families and differentiate them from benign software. Moreover, this paper uses both local and global descriptors, where local descriptors include Oriented FAST and Rotated BRIEF (ORB), Scale-Invariant Feature Transform (SIFT), and Histogram of Oriented Gradients (HOG), and global descriptors include Discrete Wavelet Transform (DWT), GIST, and Gray Level Co-occurrence Matrix (GLCM). The results indicate that as image resolution increases, most feature descriptors yield more discriminative features but require higher computational effort in terms of time and processing resources. To address this challenge, this paper proposes a novel approach that integrates Local Interpretable Model-agnostic Explanations (LIME) with deep learning models to automatically identify and crop the most important regions of memory images. The LIME ROI was extracted from the ResNet50 and MobileNet models’ predictions separately, the images were resized to 128 × 128, and the sampling process was performed dynamically to speed up LIME computation. The ROIs of the images are cropped to new images with sizes of 100 × 100 in two stages: the coarse stage and the fine stage. The two generated LIME-based cropped images using ResNet50 and MobileNet are fed to a lightweight neural network to evaluate the effectiveness of the LIME-identified regions. The results demonstrate that the LIME-based MobileNet model’s prediction improves the efficiency of the model by preserving important features with a classification accuracy of 85% on multi-class classification. Full article
(This article belongs to the Special Issue Using New Technologies in Cyber Security Solutions (2nd Edition))
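Of the global descriptors listed, the GLCM is compact enough to sketch in full; a minimal version for one pixel offset, assuming the image is already quantized to a small number of gray levels:

```python
import numpy as np

def glcm(img, levels, dx=1, dy=0):
    # Gray-Level Co-occurrence Matrix: g[i, j] counts how often gray
    # level j occurs at offset (dx, dy) from a pixel with gray level i.
    g = np.zeros((levels, levels), dtype=int)
    h, w = img.shape
    for y in range(h - dy):
        for x in range(w - dx):
            g[img[y, x], img[y + dy, x + dx]] += 1
    return g
```

Texture statistics (contrast, homogeneity, energy) are then derived from this matrix; real pipelines aggregate several offsets and angles.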

11 pages, 1013 KB  
Proceeding Paper
A Comparative Evaluation of Classical and Deep Learning-Based Visual Odometry Methods for Autonomous Vehicle Navigation
by Armand Nagy and János Hollósi
Eng. Proc. 2025, 113(1), 16; https://doi.org/10.3390/engproc2025113016 - 29 Oct 2025
Viewed by 1405
Abstract
This study introduces a comprehensive benchmarking framework for evaluating visual odometry (VO) methods, combining classical, learning-based, and hybrid approaches. We assess 52 configurations—spanning 19 keypoint detectors, 21 descriptors, and 4 matchers—across two widely used benchmark datasets: KITTI and EuRoC. Six key trajectory metrics, including Absolute Trajectory Error (ATE) and Final Displacement Error (FDE), provide a detailed performance comparison under various environmental conditions, such as motion blur, occlusions, and dynamic lighting. Our results highlight the critical role of feature matchers, with the LightGlue–SIFT combination consistently outperforming others across both datasets. Additionally, learning-based matchers can be integrated with classical pipelines, improving robustness without requiring end-to-end training. Hybrid configurations combining classical detectors with learned components offer a balanced trade-off between accuracy, robustness, and computational efficiency, making them suitable for real-world applications in autonomous systems and robotics. Full article
(This article belongs to the Proceedings of The Sustainable Mobility and Transportation Symposium 2025)
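Of the six trajectory metrics, the Absolute Trajectory Error is representative; a minimal sketch, assuming the estimated and ground-truth trajectories are already aligned and time-synchronized:

```python
import numpy as np

def ate_rmse(estimated, ground_truth):
    # Absolute Trajectory Error: RMSE of the per-frame position error
    # between (N, d) estimated and ground-truth trajectory arrays.
    err = np.linalg.norm(estimated - ground_truth, axis=1)
    return float(np.sqrt(np.mean(err ** 2)))
```

Full evaluation toolchains also perform SE(3)/Sim(3) alignment before computing this; that step is omitted here.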

22 pages, 4258 KB  
Article
Visible Image-Based Machine Learning for Identifying Abiotic Stress in Sugar Beet Crops
by Seyed Reza Haddadi, Masoumeh Hashemi, Richard C. Peralta and Masoud Soltani
Algorithms 2025, 18(11), 680; https://doi.org/10.3390/a18110680 - 24 Oct 2025
Cited by 4 | Viewed by 1020
Abstract
Previous research has shown that combining inexpensive RGB images, image processing, and machine learning (ML) can accurately identify crop stress. Four Machine Learning Image Modules (MLIMs) were developed to enable the rapid and cost-effective identification of sugar beet stresses caused by water and/or nitrogen deficiencies. RGB images representing stressed and non-stressed crops were used in the analysis. To improve robustness, data augmentation was applied, generating six variations on each image and expanding the dataset from 150 to 900 images for training and testing. Each MLIM was trained and tested using 54 combinations derived from nine canopy and RGB-based input features and six ML algorithms. The most accurate MLIM used RGB bands as inputs to a Multilayer Perceptron, achieving 96.67% accuracy for overall stress detection, and 95.93% and 94.44% for water and nitrogen stress identification, respectively. A Random Forest model, using only the green band, achieved 92.22% accuracy for stress detection while requiring only one-fourth the computation time. For specific stresses, a Random Forest (RF) model using a Scale-Invariant Feature Transform descriptor (SIFT) achieved 93.33% for water stress, while RF with RGB bands and canopy cover reached 85.56% for nitrogen stress. To address the trade-off between accuracy and computational cost, a bargaining theory-based framework was applied. This approach identified optimal MLIMs that balance performance and execution efficiency. Full article
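The augmentation step (six variants per image, growing the set from 150 to 900) can be sketched as follows; note the specific transforms chosen here, flips and 90-degree rotations, are an assumption for illustration, not taken from the paper:

```python
import numpy as np

def augment_six(img):
    # Six label-preserving variants per image (hypothetical choices:
    # the paper states six variations but does not list them here).
    return [np.fliplr(img), np.flipud(img),
            np.rot90(img, 1), np.rot90(img, 2), np.rot90(img, 3),
            np.rot90(np.fliplr(img), 1)]
```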

25 pages, 12760 KB  
Article
Intelligent Face Recognition: Comprehensive Feature Extraction Methods for Holistic Face Analysis and Modalities
by Thoalfeqar G. Jarullah, Ahmad Saeed Mohammad, Musab T. S. Al-Kaltakchi and Jabir Alshehabi Al-Ani
Signals 2025, 6(3), 49; https://doi.org/10.3390/signals6030049 - 19 Sep 2025
Cited by 2 | Viewed by 3014
Abstract
Face recognition technology utilizes unique facial features to analyze and compare individuals for identification and verification purposes. This technology is crucial for several reasons, such as improving security and authentication, effectively verifying identities, providing personalized user experiences, and automating various operations, including attendance monitoring, access management, and law enforcement activities. In this paper, comprehensive evaluations are conducted using different face detection and modality segmentation methods, feature extraction methods, and classifiers to improve system performance. As for face detection, four methods are proposed: OpenCV’s Haar Cascade classifier, Dlib’s HOG + SVM frontal face detector, Dlib’s CNN face detector, and Mediapipe’s face detector. Additionally, two types of feature extraction techniques are proposed: hand-crafted features (traditional methods: global and local features) and deep learning features. Three global features were extracted: Scale-Invariant Feature Transform (SIFT), Speeded-Up Robust Features (SURF), and Global Image Structure (GIST). Likewise, the following local feature methods are utilized: Local Binary Pattern (LBP), Weber local descriptor (WLD), and Histogram of Oriented Gradients (HOG). On the other hand, the deep learning-based features fall into two categories: convolutional neural networks (CNNs), including VGG16, VGG19, and VGG-Face, and Siamese neural networks (SNNs), which generate face embeddings. For classification, three methods are employed: Support Vector Machine (SVM), a one-class SVM variant, and Multilayer Perceptron (MLP). The system is evaluated on three datasets: in-house, Labelled Faces in the Wild (LFW), and the Pins dataset (sourced from Pinterest), providing comprehensive benchmark comparisons for facial recognition research. The best performance accuracy for the proposed ten feature extraction methods applied to the in-house database in the context of the facial recognition task achieved 99.8% accuracy by using the VGG16 model combined with the SVM classifier. Full article
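Among the local descriptors listed, LBP is compact enough to sketch in full; a minimal 3x3 version:

```python
import numpy as np

def lbp(img):
    # Basic 3x3 Local Binary Pattern: each of the 8 neighbors contributes
    # one bit, set when that neighbor is >= the center pixel; the 8-bit
    # codes are then typically histogrammed per region.
    h, w = img.shape
    out = np.zeros((h - 2, w - 2), dtype=np.uint8)
    offsets = [(-1, -1), (-1, 0), (-1, 1), (0, 1),
               (1, 1), (1, 0), (1, -1), (0, -1)]
    center = img[1:h - 1, 1:w - 1]
    for bit, (dy, dx) in enumerate(offsets):
        neighbor = img[1 + dy:h - 1 + dy, 1 + dx:w - 1 + dx]
        out |= (neighbor >= center).astype(np.uint8) << bit
    return out
```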

26 pages, 54898 KB  
Article
MSWF: A Multi-Modal Remote Sensing Image Matching Method Based on a Side Window Filter with Global Position, Orientation, and Scale Guidance
by Jiaqing Ye, Guorong Yu and Haizhou Bao
Sensors 2025, 25(14), 4472; https://doi.org/10.3390/s25144472 - 18 Jul 2025
Cited by 1 | Viewed by 1603
Abstract
Multi-modal remote sensing image (MRSI) matching suffers from severe nonlinear radiometric distortions and geometric deformations, and conventional feature-based techniques are generally ineffective. This study proposes a novel and robust MRSI matching method using the side window filter (MSWF). First, a novel side window scale space is constructed based on the side window filter (SWF), which can preserve shared image contours and facilitate the extraction of feature points within this newly defined scale space. Second, noise thresholds in phase congruency (PC) computation are adaptively refined with the Weibull distribution; weighted phase features are then exploited to determine the principal orientation of each point, from which a maximum index map (MIM) descriptor is constructed. Third, coarse position, orientation, and scale information obtained through global matching are employed to estimate image-pair geometry, after which descriptors are recalculated for precise correspondence search. MSWF is benchmarked against eight state-of-the-art multi-modal methods—six hand-crafted (PSO-SIFT, LGHD, RIFT, RIFT2, HAPCG, COFSM) and two learning-based (CMM-Net, RedFeat) methods—on three public datasets. Experiments demonstrate that MSWF consistently achieves the highest number of correct matches (NCM) and the highest rate of correct matches (RCM) while delivering the lowest root mean square error (RMSE), confirming its superiority for challenging MRSI registration tasks. Full article
(This article belongs to the Section Remote Sensors)
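The maximum index map (MIM) at the core of the descriptor is, in essence, a per-pixel argmax over oriented filter responses; a minimal sketch (the phase-congruency filtering that produces the responses is omitted):

```python
import numpy as np

def maximum_index_map(responses):
    # responses: array of shape (n_orientations, H, W) of oriented filter
    # responses; the MIM stores, per pixel, the index of the strongest
    # orientation, giving a compact illumination-insensitive map.
    return responses.argmax(axis=0)
```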

16 pages, 3953 KB  
Article
Skin Lesion Classification Using Hybrid Feature Extraction Based on Classical and Deep Learning Methods
by Maryem Zahid, Mohammed Rziza and Rachid Alaoui
BioMedInformatics 2025, 5(3), 41; https://doi.org/10.3390/biomedinformatics5030041 - 16 Jul 2025
Cited by 1 | Viewed by 2953
Abstract
This paper proposes a hybrid method for skin lesion classification combining deep learning features with conventional descriptors such as HOG, Gabor, SIFT, and LBP. Feature extraction was performed by extracting features of interest within the tumor area using suggested fusion methods. We tested and compared features obtained from different deep learning models coupled to HOG-based features. Dimensionality reduction and performance improvement were achieved by Principal Component Analysis, after which SVM was used for classification. The compared methods were tested on the reference database skin cancer-malignant-vs-benign. The results show a significant improvement in terms of accuracy due to complementarity between the conventional and deep learning-based methods. Specifically, the addition of HOG descriptors led to an accuracy increase of 5% for EfficientNetB0, 7% for ResNet50, 5% for ResNet101, 1% for NASNetMobile, 1% for DenseNet201, and 1% for MobileNetV2. These findings confirm that feature fusion significantly enhances performance compared to the individual application of each method. Full article
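The fusion-then-reduction step (concatenate deep and handcrafted feature vectors, then apply PCA) can be sketched with NumPy's SVD standing in for scikit-learn's PCA:

```python
import numpy as np

def fuse_and_reduce(deep_feats, hog_feats, n_components=2):
    # Concatenate per-image deep and HOG feature vectors, center them,
    # and project onto the top principal components via SVD.
    X = np.hstack([deep_feats, hog_feats])
    X = X - X.mean(axis=0)
    _, _, Vt = np.linalg.svd(X, full_matrices=False)
    return X @ Vt[:n_components].T
```

The reduced vectors are what an SVM would then classify; the component count here is illustrative.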

19 pages, 11574 KB  
Article
Multiscale Eight Direction Descriptor-Based Improved SAR–SIFT Method for Along-Track and Cross-Track SAR Images
by Wei Wang, Jinyang Chen and Zhonghua Hong
Appl. Sci. 2025, 15(14), 7721; https://doi.org/10.3390/app15147721 - 10 Jul 2025
Cited by 2 | Viewed by 998
Abstract
Image matching between spaceborne synthetic aperture radar (SAR) images is frequently degraded by speckle noise, resulting in low matching accuracy, and the vast coverage of SAR images renders direct matching inefficient. To address this issue, the study puts forward a multi-scale adaptive improved SAR image block matching method (called STSU–SAR–SIFT). To improve accuracy, this method controls the number of feature points under different thresholds by using the SAR–Shi–Tomasi response function in a multi-scale space. Then, the SUSAN function is used to constrain the effect of coherent noise on the initial feature points, and the multi-scale and multi-directional GLOH descriptor construction approach is used to boost the robustness of descriptors. To improve efficiency, the method restricts matching to the overlapping area of the primary and secondary images to reduce the search range, and uses multi-core CPU + GPU collaborative parallel computing to boost the efficiency of the SAR–SIFT algorithm by processing the overlapping area in blocks. The experimental results demonstrate that the STSU–SAR–SIFT approach presented in this paper achieves better accuracy and feature distribution, and the algorithmic acceleration markedly improves efficiency. Full article
(This article belongs to the Section Earth Sciences)
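The Shi-Tomasi-style response underlying the detector is the smaller eigenvalue of the windowed image structure tensor; a minimal sketch (the paper's SAR-specific, ratio-based gradient computation is omitted and plain gradients are assumed):

```python
import numpy as np

def box_sum(a, r=1):
    # Sum over a (2r+1) x (2r+1) window via zero-padded shifted adds.
    out = np.zeros(a.shape, dtype=float)
    p = np.pad(a.astype(float), r)
    h, w = a.shape
    for dy in range(2 * r + 1):
        for dx in range(2 * r + 1):
            out += p[dy:dy + h, dx:dx + w]
    return out

def shi_tomasi_response(Ix, Iy, r=1):
    # Smaller eigenvalue of the windowed structure tensor
    # [[Sxx, Sxy], [Sxy, Syy]]; large values indicate corner-like points.
    Sxx, Syy, Sxy = box_sum(Ix * Ix, r), box_sum(Iy * Iy, r), box_sum(Ix * Iy, r)
    tr, det = Sxx + Syy, Sxx * Syy - Sxy ** 2
    return tr / 2 - np.sqrt(np.maximum(tr ** 2 / 4 - det, 0))
```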

30 pages, 8644 KB  
Article
Development of a UR5 Cobot Vision System with MLP Neural Network for Object Classification and Sorting
by Szymon Kluziak and Piotr Kohut
Information 2025, 16(7), 550; https://doi.org/10.3390/info16070550 - 27 Jun 2025
Cited by 2 | Viewed by 2742
Abstract
This paper presents the implementation of a vision system for a collaborative robot equipped with a web camera and a Python-based control algorithm for automated object-sorting tasks. The vision system aims to detect, classify, and manipulate objects within the robot’s workspace using only 2D camera images. The vision system was integrated with the Universal Robots UR5 cobot and designed for object sorting based on shape recognition. The software stack includes OpenCV for image processing, NumPy for numerical operations, and scikit-learn for multilayer perceptron (MLP) models. The paper outlines the calibration process, including lens distortion correction and camera-to-robot calibration in a hand-in-eye configuration to establish the spatial relationship between the camera and the cobot. Object localization relied on a virtual plane aligned with the robot’s workspace. Object classification was conducted using contour similarity with Hu moments, SIFT-based descriptors with FLANN matching, and MLP-based neural models trained on preprocessed images. Conducted performance evaluations encompassed accuracy metrics for used identification methods (MLP classifier, contour similarity, and feature descriptor matching) and the effectiveness of the vision system in controlling the cobot for sorting tasks. The evaluation focused on classification accuracy and sorting effectiveness, using sensitivity, specificity, precision, accuracy, and F1-score metrics. Results showed that neural network-based methods outperformed traditional methods in all categories, concurrently offering more straightforward implementation. Full article
(This article belongs to the Section Information Applications)
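Localizing objects on the virtual plane reduces to mapping pixel coordinates through a plane homography; a minimal sketch (the calibration that produces `H` is assumed already done, as described in the paper's hand-in-eye setup):

```python
import numpy as np

def pixel_to_plane(H, u, v):
    # Map an image pixel (u, v) to workspace-plane coordinates via the
    # 3x3 homography H obtained from camera-to-robot calibration.
    p = H @ np.array([u, v, 1.0])
    return p[:2] / p[2]
```

The returned plane coordinates can then be converted to a robot pose for the pick-and-place motion.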

15 pages, 1994 KB  
Article
A Hybrid Deep Learning and Feature Descriptor Approach for Partial Fingerprint Recognition
by Zhi-Sheng Chen, Chrisantonius, Farchan Hakim Raswa, Shang-Kuan Chen, Chung-I Huang, Kuo-Chen Li, Shih-Lun Chen, Yung-Hui Li and Jia-Ching Wang
Electronics 2025, 14(9), 1807; https://doi.org/10.3390/electronics14091807 - 28 Apr 2025
Cited by 3 | Viewed by 2097
Abstract
Partial fingerprint recognition has emerged as a critical method for verifying user authenticity during mobile transactions. As a result, there is a pressing need to develop techniques that effectively and accurately authenticate users, even when the scanner only captures a limited area of the finger. A key challenge in partial fingerprint matching is the inevitable loss of features when a full fingerprint image is reduced to a partial one. To address this, we propose a method that integrates deep learning with feature descriptors for partial fingerprint matching. Specifically, our approach employs a Siamese Network based on a CNN architecture for deep learning, complemented by a SIFT-based feature descriptor to extract minimal yet significant features from the partial fingerprint. The final matching score is determined by combining the outputs from both methods, using a weighted scheme. The experimental results, obtained from varying image sizes, sufficient epochs, and different datasets, indicate that our combined method achieves an Equal Error Rate (EER) of approximately 4% for databases DB1 and DB3 in the FVC2002 dataset. Additionally, validation at FRR@FAR 1/50,000 yields results of about 6.36% and 8.11% for DB1 and DB2, respectively. These findings demonstrate the efficacy of our approach in partial fingerprint recognition. Future work could involve utilizing higher-resolution datasets to capture more detailed fingerprint features, such as pore structures, and exploring alternative deep learning techniques to further streamline the training process. Full article
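The reported Equal Error Rate can be computed from genuine and impostor score lists by sweeping a decision threshold; a minimal sketch:

```python
def equal_error_rate(genuine, impostor):
    # EER is the operating point where the false rejection rate (genuine
    # scores below threshold) equals the false acceptance rate (impostor
    # scores at or above threshold); we take the closest sampled point.
    best_gap, eer = float("inf"), None
    for t in sorted(genuine + impostor):
        frr = sum(g < t for g in genuine) / len(genuine)
        far = sum(i >= t for i in impostor) / len(impostor)
        if abs(frr - far) < best_gap:
            best_gap, eer = abs(frr - far), (frr + far) / 2
    return eer
```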

33 pages, 36897 KB  
Article
Making Images Speak: Human-Inspired Image Description Generation
by Chifaa Sebbane, Ikram Belhajem and Mohammed Rziza
Information 2025, 16(5), 356; https://doi.org/10.3390/info16050356 - 28 Apr 2025
Cited by 3 | Viewed by 1844
Abstract
Despite significant advances in deep learning-based image captioning, many state-of-the-art models still struggle to balance visual grounding (i.e., accurate object and scene descriptions) with linguistic coherence (i.e., grammatical fluency and appropriate use of non-visual tokens such as articles and prepositions). To address these limitations, we propose a hybrid image captioning framework that integrates handcrafted and deep visual features. Specifically, we combine local descriptors—Scale-Invariant Feature Transform (SIFT) and Bag of Features (BoF)—with high-level semantic features extracted using ResNet50. This dual representation captures both fine-grained spatial details and contextual semantics. The decoder employs Bahdanau attention refined with an Attention-on-Attention (AoA) mechanism to optimize visual-textual alignment, while GloVe embeddings and a GRU-based sequence model ensure fluent language generation. The proposed system is trained on 200,000 image-caption pairs from the MS COCO train2014 dataset and evaluated on 50,000 held-out MS COCO pairs plus the Flickr8K benchmark. Our model achieves a CIDEr score of 128.3 and a SPICE score of 29.24, reflecting clear improvements over baselines in both semantic precision—particularly for spatial relationships—and grammatical fluency. These results validate that combining classical computer vision techniques with modern attention mechanisms yields more interpretable and linguistically precise captions, addressing key limitations in neural caption generation. Full article
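The additive (Bahdanau) attention at the core of the decoder scores each visual feature against the decoder state; a minimal NumPy sketch (the Attention-on-Attention refinement and the GRU are omitted, and weight shapes are illustrative):

```python
import numpy as np

def bahdanau_attention(query, keys, W_q, W_k, v):
    # Additive attention: score_i = v . tanh(W_q q + W_k k_i); the
    # softmax over scores weights the visual features into one context
    # vector for the next decoding step.
    scores = np.tanh(query @ W_q + keys @ W_k) @ v
    weights = np.exp(scores - scores.max())
    weights /= weights.sum()
    context = weights @ keys
    return context, weights
```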
