Search Results (2,836)

Search Parameters:
Keywords = RGB image

23 pages, 1657 KiB  
Article
High-Precision Pest Management Based on Multimodal Fusion and Attention-Guided Lightweight Networks
by Ziye Liu, Siqi Li, Yingqiu Yang, Xinlu Jiang, Mingtian Wang, Dongjiao Chen, Tianming Jiang and Min Dong
Insects 2025, 16(8), 850; https://doi.org/10.3390/insects16080850 (registering DOI) - 16 Aug 2025
Abstract
In the context of global food security and sustainable agricultural development, the efficient recognition and precise management of agricultural insect pests and their predators have become critical challenges in the domain of smart agriculture. To address the limitations of traditional models that overly rely on single-modal inputs and suffer from poor recognition stability under complex field conditions, a multimodal recognition framework has been proposed. This framework integrates RGB imagery, thermal infrared imaging, and environmental sensor data. A cross-modal attention mechanism, environment-guided modality weighting strategy, and decoupled recognition heads are incorporated to enhance the model’s robustness against small targets, intermodal variations, and environmental disturbances. Evaluated on a high-complexity multimodal field dataset, the proposed model significantly outperforms mainstream methods across four key metrics, precision, recall, F1-score, and mAP@50, achieving 91.5% precision, 89.2% recall, 90.3% F1-score, and 88.0% mAP@50. These results represent an improvement of over 6% compared to representative models such as YOLOv8 and DETR. Additional ablation studies confirm the critical contributions of key modules, particularly under challenging scenarios such as low light, strong reflections, and sensor data noise. Moreover, deployment tests conducted on the Jetson Xavier edge device demonstrate the feasibility of real-world application, with the model achieving a 25.7 FPS inference speed and a compact size of 48.3 MB, thus balancing accuracy and lightweight design. This study provides an efficient, intelligent, and scalable AI solution for pest surveillance and biological control, contributing to precision pest management in agricultural ecosystems. Full article
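The abstract above describes an environment-guided modality weighting strategy but gives no implementation detail. The sketch below is only a rough illustration of that idea, with assumed tensor shapes and an invented module name (EnvGuidedFusion): a small MLP turns the environmental sensor reading into softmax weights that decide how much to trust the RGB versus thermal features.

```python
# Illustrative sketch only (assumed shapes and names), not the published model.
import torch
import torch.nn as nn

class EnvGuidedFusion(nn.Module):
    """Fuse RGB and thermal feature vectors with weights predicted from sensor data."""
    def __init__(self, feat_dim: int = 256, env_dim: int = 8):
        super().__init__()
        # Small MLP maps the environmental reading to two modality weights.
        self.gate = nn.Sequential(
            nn.Linear(env_dim, 32), nn.ReLU(),
            nn.Linear(32, 2),
        )

    def forward(self, rgb_feat: torch.Tensor, thermal_feat: torch.Tensor,
                env_vec: torch.Tensor) -> torch.Tensor:
        # rgb_feat, thermal_feat: (B, feat_dim); env_vec: (B, env_dim) sensor readings.
        w = torch.softmax(self.gate(env_vec), dim=-1)          # (B, 2)
        fused = w[:, :1] * rgb_feat + w[:, 1:] * thermal_feat  # weighted sum of modalities
        return fused

if __name__ == "__main__":
    fusion = EnvGuidedFusion()
    out = fusion(torch.randn(4, 256), torch.randn(4, 256), torch.randn(4, 8))
    print(out.shape)  # torch.Size([4, 256])
```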

22 pages, 5692 KiB  
Article
RiceStageSeg: A Multimodal Benchmark Dataset for Semantic Segmentation of Rice Growth Stages
by Jianping Zhang, Tailai Chen, Yizhe Li, Qi Meng, Yanying Chen, Jie Deng and Enhong Sun
Remote Sens. 2025, 17(16), 2858; https://doi.org/10.3390/rs17162858 (registering DOI) - 16 Aug 2025
Abstract
The accurate identification of rice growth stages is critical for precision agriculture, crop management, and yield estimation. Remote sensing technologies, particularly multimodal approaches that integrate high spatial and hyperspectral resolution imagery, have demonstrated great potential in large-scale crop monitoring. Multimodal data fusion offers complementary and enriched spectral–spatial information, providing novel pathways for crop growth stage recognition in complex agricultural scenarios. However, the lack of publicly available multimodal datasets specifically designed for rice growth stage identification remains a significant bottleneck that limits the development and evaluation of relevant methods. To address this gap, we present RiceStageSeg, a multimodal benchmark dataset captured by unmanned aerial vehicles (UAVs), designed to support the development and assessment of segmentation models for rice growth monitoring. RiceStageSeg contains paired centimeter-level RGB and 10-band multispectral (MS) images acquired during several critical rice growth stages, including jointing and heading. Each image is accompanied by fine-grained, pixel-level annotations that distinguish between the different growth stages. We establish baseline experiments using several state-of-the-art semantic segmentation models under both unimodal (RGB-only, MS-only) and multimodal (RGB + MS fusion) settings. The experimental results demonstrate that multimodal feature-level fusion outperforms unimodal approaches in segmentation accuracy. RiceStageSeg offers a standardized benchmark to advance future research in multimodal semantic segmentation for agricultural remote sensing. The dataset will be made publicly available on GitHub. Full article
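As a hedged illustration of what feature-level fusion of 3-band RGB and 10-band MS imagery can look like in practice (layer widths, class count, and module name are assumptions, not the paper's baselines):

```python
# Hypothetical sketch of RGB + multispectral feature-level fusion (assumed layer sizes).
import torch
import torch.nn as nn

class FusionSegHead(nn.Module):
    """Encode 3-band RGB and 10-band MS separately, fuse features, predict class logits."""
    def __init__(self, n_classes: int = 4):
        super().__init__()
        self.rgb_enc = nn.Sequential(nn.Conv2d(3, 32, 3, padding=1), nn.ReLU())
        self.ms_enc = nn.Sequential(nn.Conv2d(10, 32, 3, padding=1), nn.ReLU())
        self.head = nn.Conv2d(64, n_classes, 1)  # 1x1 conv over concatenated features

    def forward(self, rgb, ms):
        feat = torch.cat([self.rgb_enc(rgb), self.ms_enc(ms)], dim=1)
        return self.head(feat)  # (B, n_classes, H, W) per-pixel growth-stage logits

if __name__ == "__main__":
    model = FusionSegHead()
    logits = model(torch.randn(1, 3, 128, 128), torch.randn(1, 10, 128, 128))
    print(logits.shape)  # torch.Size([1, 4, 128, 128])
```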

17 pages, 52501 KiB  
Article
Single Shot High-Accuracy Diameter at Breast Height Measurement with Smartphone Embedded Sensors
by Wang Xiang, Songlin Fei and Song Zhang
Sensors 2025, 25(16), 5060; https://doi.org/10.3390/s25165060 - 14 Aug 2025
Abstract
Tree diameter at breast height (DBH) is a fundamental metric in forest inventory and management. This paper presents a novel method for DBH estimation using the built-in light detection and ranging (LiDAR) and red, green and blue (RGB) sensors of an iPhone 13 Pro, aiming to improve measurement accuracy and field usability. A single snapshot of a tree, capturing both depth and RGB images, is used to reconstruct a 3D point cloud. The trunk orientation is estimated based on the point cloud to locate the breast height, enabling robust DBH estimation independent of the capture angle. The DBH is initially estimated by the geometrical relationship between trunk size on the image and the depth of the trunk. Finally, a pre-computed lookup table (LUT) is employed to improve the initial DBH estimates into accurate values. Experimental evaluation on 294 trees within a capture range of 0.25 m to 5 m demonstrates a mean absolute error of 0.53 cm and a root mean square error of 0.63 cm. Full article
(This article belongs to the Collection 3D Imaging and Sensing System)
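The geometric step described in the abstract (trunk size on the image combined with the trunk's depth) follows from the pinhole model, and a lookup table refines the result. The sketch below shows that relationship with invented focal length, pixel width, and correction values, so it is a toy calculation rather than the paper's calibrated pipeline.

```python
# Back-of-the-envelope sketch of the geometric step (assumed focal length and trunk
# width in pixels); the paper's lookup-table refinement is mocked by made-up values.
import numpy as np

def dbh_from_snapshot(trunk_width_px: float, depth_m: float, focal_px: float) -> float:
    """Pinhole approximation: metric width = pixel width * depth / focal length."""
    return trunk_width_px * depth_m / focal_px

# Hypothetical LUT: correction factor indexed by capture distance (meters).
lut_depths = np.array([0.25, 1.0, 2.0, 3.0, 5.0])
lut_scale = np.array([1.08, 1.05, 1.03, 1.02, 1.01])  # invented correction factors

def refine_with_lut(dbh_initial: float, depth_m: float) -> float:
    scale = np.interp(depth_m, lut_depths, lut_scale)
    return dbh_initial * scale

if __name__ == "__main__":
    d0 = dbh_from_snapshot(trunk_width_px=180, depth_m=2.0, focal_px=1500)
    print(f"initial {d0*100:.1f} cm, refined {refine_with_lut(d0, 2.0)*100:.1f} cm")
```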

25 pages, 9564 KiB  
Article
Semantic-Aware Cross-Modal Transfer for UAV-LiDAR Individual Tree Segmentation
by Fuyang Zhou, Haiqing He, Ting Chen, Tao Zhang, Minglu Yang, Ye Yuan and Jiahao Liu
Remote Sens. 2025, 17(16), 2805; https://doi.org/10.3390/rs17162805 - 13 Aug 2025
Viewed by 104
Abstract
Cross-modal semantic segmentation of individual tree LiDAR point clouds is critical for accurately characterizing tree attributes, quantifying ecological interactions, and estimating carbon storage. However, in forest environments, this task faces key challenges such as high annotation costs and poor cross-domain generalization. To address these issues, this study proposes a cross-modal semantic transfer framework tailored for individual tree point cloud segmentation in forested scenes. Leveraging co-registered UAV-acquired RGB imagery and LiDAR data, we construct a technical pipeline of “2D semantic inference—3D spatial mapping—cross-modal fusion” to enable annotation-free semantic parsing of 3D individual trees. Specifically, we first introduce a novel Multi-Source Feature Fusion Network (MSFFNet) to achieve accurate instance-level segmentation of individual trees in the 2D image domain. Subsequently, we develop a hierarchical two-stage registration strategy to effectively align dense matched point clouds (MPC) generated from UAV imagery with LiDAR point clouds. On this basis, we propose a probabilistic cross-modal semantic transfer model that builds a semantic probability field through multi-view projection and the expectation–maximization algorithm. By integrating geometric features and semantic confidence, the model establishes semantic correspondences between 2D pixels and 3D points, thereby achieving spatially consistent semantic label mapping. This facilitates the transfer of semantic annotations from the 2D image domain to the 3D point cloud domain. The proposed method is evaluated on two forest datasets. The results demonstrate that the proposed individual tree instance segmentation approach achieves the highest performance, with an IoU of 87.60%, compared to state-of-the-art methods such as Mask R-CNN, SOLOV2, and Mask2Former. Furthermore, the cross-modal semantic label transfer framework significantly outperforms existing mainstream methods in individual tree point cloud semantic segmentation across complex forest scenarios. Full article
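A simplified view of the core 2D-to-3D transfer step, projecting LiDAR points into a labeled image with an assumed intrinsic matrix K and extrinsic matrix T_cam_from_lidar; the paper's probabilistic multi-view fusion and expectation-maximization refinement are not reproduced here.

```python
# Simplified sketch of transferring 2D semantic labels to 3D points by pinhole
# projection (assumed intrinsics/extrinsics), not the proposed probabilistic model.
import numpy as np

def transfer_labels(points_xyz, label_map, K, T_cam_from_lidar):
    """points_xyz: (N, 3) LiDAR points; label_map: (H, W) per-pixel class ids."""
    n = points_xyz.shape[0]
    pts_h = np.hstack([points_xyz, np.ones((n, 1))])          # homogeneous coordinates
    cam = (T_cam_from_lidar @ pts_h.T).T[:, :3]               # into the camera frame
    valid = cam[:, 2] > 0                                     # keep points in front of camera
    uv = (K @ cam[valid].T).T
    uv = (uv[:, :2] / uv[:, 2:3]).astype(int)                 # perspective divide to pixels
    h, w = label_map.shape
    inside = (uv[:, 0] >= 0) & (uv[:, 0] < w) & (uv[:, 1] >= 0) & (uv[:, 1] < h)
    labels = np.full(n, -1, dtype=int)                        # -1 marks unlabeled points
    idx = np.flatnonzero(valid)[inside]
    labels[idx] = label_map[uv[inside, 1], uv[inside, 0]]
    return labels
```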

27 pages, 15885 KiB  
Article
Model-Free UAV Navigation in Unknown Complex Environments Using Vision-Based Reinforcement Learning
by Hao Wu, Wei Wang, Tong Wang and Satoshi Suzuki
Drones 2025, 9(8), 566; https://doi.org/10.3390/drones9080566 - 12 Aug 2025
Viewed by 354
Abstract
Autonomous UAV navigation in unknown and complex environments remains a core challenge, especially under limited sensing and computing resources. While most methods rely on modular pipelines involving mapping, planning, and control, they often suffer from poor real-time performance, limited adaptability, and high dependency on accurate environment models. Moreover, many deep-learning-based solutions either use RGB images prone to visual noise or optimize only a single objective. In contrast, this paper proposes a unified, model-free vision-based DRL framework that directly maps onboard depth images and UAV state information to continuous navigation commands through a single convolutional policy network. This end-to-end architecture eliminates the need for explicit mapping and modular coordination, significantly improving responsiveness and robustness. A novel multi-objective reward function is designed to jointly optimize path efficiency, safety, and energy consumption, enabling adaptive flight behavior in unknown complex environments. The trained policy demonstrates generalization in diverse simulated scenarios and transfers effectively to real-world UAV flights. Experiments show that our approach achieves stable navigation and low latency. Full article
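To make the end-to-end mapping concrete, here is a minimal sketch (assumed image size, state dimension, and reward weights) of a single CNN policy that consumes a depth image plus UAV state and of a weighted multi-objective reward; it is illustrative only, not the trained policy from the paper.

```python
# Conceptual sketch (assumed shapes and weights): one CNN maps a depth image plus UAV
# state to continuous velocity commands; a weighted reward mixes the three objectives.
import torch
import torch.nn as nn

class DepthPolicy(nn.Module):
    def __init__(self, state_dim: int = 6, act_dim: int = 3):
        super().__init__()
        self.cnn = nn.Sequential(
            nn.Conv2d(1, 16, 5, stride=2), nn.ReLU(),
            nn.Conv2d(16, 32, 5, stride=2), nn.ReLU(),
            nn.AdaptiveAvgPool2d(1), nn.Flatten(),
        )
        self.head = nn.Sequential(nn.Linear(32 + state_dim, 64), nn.ReLU(),
                                  nn.Linear(64, act_dim), nn.Tanh())

    def forward(self, depth, state):
        # depth: (B, 1, H, W) onboard depth image; state: (B, state_dim) UAV state vector.
        return self.head(torch.cat([self.cnn(depth), state], dim=-1))

def reward(progress, collided, energy, w=(1.0, 10.0, 0.1)):
    # Multi-objective reward: path efficiency minus safety and energy penalties.
    return w[0] * progress - w[1] * float(collided) - w[2] * energy

if __name__ == "__main__":
    pi = DepthPolicy()
    cmd = pi(torch.randn(1, 1, 64, 64), torch.randn(1, 6))
    print(cmd, reward(progress=0.4, collided=False, energy=0.2))
```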

17 pages, 5705 KiB  
Article
Cherry Tomato Bunch and Picking Point Detection for Robotic Harvesting Using an RGB-D Sensor and a StarBL-YOLO Network
by Pengyu Li, Ming Wen, Zhi Zeng and Yibin Tian
Horticulturae 2025, 11(8), 949; https://doi.org/10.3390/horticulturae11080949 - 11 Aug 2025
Viewed by 304
Abstract
For fruit harvesting robots, rapid and accurate detection of fruits and picking points is one of the main challenges for their practical deployment. Many fruits, such as grapes, cherry tomatoes, and blueberries, grow in clusters or bunches, and such clustered fruits are preferably picked by the bunch rather than individually. This study proposes utilizing a low-cost off-the-shelf RGB-D sensor mounted on the end effector and a lightweight improved YOLOv8-Pose neural network to detect cherry tomato bunches and picking points for robotic harvesting. The problem of occlusion and overlap is alleviated by merging RGB and depth images from the RGB-D sensor. To enhance detection robustness in complex backgrounds and reduce the complexity of the model, the Starblock module from StarNet and the coordinate attention mechanism are incorporated into the YOLOv8-Pose network, termed StarBL-YOLO, to improve the efficiency of feature extraction and reinforce spatial information. Additionally, we replaced the original OKS loss function with the L1 loss function for keypoint loss calculation, which improves the accuracy of picking-point localization. The proposed method has been evaluated on a dataset with 843 cherry tomato RGB-D image pairs acquired by a harvesting robot at a commercial greenhouse farm. Experimental results demonstrate that the proposed StarBL-YOLO model achieves a 12% reduction in model parameters compared to the original YOLOv8-Pose while improving detection accuracy for cherry tomato bunches and picking points. Specifically, the model shows significant improvements across all metrics: for computational efficiency, model size (−11.60%) and GFLOPs (−7.23%); for pickable bunch detection, mAP50 (+4.4%) and mAP50-95 (+4.7%); for non-pickable bunch detection, mAP50 (+8.0%) and mAP50-95 (+6.2%); and for picking point detection, mAP50 (+4.3%), mAP50-95 (+4.6%), and RMSE (−23.98%). These results validate that StarBL-YOLO substantially enhances detection accuracy for cherry tomato bunches and picking points while improving computational efficiency, which is valuable for deployment on resource-constrained edge-computing hardware in harvesting robots. Full article
(This article belongs to the Special Issue Advanced Automation for Tree Fruit Orchards and Vineyards)
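The loss swap mentioned in the abstract (plain L1 on keypoint coordinates instead of an OKS-style loss) can be sketched as below; tensor shapes and the visibility mask are assumptions for illustration, not the StarBL-YOLO code.

```python
# Minimal sketch of an L1 keypoint loss for picking-point regression (assumed shapes).
import torch
import torch.nn.functional as F

def keypoint_l1_loss(pred_kpts, gt_kpts, visibility):
    """pred_kpts, gt_kpts: (B, K, 2) normalized xy; visibility: (B, K) 0/1 mask."""
    per_kpt = F.l1_loss(pred_kpts, gt_kpts, reduction="none").sum(dim=-1)  # (B, K)
    mask = visibility.float()
    return (per_kpt * mask).sum() / mask.sum().clamp(min=1.0)

if __name__ == "__main__":
    pred = torch.rand(2, 1, 2)   # one picking point per bunch, normalized coordinates
    gt = torch.rand(2, 1, 2)
    vis = torch.ones(2, 1)
    print(keypoint_l1_loss(pred, gt, vis))
```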

31 pages, 47425 KiB  
Article
T360Fusion: Temporal 360 Multimodal Fusion for 3D Object Detection via Transformers
by Khanh Bao Tran, Alexander Carballo and Kazuya Takeda
Sensors 2025, 25(16), 4902; https://doi.org/10.3390/s25164902 - 8 Aug 2025
Viewed by 215
Abstract
Object detection plays a significant role in various industrial and scientific domains, particularly in autonomous driving. It enables vehicles to detect surrounding objects, construct spatial maps, and facilitate safe navigation. To accomplish these tasks, a variety of sensors have been employed, including LiDAR, radar, RGB cameras, and ultrasonic sensors. Among these, LiDAR and RGB cameras are frequently utilized due to their advantages. RGB cameras offer high-resolution images with rich color and texture information but tend to underperform in low light or adverse weather conditions. In contrast, LiDAR provides precise 3D geometric data irrespective of lighting conditions, although it lacks the high spatial resolution of cameras. Recently, thermal cameras have gained significant attention in both standalone applications and in combination with RGB cameras. They offer strong perception capabilities under low-visibility conditions or adverse weather conditions. Multimodal sensor fusion effectively overcomes individual sensor limitations. In this paper, we propose a novel multimodal fusion method that integrates LiDAR, a 360 RGB camera, and a 360 thermal camera to fully leverage the strengths of each modality. Our method employs a feature-level fusion strategy that temporally accumulates and synchronizes multiple LiDAR frames. This design not only improves the detection accuracy but also enhances the spatial coverage and robustness. The use of 360 images significantly reduces blind spots and provides comprehensive environmental awareness, which is especially beneficial in complex or dynamic scenes. Full article
(This article belongs to the Section Sensing and Imaging)
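A rough sketch of the temporal accumulation step described above, expressing past LiDAR frames in the current sensor frame via assumed 4x4 ego poses; the transformer-based 360 fusion itself is not shown.

```python
# Rough sketch of temporal point-cloud accumulation (assumed 4x4 ego poses per frame).
import numpy as np

def accumulate_frames(frames, poses, current_idx):
    """frames: list of (N_i, 3) point clouds; poses: list of 4x4 world-from-sensor matrices.
    Returns all frames expressed in the coordinate frame of frames[current_idx]."""
    T_cur_inv = np.linalg.inv(poses[current_idx])
    merged = []
    for pts, T in zip(frames, poses):
        pts_h = np.hstack([pts, np.ones((pts.shape[0], 1))])
        # sensor_i frame -> world -> current sensor frame
        merged.append((T_cur_inv @ T @ pts_h.T).T[:, :3])
    return np.vstack(merged)

if __name__ == "__main__":
    f0, f1 = np.random.rand(100, 3), np.random.rand(120, 3)
    identity = np.eye(4)
    print(accumulate_frames([f0, f1], [identity, identity], current_idx=1).shape)  # (220, 3)
```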

25 pages, 6468 KiB  
Article
Thermal Imaging-Based Lightweight Gesture Recognition System for Mobile Robots
by Xinxin Wang, Xiaokai Ma, Hongfei Gao, Lijun Wang and Xiaona Song
Machines 2025, 13(8), 701; https://doi.org/10.3390/machines13080701 - 8 Aug 2025
Viewed by 183
Abstract
With the rapid advancement of computer vision and deep learning technologies, the accuracy and efficiency of real-time gesture recognition have significantly improved. This paper introduces a gesture-controlled robot system based on thermal imaging sensors. By replacing traditional physical button controls, this design significantly enhances the interactivity and operational convenience of human–machine interaction. First, a thermal imaging gesture dataset is collected using Python3.9. Compared to traditional RGB images, thermal imaging can better capture gesture details, especially in low-light conditions, thereby improving the robustness of gesture recognition. Subsequently, a neural network model is constructed and trained using Keras, and the model is then deployed to a microcontroller. This lightweight model design enables the gesture recognition system to operate on resource-constrained embedded devices, achieving real-time performance and high efficiency. In addition, using a standalone thermal sensor for gesture recognition avoids the complexity of multi-sensor fusion schemes, simplifies the system structure, reduces costs, and ensures real-time performance and stability. The final results demonstrate that the proposed design achieves a model test accuracy of 99.05%. In summary, through its gesture recognition capabilities—featuring high accuracy, low latency, non-contact interaction, and low-light adaptability—this design precisely meets the core demands for “convenient, safe, and natural interaction” in rehabilitation, smart homes, and elderly assistive devices, showcasing clear potential for practical scenario implementation. Full article
(This article belongs to the Section Robotics, Mechatronics and Intelligent Machines)
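Since the abstract states the model is built in Keras and deployed to a microcontroller, the following is a small, generic Keras sketch with an assumed 24x32 thermal input and five gesture classes; the published architecture, dataset, and export settings may well differ.

```python
# Illustrative Keras sketch (assumed 24x32 single-channel thermal frames, 5 classes).
import tensorflow as tf

def build_gesture_model(input_shape=(24, 32, 1), n_classes=5):
    return tf.keras.Sequential([
        tf.keras.layers.Conv2D(8, 3, activation="relu", input_shape=input_shape),
        tf.keras.layers.MaxPooling2D(),
        tf.keras.layers.Conv2D(16, 3, activation="relu"),
        tf.keras.layers.Flatten(),
        tf.keras.layers.Dense(32, activation="relu"),
        tf.keras.layers.Dense(n_classes, activation="softmax"),
    ])

model = build_gesture_model()
model.compile(optimizer="adam", loss="sparse_categorical_crossentropy",
              metrics=["accuracy"])
model.summary()

# For microcontroller deployment, a converted TFLite model is the usual route:
# converter = tf.lite.TFLiteConverter.from_keras_model(model)
# tflite_bytes = converter.convert()
```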

19 pages, 9147 KiB  
Article
Evaluating Forest Canopy Structures and Leaf Area Index Using a Five-Band Depth Image Sensor
by Geilebagan, Takafumi Tanaka, Takashi Gomi, Ayumi Kotani, Genya Nakaoki, Xinwei Wang and Shodai Inokoshi
Forests 2025, 16(8), 1294; https://doi.org/10.3390/f16081294 - 8 Aug 2025
Viewed by 234
Abstract
The objective of the study was to develop and validate a ground-based method using a depth image sensor equipped with depth, visible red, green, blue (RGB), and near-infrared bands to measure the leaf area index (LAI) based on the relative illuminance of foliage only. The method was applied in an Itajii chinkapin (Castanopsis sieboldii (Makino) Hatus. ex T.Yamaz. & Mashiba) forest in Aichi Prefecture, Japan, and validated by comparing estimates with conventional methods (LAI-2200 and fisheye photography). To apply the 5-band sensor to actual forests, a methodology is proposed for matching the color camera and near-infrared camera in units of pixels, along with a method for widening the exposure range through multi-step camera exposure. Based on these advancements, the RGB color band, near-infrared band, and depth band are converted into several physical properties. Employing these properties, each pixel of the canopy image is classified into upper foliage, lower foliage, sky, and non-assimilated parts (stems and branches). Subsequently, the LAI is calculated using the gap-fraction method, which is based on the relative illuminance of the foliage. In comparison with existing indirect LAI estimations, this technique enabled the distinction between upper and lower canopy layers and the exclusion of non-assimilated parts. The findings indicate that the plant area index (PAI) ranged from 2.23 to 3.68 m² m⁻², an increase of 33% to 34% over the LAI calculated after excluding non-assimilating parts. The findings of this study underscore the necessity of distinguishing non-assimilated components in the estimation of LAI. The PAI estimates derived from the depth image sensor exhibited moderate to strong agreement with the LAI-2200, contingent upon canopy rings (R² = 0.48–0.98), thereby substantiating the reliability of the system’s performance. The developed approaches also permit the evaluation of the distributions of leaves and branches at various heights from the ground surface to the top of the canopy. The novel LAI measurement method developed in this study has the potential to provide precise, reliable foundational data to support research in ecology and hydrology related to complex tree structures. Full article
(This article belongs to the Section Forest Inventory, Modeling and Remote Sensing)
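The gap-fraction step can be illustrated with a Beer-Lambert-style toy calculation; the extinction coefficient k and the foliage mask below are assumptions standing in for the sensor-specific pixel classification described above.

```python
# Toy sketch of the gap-fraction step (Beer-Lambert form with an assumed k = 0.5);
# the 5-band pixel classification is replaced here by a synthetic foliage mask.
import numpy as np

def lai_from_gap_fraction(canopy_mask: np.ndarray, k: float = 0.5) -> float:
    """canopy_mask: boolean image, True where a pixel is classified as foliage.
    Gap fraction P = sky pixels / all pixels; LAI is approximated as -ln(P) / k."""
    p_gap = 1.0 - canopy_mask.mean()
    p_gap = max(p_gap, 1e-6)           # avoid log(0) for a fully closed canopy
    return -np.log(p_gap) / k

if __name__ == "__main__":
    mask = np.random.rand(480, 640) < 0.8   # synthetic 80% foliage cover
    print(f"LAI ~ {lai_from_gap_fraction(mask):.2f}")
```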

21 pages, 2428 KiB  
Article
Robust Human Pose Estimation Method for Body-to-Body Occlusion Using RGB-D Fusion Neural Network
by Jae-hyuk Yoon and Soon-kak Kwon
Appl. Sci. 2025, 15(15), 8746; https://doi.org/10.3390/app15158746 - 7 Aug 2025
Viewed by 247
Abstract
In this study, we propose a novel approach for human pose estimation (HPE) in occluded scenes by progressively fusing features extracted from RGB-D images, which contain paired RGB and depth images. Conventional bottom-up human pose estimation models that rely solely on RGB inputs often produce erroneous skeletons when parts of a person’s body are obscured by another individual, because they struggle to accurately infer body connectivity due to the lack of 3D topological information. To address this limitation, we modify OpenPose, a traditional bottom-up HPE model, to take a depth image as an additional input, thereby providing explicit 3D spatial cues. Each input modality is processed by a dedicated feature extractor. In addition to the two existing modules for each stage—joint connectivity and joint confidence map estimations for the color image—we integrate a new module for estimating joint confidence maps for the depth image into the initial few stages. Subsequently, the confidence maps derived from both depth and RGB modalities are fused at each stage and forwarded to the next, ensuring that 3D topological information from the depth image is effectively utilized for both joint localization and body part association. Experimental results on the NTU RGB+D 120 dataset verify that the proposed approach achieves a 13.3% improvement in average recall compared to the original OpenPose model, enhancing the performance of bottom-up HPE models in occluded scenes. Full article
(This article belongs to the Special Issue Advanced Pattern Recognition & Computer Vision)
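A schematic sketch of the per-stage fusion idea, combining RGB-branch and depth-branch joint confidence maps with a 1x1 convolution; the joint count and map size are assumed, and this is not the modified OpenPose implementation itself.

```python
# Schematic sketch of per-stage RGB/depth confidence-map fusion (assumed channel counts).
import torch
import torch.nn as nn

class StageFusion(nn.Module):
    """Combine RGB-branch and depth-branch joint confidence maps into one set of maps."""
    def __init__(self, n_joints: int = 18):
        super().__init__()
        self.fuse = nn.Conv2d(2 * n_joints, n_joints, kernel_size=1)

    def forward(self, rgb_heatmaps, depth_heatmaps):
        # Both inputs: (B, n_joints, H, W); the fused maps feed the next refinement stage.
        return self.fuse(torch.cat([rgb_heatmaps, depth_heatmaps], dim=1))

if __name__ == "__main__":
    fusion = StageFusion()
    out = fusion(torch.rand(1, 18, 46, 46), torch.rand(1, 18, 46, 46))
    print(out.shape)  # torch.Size([1, 18, 46, 46])
```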

26 pages, 10480 KiB  
Article
Monitoring Chlorophyll Content of Brassica napus L. Based on UAV Multispectral and RGB Feature Fusion
by Yongqi Sun, Jiali Ma, Mengting Lyu, Jianxun Shen, Jianping Ying, Skhawat Ali, Basharat Ali, Wenqiang Lan, Yiwa Hu, Fei Liu, Weijun Zhou and Wenjian Song
Agronomy 2025, 15(8), 1900; https://doi.org/10.3390/agronomy15081900 - 7 Aug 2025
Viewed by 300
Abstract
Accurate prediction of chlorophyll content in Brassica napus L. (rapeseed) is essential for monitoring plant nutritional status and precision agricultural management. Most existing studies focus on a single cultivar, which limits general applicability. This study used unmanned aerial vehicle (UAV)-based RGB and multispectral imagery to evaluate the chlorophyll content of six rapeseed cultivars across mixed growth stages, including the seedling, bolting, and initial flowering stages. ExG-ExR threshold segmentation was applied to remove background interference. Subsequently, color and spectral indices were extracted from the segmented images and ranked according to their correlations with measured chlorophyll content. Partial Least Squares Regression (PLSR), Multiple Linear Regression (MLR), and Support Vector Regression (SVR) models were independently established using subsets of the top-ranked features. Model performance was assessed by comparing prediction accuracy (R² and RMSE). Results demonstrated significant accuracy improvements following background removal, especially for the SVR model. Compared to data without background removal, accuracy increased notably by 8.0% (R²p improved from 0.683 to 0.763) for color indices and 3.1% (R²p from 0.835 to 0.866) for spectral indices. Additionally, stepwise fusion of spectral and color indices further improved prediction accuracy. Optimal results were obtained by fusing the top seven color features ranked by correlation with chlorophyll content, achieving an R²p of 0.878 and an RMSE of 52.187 μg/g. These findings highlight the effectiveness of background removal and feature fusion in enhancing chlorophyll prediction accuracy. Full article
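The ExG-ExR background removal has a standard closed form (ExG = 2g - r - b and ExR = 1.4r - g on normalized chromatic coordinates, keeping pixels where ExG - ExR > 0). The sketch below applies it and fits a toy SVR on synthetic index features, so the thresholds, feature choices, and data are illustrative assumptions rather than the study's setup.

```python
# Sketch of ExG-ExR background removal plus a toy SVR fit on index features.
import numpy as np
from sklearn.svm import SVR

def exg_exr_mask(rgb: np.ndarray) -> np.ndarray:
    """rgb: (H, W, 3) float image in [0, 1]. Returns True for vegetation pixels."""
    s = rgb.sum(axis=-1) + 1e-6
    r, g, b = rgb[..., 0] / s, rgb[..., 1] / s, rgb[..., 2] / s
    exg = 2 * g - r - b          # excess green
    exr = 1.4 * r - g            # excess red
    return (exg - exr) > 0

if __name__ == "__main__":
    rng = np.random.default_rng(0)
    img = rng.random((64, 64, 3))
    print("vegetation fraction:", exg_exr_mask(img).mean())

    # Toy SVR on per-plot index features vs. measured chlorophyll (synthetic numbers).
    X = rng.random((40, 7))                           # e.g., top-7 correlated color indices
    y = 300 + 200 * X[:, 0] + rng.normal(0, 10, 40)   # fake chlorophyll values, ug/g
    model = SVR(kernel="rbf").fit(X, y)
    print("train R^2:", round(model.score(X, y), 3))
```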

20 pages, 3925 KiB  
Article
Multi-Scale Pure Graphs with Multi-View Subspace Clustering for Salient Object Detection
by Mingxian Wang, Hongwei Yang, Yi Zhang, Wenjie Wang and Fan Wang
Symmetry 2025, 17(8), 1262; https://doi.org/10.3390/sym17081262 - 7 Aug 2025
Viewed by 222
Abstract
Salient object detection is a challenging task in the field of computer vision. Graph-based models have attracted considerable research attention and achieved remarkable progress in this task by constructing graphs that formulate the intrinsic structure of an image. Nevertheless, existing graph-based salient object detection methods still have certain limitations and face two major challenges: (1) previous graphs are constructed with a Gaussian kernel and are often corrupted by noise in the original data, and (2) they fail to capture the common representations and complementary diversity of multi-view features. Both of these degrade saliency performance. In this paper, we propose a novel method, called multi-scale pure graphs with multi-view subspace clustering, for salient object detection. Its main contribution is a new two-stage graph, constructed and constrained by multi-view subspace clustering with sparsity and low-rank terms. One advantage is that the multi-scale pure graphs protect saliency performance against the propagation of noise through the graph matrix. Another advantage is that the multi-scale pure graphs exploit the consistency and complementary information among multi-view features, which can effectively boost the capability of the graphs. In addition, to verify the impact of the symmetry of the multi-scale pure graphs on salient object detection performance, we compared variants of the proposed two-stage graphs with and without the multi-scale pure graphs. Experiments were conducted on several RGB benchmark datasets against several state-of-the-art algorithms. The results demonstrate that the proposed method outperforms the state-of-the-art approaches in terms of multiple standard evaluation metrics. This paper reveals that multi-view subspace clustering is beneficial in promoting graph-based saliency detection tasks. Full article
(This article belongs to the Section Engineering and Materials)
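For background, graph-based saliency methods of this family rest on affinity graphs and label propagation. The sketch below shows only that generic machinery (a Gaussian-kernel affinity over superpixel features plus closed-form manifold ranking), not the sparse, low-rank multi-view construction proposed in the paper.

```python
# Generic graph-based saliency propagation sketch (not the proposed multi-view method).
import numpy as np

def gaussian_affinity(features: np.ndarray, sigma: float = 0.1) -> np.ndarray:
    """features: (N, d) superpixel descriptors; returns an N x N affinity matrix."""
    d2 = ((features[:, None, :] - features[None, :, :]) ** 2).sum(-1)
    W = np.exp(-d2 / (2 * sigma ** 2))
    np.fill_diagonal(W, 0.0)
    return W

def propagate_saliency(W: np.ndarray, seeds: np.ndarray, alpha: float = 0.99) -> np.ndarray:
    """Closed-form manifold ranking: f = (I - alpha * S)^-1 y with S the normalized graph."""
    d = W.sum(1)
    S = W / np.sqrt(np.outer(d, d) + 1e-12)
    n = W.shape[0]
    return np.linalg.solve(np.eye(n) - alpha * S, seeds)

if __name__ == "__main__":
    rng = np.random.default_rng(1)
    feats = rng.random((50, 3))          # e.g., mean color per superpixel
    seeds = np.zeros(50)
    seeds[:5] = 1.0                      # query nodes (e.g., a boundary prior)
    print(propagate_saliency(gaussian_affinity(feats), seeds)[:5])
```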

17 pages, 2649 KiB  
Article
Four-Dimensional Hyperspectral Imaging for Fruit and Vegetable Grading
by Laraib Haider Naqvi, Badrinath Balasubramaniam, Jiaqiong Li, Lingling Liu and Beiwen Li
Agriculture 2025, 15(15), 1702; https://doi.org/10.3390/agriculture15151702 - 6 Aug 2025
Viewed by 316
Abstract
Reliable, non-destructive grading of fresh fruit requires simultaneous assessment of external morphology and hidden internal defects. Camera-based grading of fresh fruit using colorimetric (RGB) and near-infrared (NIR) imaging often misses subsurface bruising and cannot capture the fruit’s true shape, leading to inconsistent quality assessment and increased waste. To address this, we developed a 4D-grading pipeline that fuses visible and near-infrared (VNIR) and short-wave infrared (SWIR) hyperspectral imaging with structured-light 3D scanning to non-destructively evaluate both internal defects and external form. Our contributions are (1) flagging the defects in fruits based on the reflectance information, (2) accurate shape and defect measurement based on the 3D data of fruits, and (3) an interpretable, decision-tree framework that assigns USDA-style quality (Premium, Grade 1/2, Reject) and size (Small–Extra Large) labels. We demonstrate this approach through preliminary results, suggesting that 4D hyperspectral imaging may offer advantages over single-modality methods by providing clear, interpretable decision rules and the potential for adaptation to other produce types. Full article
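An interpretable rule-based grader of the kind described can be written in a few lines; the thresholds, measurement names, and class boundaries below are invented for illustration and are not the paper's calibrated USDA-style rules.

```python
# Toy illustration of an interpretable rule-based grader (invented thresholds).
from dataclasses import dataclass

@dataclass
class FruitMeasurement:
    defect_area_ratio: float        # fraction of surface flagged by the reflectance check
    equatorial_diameter_mm: float   # measured from the 3D scan

def grade(m: FruitMeasurement) -> tuple:
    if m.defect_area_ratio > 0.15:
        quality = "Reject"
    elif m.defect_area_ratio > 0.05:
        quality = "Grade 2"
    elif m.defect_area_ratio > 0.01:
        quality = "Grade 1"
    else:
        quality = "Premium"
    size = ("Small" if m.equatorial_diameter_mm < 55 else
            "Medium" if m.equatorial_diameter_mm < 65 else
            "Large" if m.equatorial_diameter_mm < 75 else "Extra Large")
    return quality, size

if __name__ == "__main__":
    print(grade(FruitMeasurement(defect_area_ratio=0.02, equatorial_diameter_mm=68)))
```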

31 pages, 34013 KiB  
Article
Vision-Based 6D Pose Analytics Solution for High-Precision Industrial Robot Pick-and-Place Applications
by Balamurugan Balasubramanian and Kamil Cetin
Sensors 2025, 25(15), 4824; https://doi.org/10.3390/s25154824 - 6 Aug 2025
Viewed by 453
Abstract
High-precision 6D pose estimation for pick-and-place operations remains a critical problem for industrial robot arms in manufacturing. This study introduces an analytics-based solution for 6D pose estimation designed for a real-world industrial application: it enables the Staubli TX2-60L (manufactured by Stäubli International AG, Horgen, Switzerland) robot arm to pick up metal plates from various locations and place them into a precisely defined slot on a brake pad production line. The system uses a fixed eye-to-hand Intel RealSense D435 RGB-D camera (manufactured by Intel Corporation, Santa Clara, California, USA) to capture color and depth data. A robust software infrastructure developed in LabVIEW (ver.2019) integrated with the NI Vision (ver.2019) library processes the images through a series of steps, including particle filtering, equalization, and pattern matching, to determine the X-Y positions and Z-axis rotation of the object. The Z-position of the object is calculated from the camera’s intensity data, while the remaining X-Y rotation angles are determined using the angle-of-inclination analytics method. It is experimentally verified that the proposed analytical solution outperforms the hybrid-based method (YOLO-v8 combined with PnP/RANSAC algorithms). Experimental results across four distinct picking scenarios demonstrate the proposed solution’s superior accuracy, with position errors under 2 mm, orientation errors below 1°, and a perfect success rate in pick-and-place tasks. Full article
(This article belongs to the Section Sensors and Robotics)
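The depth-based position step relies on standard pinhole deprojection; the sketch below converts a matched pixel and its depth to camera-frame coordinates using assumed D435-like intrinsics, leaving the LabVIEW/NI Vision pattern matching and the angle-of-inclination step aside.

```python
# Generic pinhole deprojection sketch (assumed intrinsics), not the LabVIEW pipeline.
import numpy as np

def pixel_to_camera_xyz(u: float, v: float, depth_m: float, K: np.ndarray) -> np.ndarray:
    """Convert a matched pixel (u, v) plus its depth into camera-frame coordinates."""
    fx, fy = K[0, 0], K[1, 1]
    cx, cy = K[0, 2], K[1, 2]
    x = (u - cx) * depth_m / fx
    y = (v - cy) * depth_m / fy
    return np.array([x, y, depth_m])

if __name__ == "__main__":
    K = np.array([[615.0, 0, 320.0],
                  [0, 615.0, 240.0],
                  [0, 0, 1.0]])            # assumed intrinsics for a 640x480 stream
    print(pixel_to_camera_xyz(400, 260, 0.50, K))  # metric position of the matched feature
```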

17 pages, 1306 KiB  
Article
Rapid Salmonella Serovar Classification Using AI-Enabled Hyperspectral Microscopy with Enhanced Data Preprocessing and Multimodal Fusion
by MeiLi Papa, Siddhartha Bhattacharya, Bosoon Park and Jiyoon Yi
Foods 2025, 14(15), 2737; https://doi.org/10.3390/foods14152737 - 5 Aug 2025
Viewed by 279
Abstract
Salmonella serovar identification typically requires multiple enrichment steps using selective media, consuming considerable time and resources. This study presents a rapid, culture-independent method leveraging artificial intelligence (AI) to classify Salmonella serovars from rich hyperspectral microscopy data. Five serovars (Enteritidis, Infantis, Kentucky, Johannesburg, 4,[5],12:i:-) were analyzed from samples prepared using only sterilized de-ionized water. Hyperspectral data cubes were collected to generate single-cell spectra and RGB composite images representing the full microscopy field. Data analysis involved two parallel branches followed by multimodal fusion. The spectral branch compared manual feature selection with data-driven feature extraction via principal component analysis (PCA), followed by classification using conventional machine learning models (i.e., k-nearest neighbors, support vector machine, random forest, and multilayer perceptron). The image branch employed a convolutional neural network (CNN) to extract spatial features directly from images without predefined morphological descriptors. Using PCA-derived spectral features, the highest performing machine learning model achieved 81.1% accuracy, outperforming manual feature selection. CNN-based classification using image features alone yielded lower accuracy (57.3%) in this serovar-level discrimination. In contrast, a multimodal fusion model combining spectral and image features improved accuracy to 82.4% on the unseen test set while reducing overfitting on the train set. This study demonstrates that AI-enabled hyperspectral microscopy with multimodal fusion can streamline Salmonella serovar identification workflows. Full article
(This article belongs to the Special Issue Artificial Intelligence (AI) and Machine Learning for Foods)
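The spectral branch (PCA-derived features plus a conventional classifier) and a naive feature-level fusion by concatenation can be sketched with scikit-learn as below; band counts, embedding sizes, and the data itself are synthetic placeholders, not the study's measurements.

```python
# Minimal sketch of PCA + SVM on spectra with naive concatenation-based fusion
# (synthetic data only; not the study's hyperspectral measurements).
import numpy as np
from sklearn.decomposition import PCA
from sklearn.svm import SVC
from sklearn.pipeline import make_pipeline
from sklearn.model_selection import train_test_split

rng = np.random.default_rng(0)
n_cells, n_bands, n_img_feats = 500, 89, 64
spectra = rng.random((n_cells, n_bands))          # single-cell reflectance spectra
img_feats = rng.random((n_cells, n_img_feats))    # e.g., CNN embeddings of RGB composites
labels = rng.integers(0, 5, n_cells)              # five serovar classes

X = np.hstack([spectra, img_feats])               # multimodal fusion by concatenation
X_tr, X_te, y_tr, y_te = train_test_split(X, labels, test_size=0.2, random_state=0)

clf = make_pipeline(PCA(n_components=20), SVC(kernel="rbf"))
clf.fit(X_tr, y_tr)
print("test accuracy:", round(clf.score(X_te, y_te), 3))   # near chance on random data
```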
