Search Results (146)

Search Parameters:
Keywords = Deep Object Pose Estimation

26 pages, 3995 KB  
Article
Neural Vessel Segmentation and Gaussian Splatting for 3D Reconstruction of Cerebral Angiography
by Oleh Kryvoshei, Patrik Kamencay and Ladislav Polak
AI 2026, 7(1), 22; https://doi.org/10.3390/ai7010022 - 10 Jan 2026
Viewed by 138
Abstract
Cerebrovascular diseases are a leading cause of global mortality, underscoring the need for objective and quantitative 3D visualization of cerebral vasculature from dynamic imaging modalities. Conventional analysis is often labor-intensive, subjective, and prone to errors due to image noise and subtraction artifacts. This study tackles the challenge of achieving fast and accurate volumetric reconstruction from angiography sequences. We propose a multi-stage pipeline that begins with image restoration to enhance input quality, followed by neural segmentation to extract vascular structures. Camera poses and sparse geometry are estimated through Structure-from-Motion, and these reconstructions are refined by leveraging the segmentation maps to isolate vessel-specific features. The resulting data are then used to initialize and optimize a 3D Gaussian Splatting model, enabling anatomically precise representation of cerebral vasculature. The integration of deep neural segmentation priors with explicit geometric initialization yields highly detailed 3D reconstructions of cerebral angiography. The resulting models leverage the computational efficiency of 3D Gaussian Splatting, achieving near-real-time rendering performance competitive with state-of-the-art reconstruction methods. The segmentation of brain vessels using nnU-Net and our trained model achieved an accuracy of 84.21%, highlighting the improvement in the performance of the proposed approach. Overall, our pipeline significantly improves both the efficiency and accuracy of volumetric cerebral vasculature reconstruction, providing a robust foundation for quantitative clinical analysis and enhanced guidance during endovascular procedures. Full article
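
The abstract above describes using segmentation maps to isolate vessel-specific features from the Structure-from-Motion output before initializing 3D Gaussian Splatting. The sketch below is one plausible way to do that filtering step, not the authors' code: it keeps sparse SfM points whose projections land on vessel pixels in enough views (pinhole projection, per-view binary masks, and the vote threshold are all assumptions).

```python
import numpy as np

def filter_sfm_points_by_vessel_masks(points_xyz, views, min_votes=2):
    """Keep sparse SfM points whose projections fall on vessel pixels in at least
    `min_votes` views; the surviving points could seed a 3D Gaussian Splatting model.

    points_xyz : (N, 3) world-frame SfM points.
    views      : list of dicts with 'K' (3x3 intrinsics), 'R' (3x3), 't' (3,),
                 and 'mask' (H, W) binary vessel segmentation.
    """
    votes = np.zeros(len(points_xyz), dtype=int)
    for v in views:
        cam = v["R"] @ points_xyz.T + v["t"][:, None]      # world -> camera frame
        in_front = cam[2] > 1e-6
        z = np.where(in_front, cam[2], 1.0)                # avoid dividing by ~0 behind camera
        uvw = v["K"] @ cam                                 # camera -> homogeneous pixels
        u = np.round(uvw[0] / z).astype(int)
        vpix = np.round(uvw[1] / z).astype(int)
        h, w = v["mask"].shape
        ok = in_front & (u >= 0) & (u < w) & (vpix >= 0) & (vpix < h)
        hit = np.zeros(len(points_xyz), dtype=bool)
        hit[ok] = v["mask"][vpix[ok], u[ok]] > 0           # projection lies on a vessel pixel
        votes += hit
    return points_xyz[votes >= min_votes]
```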

34 pages, 1365 KB  
Review
Predicting Physical Appearance from Low Template: State of the Art and Future Perspectives
by Francesco Sessa, Emina Dervišević, Massimiliano Esposito, Martina Francaviglia, Mario Chisari, Cristoforo Pomara and Monica Salerno
Genes 2026, 17(1), 59; https://doi.org/10.3390/genes17010059 - 5 Jan 2026
Viewed by 346
Abstract
Background/Objectives: Forensic DNA phenotyping (FDP) enables the prediction of externally visible characteristics (EVCs) such as eye, hair, and skin color, ancestry, and age from biological traces. However, low template DNA (LT-DNA), often derived from degraded or trace samples, poses significant challenges due to allelic dropout, contamination, and incomplete profiles. This review evaluates recent advances in FDP from LT-DNA, focusing on the integration of machine learning (ML) models to improve predictive accuracy and operational readiness, while addressing ethical and population-related considerations. Methods: A comprehensive literature review was conducted on FDP and ML applications in forensic genomics. Key areas examined include SNP-based trait modeling, genotype imputation, epigenetic age estimation, and probabilistic inference. Comparative performance of ML algorithms (Random Forests, Support Vector Machines, Gradient Boosting, and deep learning) was assessed using datasets such as the 1000 Genomes Project, UK Biobank, and forensic casework samples. Ethical frameworks and validation standards were also analyzed. Results: ML approaches significantly enhance phenotype prediction from LT-DNA, achieving AUC > 0.9 for eye color and improving SNP recovery by up to 15% through imputation. Tools like HIrisPlex-S and VISAGE panels remain robust for eye and hair color, with moderate accuracy for skin tone and emerging capabilities for age and facial morphology. Limitations persist in admixed populations and traits with polygenic complexity. Interpretability and bias mitigation remain critical for forensic admissibility. Conclusions: ML integration strengthens FDP from LT-DNA, offering valuable investigative leads in challenging scenarios. Future directions include multi-omics integration, portable sequencing platforms, inclusive reference datasets, and explainable AI to ensure accuracy, transparency, and ethical compliance in forensic applications. Full article
(This article belongs to the Special Issue Advanced Research in Forensic Genetics)
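
The review above reports ML classifiers (Random Forests, SVMs, Gradient Boosting) reaching AUC > 0.9 for eye color from SNP genotypes. The snippet below is a purely illustrative sketch of that kind of evaluation on synthetic data: genotypes encoded as minor-allele counts (0/1/2), a Random Forest, and ROC AUC. Real FDP pipelines (e.g., HIrisPlex-S) use validated SNP panels and calibrated models; nothing here reproduces them.

```python
import numpy as np
from sklearn.ensemble import RandomForestClassifier
from sklearn.model_selection import train_test_split
from sklearn.metrics import roc_auc_score

rng = np.random.default_rng(0)
n_samples, n_snps = 500, 40
X = rng.integers(0, 3, size=(n_samples, n_snps))           # genotypes as allele counts 0/1/2
# Toy label: "blue eyes" loosely driven by the first few SNPs plus noise (synthetic only).
y = (X[:, :5].sum(axis=1) + rng.normal(0, 1.5, n_samples) > 5).astype(int)

X_tr, X_te, y_tr, y_te = train_test_split(X, y, test_size=0.3, random_state=0)
clf = RandomForestClassifier(n_estimators=300, random_state=0).fit(X_tr, y_tr)
print("AUC:", roc_auc_score(y_te, clf.predict_proba(X_te)[:, 1]))
```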

19 pages, 38545 KB  
Article
Improving Dynamic Visual SLAM in Robotic Environments via Angle-Based Optical Flow Analysis
by Sedat Dikici and Fikret Arı
Electronics 2026, 15(1), 223; https://doi.org/10.3390/electronics15010223 - 3 Jan 2026
Viewed by 254
Abstract
Dynamic objects present a major challenge for visual simultaneous localization and mapping (Visual SLAM), as feature measurements originating from moving regions can corrupt camera pose estimation and lead to inaccurate maps. In this paper, we propose a lightweight, semantic-free front-end enhancement for ORB-SLAM that detects and suppresses dynamic features using optical flow geometry. The key idea is to estimate a global motion direction point (MDP) from optical flow vectors and to classify feature points based on their angular consistency with the camera-induced motion field. Unlike magnitude-based flow filtering, the proposed strategy exploits the geometric consistency of optical flow with respect to a motion direction point, providing robustness not only to depth variation and camera speed changes but also to different camera motion patterns, including pure translation and pure rotation. The method is integrated into the ORB-SLAM front-end without modifying the back-end optimization or cost function. Experiments on public dynamic-scene datasets demonstrate that the proposed approach reduces absolute trajectory error by up to approximately 45% compared to baseline ORB-SLAM, while maintaining real-time performance on a CPU-only platform. These results indicate that reliable dynamic feature suppression can be achieved without semantic priors or deep learning models. Full article
(This article belongs to the Section Computer Science & Engineering)
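
The core idea in the abstract above — estimate a global motion direction point (MDP) from optical flow and reject features whose flow is angularly inconsistent with it — can be sketched as follows. This is a minimal geometric illustration, not the authors' implementation: the MDP is taken as the least-squares intersection of the lines spanned by the flow vectors (which assumes the flow field is not purely parallel), and the angle threshold and flow source (e.g., cv2.calcOpticalFlowPyrLK) are assumptions.

```python
import numpy as np

def motion_direction_point(pts, flows):
    """pts, flows: (N, 2) arrays of feature positions and optical-flow vectors.
    Returns the 2D point closest (in least squares) to every line through pts[i]
    along flows[i]; degenerate if all flows are parallel."""
    d = flows / (np.linalg.norm(flows, axis=1, keepdims=True) + 1e-9)
    A = np.zeros((2, 2)); b = np.zeros(2)
    for p, di in zip(pts, d):
        P = np.eye(2) - np.outer(di, di)          # projector orthogonal to the flow line
        A += P; b += P @ p
    return np.linalg.solve(A, b)

def dynamic_feature_mask(pts, flows, mdp, max_deg=15.0):
    """True for features whose flow direction deviates from the line joining the
    feature to the MDP by more than `max_deg` degrees (candidate dynamic points)."""
    d = flows / (np.linalg.norm(flows, axis=1, keepdims=True) + 1e-9)
    r = pts - mdp
    r = r / (np.linalg.norm(r, axis=1, keepdims=True) + 1e-9)
    cos_ang = np.abs(np.sum(d * r, axis=1))       # toward or away from the MDP both count
    return np.degrees(np.arccos(np.clip(cos_ang, -1.0, 1.0))) > max_deg
```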

19 pages, 2564 KB  
Article
Dynamic Feature Elimination-Based Visual–Inertial Navigation Algorithm
by Jiawei Yu, Hongde Dai, Juan Li, Xin Li and Xueying Liu
Sensors 2026, 26(1), 52; https://doi.org/10.3390/s26010052 - 20 Dec 2025
Viewed by 454
Abstract
To address the problem of degraded positioning accuracy in traditional visual–inertial navigation systems (VINS) due to interference from moving objects in dynamic scenarios, this paper proposes an improved algorithm based on the VINS-Fusion framework, which resolves this issue through a synergistic combination of multi-scale feature optimization and real-time dynamic feature elimination. First, at the feature extraction front-end, the SuperPoint encoder structure is reconstructed. By integrating dual-branch multi-scale feature fusion and 1 × 1 convolutional channel compression, it simultaneously captures shallow texture details and deep semantic information, enhances the discriminative ability of static background features, and reduces mis-elimination near dynamic–static boundaries. Second, in the dynamic processing module, the ASORT (Adaptive Simple Online and Realtime Tracking) algorithm is designed. This algorithm combines an object detection network, adaptive Kalman filter-based trajectory prediction, and a Hungarian algorithm-based matching mechanism to identify moving objects in images in real time, filter out their associated dynamic feature points from the optimized feature point set, and ensure that only reliable static features are input to the backend optimization, thereby minimizing pose estimation errors caused by dynamic interference. Experiments on the KITTI dataset demonstrate that, compared with the original VINS-Fusion algorithm, the proposed method achieves an average improvement of approximately 14.8% in absolute trajectory accuracy, with an average single-frame processing time of 23.9 milliseconds. This validates that the proposed approach provides an efficient and robust solution for visual–inertial navigation in highly dynamic environments. Full article
(This article belongs to the Section Navigation and Positioning)
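
The ASORT module described above combines object detection, Kalman-filter trajectory prediction, and Hungarian matching. The sketch below shows only the standard SORT-style data-association step (not the paper's full ASORT design): predicted track boxes are matched to detector boxes with the Hungarian algorithm on an IoU cost, after which features inside matched dynamic boxes would be discarded before back-end optimization.

```python
import numpy as np
from scipy.optimize import linear_sum_assignment

def iou(a, b):
    """a, b: boxes as (x1, y1, x2, y2)."""
    x1, y1 = max(a[0], b[0]), max(a[1], b[1])
    x2, y2 = min(a[2], b[2]), min(a[3], b[3])
    inter = max(0.0, x2 - x1) * max(0.0, y2 - y1)
    area = lambda r: (r[2] - r[0]) * (r[3] - r[1])
    return inter / (area(a) + area(b) - inter + 1e-9)

def associate(track_boxes, det_boxes, iou_min=0.3):
    """Return (track_idx, det_idx) pairs whose IoU clears the acceptance threshold."""
    cost = np.array([[1.0 - iou(t, d) for d in det_boxes] for t in track_boxes])
    rows, cols = linear_sum_assignment(cost)                 # Hungarian assignment
    return [(r, c) for r, c in zip(rows, cols) if 1.0 - cost[r, c] >= iou_min]
```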

23 pages, 21889 KB  
Article
Multi-Stage Domain-Adapted 6D Pose Estimation of Warehouse Load Carriers: A Deep Convolutional Neural Network Approach
by Hisham ElMoaqet, Mohammad Rashed and Mohamed Bakr
Machines 2025, 13(12), 1126; https://doi.org/10.3390/machines13121126 - 8 Dec 2025
Viewed by 454
Abstract
Intelligent autonomous guided vehicles (AGVs) are of huge importance in facilitating the automation of load handling in the era of Industry 4.0. AGVs heavily rely on environmental perception, such as the 6D poses of objects, in order to execute complex tasks efficiently. Therefore, estimating the 6D poses of objects in warehouses is crucial for proper load handling in modern intra-logistics warehouse environments. This study presents a deep convolutional neural network approach for estimating the pose of warehouse load carriers. Recognizing the paucity of labeled real 6D pose estimation data, the proposed approach uses only synthetic RGB warehouse data to train the network. Domain adaptation was applied using a Contrastive Unpaired Image-to-Image Translation (CUT) Network to generate domain-adapted training data that can bridge the domain gap between synthetic and real environments and help the model generalize better over realistic scenes. In order to increase the detection range, a multi-stage refinement detection pipeline is developed using consistent multi-view multi-object 6D pose estimation (CosyPose) networks. The proposed framework was tested with different training scenarios, and its performance was comprehensively analyzed and compared with a state-of-the-art non-adapted single-stage pose estimation approach, showing an improvement of up to 80% on the ADD-S AUC metric. Using a mix of adapted and non-adapted synthetic data along with splitting the state space into multiple refiners, the proposed approach achieved an ADD-S AUC performance greater than 0.81 over a wide detection range, from one to five meters, while still being trained on a relatively small synthetic dataset for a limited number of epochs. Full article
(This article belongs to the Special Issue Industry 4.0: Intelligent Robots in Smart Manufacturing)
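
The evaluation above is reported on the ADD-S AUC metric. A minimal sketch of that metric is shown below: ADD-S is the average distance from each transformed model point to its nearest neighbor under the ground-truth pose, and the AUC summarizes the accuracy-versus-threshold curve. The 0–10 cm threshold range follows common practice and is an assumption here, not taken from the abstract.

```python
import numpy as np
from scipy.spatial import cKDTree

def add_s(model_pts, R_pred, t_pred, R_gt, t_gt):
    """Symmetric average distance: for each model point under the predicted pose,
    distance to the closest model point under the ground-truth pose."""
    pred = model_pts @ R_pred.T + t_pred
    gt = model_pts @ R_gt.T + t_gt
    dists, _ = cKDTree(gt).query(pred)
    return dists.mean()

def add_s_auc(errors, max_thresh=0.10, steps=1000):
    """Normalized area under the accuracy-vs-threshold curve over [0, max_thresh]."""
    errors = np.asarray(errors)
    thresholds = np.linspace(0.0, max_thresh, steps)
    accuracy = np.array([(errors < t).mean() for t in thresholds])
    return float(accuracy.mean())             # uniform thresholds -> mean equals normalized AUC
```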

16 pages, 8229 KB  
Article
MVL-Loc: Leveraging Vision-Language Model for Generalizable Multi-Scene Camera Relocalization
by Zhendong Xiao, Shan Yang, Shujie Ji, Jun Yin, Ziling Wen and Wu Wei
Appl. Sci. 2025, 15(23), 12642; https://doi.org/10.3390/app152312642 - 28 Nov 2025
Viewed by 413
Abstract
Camera relocalization, a cornerstone capability of modern computer vision, accurately determines a camera’s position and orientation from images and is essential for applications in augmented reality, mixed reality, autonomous driving, delivery drones, and robotic navigation. Unlike traditional deep learning-based methods, which regress camera pose from images of a single scene and lack generalization and robustness in diverse environments, we propose MVL-Loc, a novel end-to-end multi-scene six-degrees-of-freedom camera relocalization framework. MVL-Loc leverages pretrained world knowledge from vision-language models and incorporates multimodal data to generalize across both indoor and outdoor settings. Furthermore, natural language is employed as a directive tool to guide the multi-scene learning process, facilitating semantic understanding of complex scenes and capturing spatial relationships among objects. Extensive experiments on the 7Scenes and Cambridge Landmarks datasets demonstrate MVL-Loc’s robustness and state-of-the-art performance in real-world multi-scene camera relocalization, with improved accuracy in both positional and orientational estimates. Full article

17 pages, 3038 KB  
Article
Research on Deep Learning-Based Human–Robot Static/Dynamic Gesture-Driven Control Framework
by Gong Zhang, Jiahong Su, Shuzhong Zhang, Jianzheng Qi, Zhicheng Hou and Qunxu Lin
Sensors 2025, 25(23), 7203; https://doi.org/10.3390/s25237203 - 25 Nov 2025
Cited by 1 | Viewed by 725
Abstract
For human–robot gesture-driven control, this paper proposes a deep learning-based approach that employs both static and dynamic gestures to drive and control robots for object-grasping and delivery tasks. The method utilizes two-dimensional Convolutional Neural Networks (2D-CNNs) for static gesture recognition and a hybrid architecture combining three-dimensional Convolutional Neural Networks (3D-CNNs) and Long Short-Term Memory networks (3D-CNN+LSTM) for dynamic gesture recognition. Results on a custom gesture dataset demonstrate validation accuracies of 95.38% for static gestures and 93.18% for dynamic gestures, respectively. Then, in order to control and drive the robot to perform corresponding tasks, hand pose estimation was performed. The MediaPipe machine learning framework was first employed to extract hand feature points. These 2D feature points were then converted into 3D coordinates using a depth camera-based pose estimation method, followed by coordinate system transformation to obtain hand poses relative to the robot’s base coordinate system. Finally, an experimental platform for human–robot gesture-driven interaction was established, deploying both gesture recognition models. Four participants were invited to perform 100 trials each of gesture-driven object-grasping and delivery tasks under three lighting conditions: natural light, low light, and strong light. Experimental results show that the average success rates for completing tasks via static and dynamic gestures are no less than 96.88% and 94.63%, respectively, with task completion times consistently within 20 s. These findings demonstrate that the proposed approach enables robust vision-based robotic control through natural hand gestures, showing great prospects for human–robot collaboration applications. Full article
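
The hand-pose step described above converts 2D MediaPipe feature points into 3D coordinates with a depth camera and then transforms them into the robot's base frame. The sketch below illustrates that geometry with made-up calibration values (the intrinsics K and the hand-eye transform T_base_cam are hypothetical, and the paper's exact procedure is not reproduced).

```python
import numpy as np

def pixel_to_camera(u, v, depth_m, K):
    """Pinhole back-projection: pixel (u, v) plus depth (meters) -> camera-frame XYZ."""
    fx, fy, cx, cy = K[0, 0], K[1, 1], K[0, 2], K[1, 2]
    return np.array([(u - cx) * depth_m / fx, (v - cy) * depth_m / fy, depth_m])

def camera_to_base(p_cam, T_base_cam):
    """Apply a 4x4 homogeneous transform from the camera frame to the robot base frame."""
    return (T_base_cam @ np.append(p_cam, 1.0))[:3]

# Example with illustrative calibration values:
K = np.array([[615.0, 0.0, 320.0], [0.0, 615.0, 240.0], [0.0, 0.0, 1.0]])
T_base_cam = np.eye(4)
T_base_cam[:3, 3] = [0.4, 0.0, 0.6]            # camera mounted 40 cm ahead, 60 cm above the base
p_cam = pixel_to_camera(350, 260, 0.75, K)     # a hand landmark at 0.75 m depth
print(camera_to_base(p_cam, T_base_cam))
```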

27 pages, 4100 KB  
Article
Analysis of Deep-Learning Methods in an ISO/TS 15066–Compliant Human–Robot Safety Framework
by David Bricher and Andreas Müller
Sensors 2025, 25(23), 7136; https://doi.org/10.3390/s25237136 - 22 Nov 2025
Viewed by 863
Abstract
In recent years, collaborative robots have gained great success in manufacturing applications where humans and robots work together in close proximity. However, current ISO/TS-15066-compliant implementations often limit the efficiency of collaborative tasks due to conservative speed restrictions. For this reason, this paper introduces a deep-learning-based human–robot–safety framework (HRSF) that aims at dynamically adapting robot velocities depending on the separation distance between human and robot while respecting maximum biomechanical force and pressure limits. The applicability of the framework was investigated for four different deep learning approaches that can be used for human body extraction: human body recognition, human body segmentation, human pose estimation, and human body part segmentation. Unlike conventional industrial safety systems, the proposed HRSF differentiates individual human body parts from other objects, enabling optimized robot process execution. Experiments demonstrated a quantitative reduction in cycle time of up to 15% compared to conventional safety technology. Full article
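
To make the speed-and-separation idea concrete, the snippet below is a deliberately simplified illustration only — it is neither the ISO/TS 15066 protective-separation formula nor the paper's HRSF: the commanded robot speed is scaled linearly between a hard-stop distance and a full-speed distance, which could be evaluated per detected body part.

```python
def scaled_speed(separation_m, v_max_mps, d_stop=0.3, d_full=1.5):
    """Allowed robot speed for the current human-robot separation (simplified rule)."""
    if separation_m <= d_stop:
        return 0.0                                  # inside the protective zone: stop
    if separation_m >= d_full:
        return v_max_mps                            # far away: full programmed speed
    return v_max_mps * (separation_m - d_stop) / (d_full - d_stop)

# e.g. a hand detected 0.8 m from the tool, with a 1.0 m/s programmed speed:
print(f"{scaled_speed(0.8, 1.0):.2f} m/s")
```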

26 pages, 6129 KB  
Article
VIPE: Visible and Infrared Fused Pose Estimation Framework for Space Noncooperative Objects
by Zhao Zhang, Dong Zhou, Yuhui Hu, Weizhao Ma, Guanghui Sun and Yuekan Zhang
Sensors 2025, 25(21), 6664; https://doi.org/10.3390/s25216664 - 1 Nov 2025
Viewed by 681
Abstract
Accurate pose estimation of non-cooperative space objects is crucial for applications such as satellite maintenance, space debris removal, and on-orbit assembly. However, monocular pose estimation methods face significant challenges in environments with limited visibility. Different from the traditional pose estimation methods that use images from a single band as input, we propose a novel deep learning-based pose estimation framework for non-cooperative space objects by fusing visible and infrared images. First, we introduce an image fusion subnetwork that integrates multi-scale features from visible and infrared images into a unified embedding space, preserving the detailed features of visible images and the intensity information of infrared images. Subsequently, we design a robust pose estimation subnetwork that leverages the rich information from the fused images to achieve accurate pose estimation. By combining these two subnetworks, we construct the Visible and Infrared Fused Pose Estimation Framework (VIPE) for non-cooperative space objects. Additionally, we present a Bimodal-Vision Pose Estimation (BVPE) dataset, comprising 3,630 visible-infrared image pairs, to facilitate research in this domain. Extensive experiments on the BVPE dataset demonstrate that VIPE significantly outperforms existing monocular pose estimation methods, particularly in complex space environments, providing more reliable and accurate pose estimation results. Full article
(This article belongs to the Section Sensing and Imaging)

24 pages, 6113 KB  
Article
Vision-Based Reinforcement Learning for Robotic Grasping of Moving Objects on a Conveyor
by Yin Cao, Xuemei Xu and Yazheng Zhang
Machines 2025, 13(10), 973; https://doi.org/10.3390/machines13100973 - 21 Oct 2025
Viewed by 2059
Abstract
This study introduces an autonomous framework for grasping moving objects on a conveyor belt, enabling unsupervised detection, grasping, and categorization. The work focuses on two common object shapes—cylindrical cans and rectangular cartons—transported at a constant speed of 3–7 cm/s on the conveyor, emulating typical scenarios. The proposed framework combines a vision-based neural network for object detection, a target localization algorithm, and a deep reinforcement learning model for robotic control. Specifically, a YOLO-based neural network was employed to detect the 2D position of target objects. These positions are then converted to 3D coordinates, followed by pose estimation and error correction. A Proximal Policy Optimization (PPO) algorithm was then used to provide continuous control decisions for the robotic arm. A tailored reinforcement learning environment was developed using the Gymnasium interface. Training and validation were conducted on a 7-degree-of-freedom (7-DOF) robotic arm model in the PyBullet physics simulation engine. By leveraging transfer learning and curriculum learning strategies, the robotic agent effectively learned to grasp multiple categories of moving objects. Simulation experiments and randomized trials show that the proposed method enables the 7-DOF robotic arm to consistently grasp conveyor belt objects, achieving an approximately 80% success rate at conveyor speeds of 0.03–0.07 m/s. These results demonstrate the potential of the framework for deployment in automated handling applications. Full article
(This article belongs to the Special Issue AI-Integrated Advanced Robotics Towards Industry 5.0)
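
The training setup above pairs a custom Gymnasium environment with PPO for continuous arm control. The sketch below shows how such an environment could be trained with PPO via stable-baselines3; it is not the authors' code, stable-baselines3 is an assumption (the abstract names only PPO and Gymnasium), and the "ConveyorGrasp-v0" environment id is hypothetical.

```python
import gymnasium as gym
from stable_baselines3 import PPO

env = gym.make("ConveyorGrasp-v0")              # hypothetical registered PyBullet grasping env
model = PPO("MlpPolicy", env, verbose=1)        # continuous-control policy trained with PPO
model.learn(total_timesteps=1_000_000)
model.save("ppo_conveyor_grasp")

# Evaluation rollout with the trained policy
obs, _ = env.reset()
done = False
while not done:
    action, _ = model.predict(obs, deterministic=True)
    obs, reward, terminated, truncated, _ = env.step(action)
    done = terminated or truncated
```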

41 pages, 4151 KB  
Systematic Review
AI Video Analysis in Parkinson’s Disease: A Systematic Review of the Most Accurate Computer Vision Tools for Diagnosis, Symptom Monitoring, and Therapy Management
by Lazzaro di Biase, Pasquale Maria Pecoraro and Francesco Bugamelli
Sensors 2025, 25(20), 6373; https://doi.org/10.3390/s25206373 - 15 Oct 2025
Cited by 3 | Viewed by 2268
Abstract
Background. Clinical assessment of Parkinson’s disease (PD) is limited by high subjectivity and inter-rater variability. Markerless video analysis, namely Computer Vision (CV), offers objective and scalable characterization of motor signs. We systematically reviewed CV technologies suited for PD diagnosis, symptom monitoring, and treatment management. Methods. Following the Preferred Reporting Items for Systematic Reviews and Meta-Analyses guidelines, we searched PubMed for articles published between 1 January 1984 and 9 May 2025. We used the following search strategy: (“Parkinson Disease” [MeSH Terms] OR “parkinson’s disease” OR “parkinson disease”) AND (“computer vision” OR “video analysis” OR “pose estimation” OR “OpenPose” OR “DeepLabCut” OR “OpenFace” OR “YOLO” OR “MediaPipe” OR “markerless motion capture” OR “skeleton tracking”). Results. Out of 154 identified studies, 45 met eligibility criteria and were synthesized. Gait was assessed in 42% of studies, followed by bradykinesia items (17.7%). OpenPose and custom CV solutions were each used in 36% of studies, followed by MediaPipe (16%), DeepLabCut (9%), YOLO (4%). Across aims, CV pipelines consistently showed diagnostic discrimination and severity tracking aligned with expert ratings. Conclusions. CV non-invasively quantifies PD motor impairment, holding potential for objective diagnosis, longitudinal monitoring, and therapy response. Guidelines for standardized video-recording protocols and software usage are needed for real-world applications. Full article
(This article belongs to the Collection Sensors for Gait, Human Movement Analysis, and Health Monitoring)

31 pages, 10644 KB  
Article
An Instance Segmentation Method for Agricultural Plastic Residual Film on Cotton Fields Based on RSE-YOLO-Seg
by Huimin Fang, Quanwang Xu, Xuegeng Chen, Xinzhong Wang, Limin Yan and Qingyi Zhang
Agriculture 2025, 15(19), 2025; https://doi.org/10.3390/agriculture15192025 - 26 Sep 2025
Cited by 1 | Viewed by 934
Abstract
To address the challenges of multi-scale missed detections, false positives, and incomplete boundary segmentation in cotton field residual plastic film detection, this study proposes the RSE-YOLO-Seg model. First, a PKI module (adaptive receptive field) is integrated into the C3K2 block and combined with the SegNext attention mechanism (multi-scale convolutional kernels) to capture multi-scale residual film features. Second, RFCAConv replaces standard convolutional layers to differentially process regions and receptive fields of different sizes, and an Efficient-Head is designed to reduce parameters. Finally, an NM-IoU loss function is proposed to enhance small residual film detection and boundary segmentation. Experiments on a self-constructed dataset show that RSE-YOLO-Seg improves the object detection average precision (mAP50(B)) by 3% and mask segmentation average precision (mAP50(M)) by 2.7% compared with the baseline, with all module improvements being statistically significant (p < 0.05). Across four complex scenarios, it exhibits stronger robustness than mainstream models (YOLOv5n-seg, YOLOv8n-seg, YOLOv10n-seg, YOLO11n-seg), and achieves 17/38 FPS on Jetson Nano B01/Orin. Additionally, when combined with DeepSORT, compared with random image sampling, the mean error between predicted and actual residual film area decreases from 232.30 cm2 to 142.00 cm2, and the root mean square error (RMSE) drops from 251.53 cm2 to 130.25 cm2. This effectively mitigates pose-induced random errors in static images and significantly improves area estimation accuracy. Full article
(This article belongs to the Section Artificial Intelligence and Digital Agriculture)
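
The area-estimation evaluation above reports a mean error and an RMSE between predicted and measured residual-film areas. A small sketch of those two summaries is given below; the numbers are illustrative placeholders, not values from the paper.

```python
import numpy as np

predicted = np.array([1510.0, 980.0, 2210.0, 1735.0])   # e.g. DeepSORT-tracked area estimates (cm^2)
measured  = np.array([1650.0, 905.0, 2100.0, 1880.0])   # manually measured ground truth (cm^2)

errors = predicted - measured
mean_abs_error = np.mean(np.abs(errors))                 # "mean error" interpreted as mean absolute error
rmse = np.sqrt(np.mean(errors ** 2))
print(f"mean error: {mean_abs_error:.2f} cm^2, RMSE: {rmse:.2f} cm^2")
```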

30 pages, 2023 KB  
Review
Fusion of Computer Vision and AI in Collaborative Robotics: A Review and Future Prospects
by Yuval Cohen, Amir Biton and Shraga Shoval
Appl. Sci. 2025, 15(14), 7905; https://doi.org/10.3390/app15147905 - 15 Jul 2025
Cited by 6 | Viewed by 5517
Abstract
The integration of advanced computer vision and artificial intelligence (AI) techniques into collaborative robotic systems holds the potential to revolutionize human–robot interaction, productivity, and safety. Despite substantial research activity, a systematic synthesis of how vision and AI are jointly enabling context-aware, adaptive cobot capabilities across perception, planning, and decision-making remains lacking (especially in recent years). Addressing this gap, our review unifies the latest advances in visual recognition, deep learning, and semantic mapping within a structured taxonomy tailored to collaborative robotics. We examine foundational technologies such as object detection, human pose estimation, and environmental modeling, as well as emerging trends including multimodal sensor fusion, explainable AI, and ethically guided autonomy. Unlike prior surveys that focus narrowly on either vision or AI, this review uniquely analyzes their integrated use for real-world human–robot collaboration. Highlighting industrial and service applications, we distill the best practices, identify critical challenges, and present key performance metrics to guide future research. We conclude by proposing strategic directions—from scalable training methods to interoperability standards—to foster safe, robust, and proactive human–robot partnerships in the years ahead. Full article

24 pages, 5534 KB  
Article
Enhancing Healthcare Assistance with a Self-Learning Robotics System: A Deep Imitation Learning-Based Solution
by Yagna Jadeja, Mahmoud Shafik, Paul Wood and Aaisha Makkar
Electronics 2025, 14(14), 2823; https://doi.org/10.3390/electronics14142823 - 14 Jul 2025
Viewed by 1695
Abstract
This paper presents a Self-Learning Robotic System (SLRS) for healthcare assistance using Deep Imitation Learning (DIL). The proposed SLRS solution can observe and replicate human demonstrations, thereby acquiring complex skills without the need for explicit task-specific programming. It incorporates modular components for perception (i.e., advanced computer vision methodologies), actuation (i.e., dynamic interaction with patients and healthcare professionals in real time), and learning. The innovative approach of implementing a hybrid model approach (i.e., deep imitation learning and pose estimation algorithms) facilitates autonomous learning and adaptive task execution. The environmental awareness and responsiveness were also enhanced using both a Convolutional Neural Network (CNN)-based object detection mechanism using YOLOv8 (i.e., with 94.3% accuracy and 18.7 ms latency) and pose estimation algorithms, alongside a MediaPipe and Long Short-Term Memory (LSTM) framework for human action recognition. The developed solution was tested and validated in healthcare, with the aim to overcome some of the current challenges, such as workforce shortages, ageing populations, and the rising prevalence of chronic diseases. The CAD simulation, validation, and verification tested functions (i.e., assistive functions, interactive scenarios, and object manipulation) of the system demonstrated the robot’s adaptability and operational efficiency, achieving an 87.3% task completion success rate and over 85% grasp success rate. This approach highlights the potential use of an SLRS for healthcare assistance. Further work will be undertaken in hospitals, care homes, and rehabilitation centre environments to generate complete holistic datasets to confirm the system’s reliability and efficiency. Full article
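
The action-recognition component above combines MediaPipe landmarks with an LSTM. The sketch below is a generic model of that kind (not the paper's architecture): an LSTM classifying a sequence of flattened landmark coordinates into action classes; landmark count, sequence length, and class count are assumptions.

```python
import torch
import torch.nn as nn

class LandmarkLSTM(nn.Module):
    def __init__(self, n_landmarks=33, n_classes=5, hidden=128):
        super().__init__()
        self.lstm = nn.LSTM(input_size=n_landmarks * 3, hidden_size=hidden,
                            num_layers=2, batch_first=True)
        self.head = nn.Linear(hidden, n_classes)

    def forward(self, x):                      # x: (batch, frames, n_landmarks * 3)
        out, _ = self.lstm(x)
        return self.head(out[:, -1])           # classify from the last time step

model = LandmarkLSTM()
dummy = torch.randn(4, 30, 33 * 3)             # 4 clips of 30 frames of x/y/z landmarks
print(model(dummy).shape)                      # torch.Size([4, 5])
```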

17 pages, 3331 KB  
Article
Automated Cattle Head and Ear Pose Estimation Using Deep Learning for Animal Welfare Research
by Sueun Kim
Vet. Sci. 2025, 12(7), 664; https://doi.org/10.3390/vetsci12070664 - 13 Jul 2025
Viewed by 1651
Abstract
With the increasing importance of animal welfare, behavioral indicators such as changes in head and ear posture are widely recognized as non-invasive and field-applicable markers for evaluating the emotional state and stress levels of animals. However, traditional visual observation methods are often subjective, as assessments can vary between observers, and are unsuitable for long-term, quantitative monitoring. This study proposes an artificial intelligence (AI)-based system for the detection and pose estimation of cattle heads and ears using deep learning techniques. The system integrates Mask R-CNN for accurate object detection and FSA-Net for robust 3D pose estimation (yaw, pitch, and roll) of cattle heads and left ears. Comprehensive datasets were constructed from images of Japanese Black cattle, collected under natural conditions and annotated for both detection and pose estimation tasks. The proposed framework achieved mean average precision (mAP) values of 0.79 for head detection and 0.71 for left ear detection and mean absolute error (MAE) of approximately 8–9° for pose estimation, demonstrating reliable performance across diverse orientations. This approach enables long-term, quantitative, and objective monitoring of cattle behavior, offering significant advantages over traditional subjective stress assessment methods. The developed system holds promise for practical applications in animal welfare research and real-time farm management. Full article
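
The pose evaluation above reports mean absolute error of roughly 8–9° over yaw, pitch, and roll. A minimal sketch of such an angular MAE is shown below (generic, not the paper's evaluation script), with wrapping so that, e.g., 170° versus −178° counts as a 12° error rather than 348°.

```python
import numpy as np

def angular_mae(pred_deg, gt_deg):
    """pred_deg, gt_deg: (N, 3) arrays of yaw/pitch/roll in degrees.
    Returns the per-angle mean absolute error with wrap-around handling."""
    diff = (np.asarray(pred_deg) - np.asarray(gt_deg) + 180.0) % 360.0 - 180.0
    return np.mean(np.abs(diff), axis=0)

pred = [[12.0, -5.0, 3.0], [170.0, 10.0, -2.0]]
gt   = [[10.0, -8.0, 1.0], [-178.0, 12.0, 0.0]]
print(angular_mae(pred, gt))                   # [7.  2.5 2. ]
```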
