Search Results (1,834)

Search Parameters:
Keywords = robotic vision

25 pages, 5544 KB  
Article
Retrofitting a Legacy Industrial Robot Through Monocular Computer Vision-Based Human-Arm Posture Tracking and 3-DoF Robot-Axis Control (A1–A3)
by Paúl A. Chasi-Pesantez, Eduardo J. Astudillo-Flores, Valeria A. Dueñas-López, Jorge O. Ordoñez-Ordoñez, Eldad Holdengreber and Luis Fernando Guerrero-Vásquez
Robotics 2026, 15(4), 82; https://doi.org/10.3390/robotics15040082 - 21 Apr 2026
Abstract
This paper presents a low-cost retrofitting pipeline for a legacy industrial robot that uses a single RGB webcam and monocular 2D keypoint tracking to estimate human-arm posture angles θ^(h) and map them to robot-axis joint targets q_cmd^(r) for A1–A3 on a KUKA KR5-2 ARC HW, while keeping the wrist orientation (A4–A6) fixed. Rather than targeting full six-DoF manipulation, the main contribution is an experimental characterization of how far monocular 2D posture-to-axis mapping can be used reliably for coarse placement and safeguarded low-speed demonstrations on a legacy robot platform. Vision-side accuracy was evaluated per axis against goniometer-based reference angles θ_ref^(h), showing low errors for A2–A3 within the tested range and larger errors for A1 due to monocular yaw/depth ambiguity and occlusions. The study also analyzes failure modes during simultaneous multi-joint motion, where performance degrades notably, especially for A2 and A3, and reports practical mitigation directions such as improved viewpoints, multi-view/depth sensing, and stricter dropout handling. Runtime behavior is additionally characterized through a loop timing budget, with an end-to-end latency of 185.44 ms and an effective loop frequency of 5.39 Hz, which is consistent with low-speed online operation within the demonstrated scope. The system was implemented in a fenced industrial cell with restricted access and emergency stop; no collaborative operation is claimed.
(This article belongs to the Special Issue Artificial Vision Systems for Robotics)
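The θ^(h) → q_cmd^(r) mapping described above can be made concrete with a short sketch. This is a minimal, hypothetical Python example, not the authors' implementation: it computes an elbow angle from three 2D keypoints and applies a clamped linear map to a joint target; the scale, offset, and axis limits are placeholder assumptions.

```python
# Hypothetical sketch: planar arm angle from 2D keypoints, mapped to a
# clamped robot joint target. Values and names are illustrative only.
import numpy as np

def arm_angle(shoulder: np.ndarray, elbow: np.ndarray, wrist: np.ndarray) -> float:
    """Elbow flexion angle theta^(h) in degrees from three 2D keypoints."""
    u, v = shoulder - elbow, wrist - elbow
    cos_t = np.dot(u, v) / (np.linalg.norm(u) * np.linalg.norm(v) + 1e-9)
    return float(np.degrees(np.arccos(np.clip(cos_t, -1.0, 1.0))))

def map_to_axis(theta_h: float, scale: float = 1.0, offset: float = 0.0,
                limits: tuple = (-90.0, 90.0)) -> float:
    """Linear human-to-robot map q_cmd^(r) = scale * theta^(h) + offset,
    clamped to assumed axis limits."""
    return float(np.clip(scale * theta_h + offset, *limits))

# Example: keypoints in normalized image coordinates.
q_cmd = map_to_axis(arm_angle(np.array([0.2, 0.5]),
                              np.array([0.4, 0.5]),
                              np.array([0.4, 0.8])))
```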
22 pages, 12163 KB  
Article
SV-LIO: A Probabilistic Adaptive Semantic Voxel Map for LiDAR–Inertial Odometry
by Lixiao Yang and Youbing Feng
Electronics 2026, 15(8), 1744; https://doi.org/10.3390/electronics15081744 - 20 Apr 2026
Abstract
Accurate and real-time localization is a fundamental prerequisite for the autonomous navigation of mobile robots. LiDAR–Inertial Odometry (LIO) achieves high-precision state estimation and scene reconstruction in unknown environments by effectively fusing data from LiDAR and Inertial Measurement Units (IMUs). However, conventional LIO methods typically rely solely on geometric features during point cloud registration. In complex scenarios, such as outdoor unstructured or dynamic environments, these methods are often susceptible to reduced localization accuracy due to geometric degeneration or mismatches. To address these challenges, we propose SV-LIO, a probabilistic adaptive semantic voxel map for LiDAR–inertial odometry, which leverages point-wise semantic information from semantic segmentation to enhance registration accuracy and system robustness. Specifically, we construct a probabilistic adaptive semantic voxel map that extracts multi-scale spatial planes attached with semantic information. Building on this representation, we employ a semantic-guided strategy for nearest-neighbor plane association between LiDAR scans and the local map, and construct semantic-weighted point-to-plane residuals to constrain pose estimation. By jointly optimizing the IMU-propagated pose prior and semantic-guided LiDAR observation constraints, SV-LIO realizes high-precision real-time state estimation and semantic scene reconstruction. Extensive experiments on the KITTI dataset demonstrate that SV-LIO achieves significant improvements in localization accuracy over state-of-the-art (SOTA) LIO methods, while also constructing semantic maps capable of providing rich environmental information.
(This article belongs to the Section Electrical and Autonomous Vehicles)
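The semantic-weighted point-to-plane residual described above admits a compact illustration. A hedged sketch, not the SV-LIO code; the weighting scheme and label representation are assumptions:

```python
# Hypothetical sketch: point-to-plane residual down-weighted when the
# semantic label of a LiDAR point disagrees with its associated map plane.
import numpy as np

def weighted_residual(p_world: np.ndarray, plane_point: np.ndarray,
                      plane_normal: np.ndarray, label_point: int,
                      label_plane: int, w_match: float = 1.0,
                      w_mismatch: float = 0.1) -> float:
    """Signed point-to-plane distance scaled by a semantic consistency weight."""
    r = float(np.dot(plane_normal, p_world - plane_point))
    w = w_match if label_point == label_plane else w_mismatch
    return w * r
```

Stacking such residuals over all scan-to-map associations, together with the IMU-propagated prior, yields the joint optimization the abstract refers to.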
37 pages, 4888 KB  
Review
Robotics in Precision Agriculture: Task-, Platform-, and Evaluation-Oriented Review
by Natheer Almtireen and Mutaz Ryalat
Robotics 2026, 15(4), 81; https://doi.org/10.3390/robotics15040081 - 20 Apr 2026
Abstract
Robotics is increasingly positioned as an enabling technology for precision agriculture, where management actions must be spatially and temporally targeted under constraints on labour, input use, safety, and environmental impact. This review synthesises studies on agricultural field robotics and organises the literature along four complementary axes: task (monitoring, weeding, spraying, and harvesting), platform (UGV, UAV, gantry/fixed-structure, greenhouse robot, and hybrid systems), autonomy-stack module (perception, localisation, planning, control, actuation, safety, and human–robot interaction), and evaluation setting (lab, greenhouse, open-field single season, and open-field multi-season/multi-site). Across these dimensions, this review analyses how platform constraints shape sensing geometry, actuation capability, localisation reliability, energy/endurance, supervision burden, and safety requirements. It further examines enabling technologies that recur across tasks, including vision and multimodal perception under occlusion and illumination variability, localisation and mapping under weak or denied GNSS, uncertainty-aware planning in deformable and partially observed environments, and compliant end-effectors for contact-rich operations. Beyond cataloguing systems, this paper emphasises evaluation practice by synthesising core task-relevant metrics, comparing laboratory and field validation settings, and proposing a reporting checklist and benchmark ladder to improve reproducibility and cross-study comparability. This review identifies recurring bottlenecks in domain shift, long-term autonomy, calibration robustness, crop-safe actuation, and safety assurance near humans, and it concludes with a staged research roadmap linking near-term evaluation reform to longer-term credible multi-site autonomy. Overall, this paper provides a structured framework for interpreting agricultural robotic systems not only by application but also by deployment context, system maturity, and evaluation credibility.
(This article belongs to the Special Issue Perception and AI for Field Robotics)
24 pages, 5983 KB  
Article
Visual Understanding of Intelligent Apple Picking: Detection-Segmentation Joint Architecture Based on Improved YOLOv11
by Bin Yan and Qianru Wu
Horticulturae 2026, 12(4), 494; https://doi.org/10.3390/horticulturae12040494 - 18 Apr 2026
Abstract
Achieving precise fruit localization and fine branch segmentation simultaneously in unstructured orchard environments remains challenging due to variable lighting, occlusion, and complex backgrounds. This study proposes a joint detection–segmentation architecture based on an improved YOLOv11 network for collaborative perception of apples and tree branches. First, a dual-task dataset of spindle-type apple orchards was constructed with bounding-box annotations for fruits and pixel-level polygon masks for branches, encompassing diverse illumination and occlusion conditions. Second, Convolutional Block Attention Modules (CBAMs) are strategically embedded into the YOLOv11 backbone to enhance feature discrimination for slender branch structures while preserving high fruit detection accuracy. The enhanced model achieves a precision of 0.981, recall of 0.986, and F1-score of 0.983 for apple detection, and a precision of 0.803, recall of 0.715, mAP of 0.698, and IoU of 0.6066 for branch segmentation on the validation set. Comparative experiments against YOLOv8 and baseline YOLOv11 confirm improved segmentation continuity and finer branch delineation. The proposed integrated perception framework provides reliable visual guidance for collision-avoidance robotic harvesting and offers a practical reference for multi-task agricultural vision systems.
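CBAM is a published attention block, so its structure can be shown generically. The following PyTorch module is a standard re-implementation for illustration, not the authors' modified YOLOv11 code:

```python
# Generic CBAM: channel attention (shared MLP over avg/max descriptors)
# followed by spatial attention (conv over stacked avg/max maps).
import torch
import torch.nn as nn

class CBAM(nn.Module):
    def __init__(self, channels: int, reduction: int = 16, kernel: int = 7):
        super().__init__()
        self.mlp = nn.Sequential(
            nn.Linear(channels, channels // reduction), nn.ReLU(),
            nn.Linear(channels // reduction, channels))
        self.conv = nn.Conv2d(2, 1, kernel, padding=kernel // 2)

    def forward(self, x: torch.Tensor) -> torch.Tensor:
        b, c, _, _ = x.shape
        chan = torch.sigmoid(self.mlp(x.mean(dim=(2, 3))) +
                             self.mlp(x.amax(dim=(2, 3)))).view(b, c, 1, 1)
        x = x * chan
        spat = torch.cat([x.mean(1, keepdim=True),
                          x.amax(1, keepdim=True)], dim=1)
        return x * torch.sigmoid(self.conv(spat))
```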
28 pages, 10999 KB  
Article
Introducing Brain–Computer Interfaces in Factories and Fabrication Lines for the Inclusion of Disabled Workers–Industry 5.0—A Modern Challenge and Opportunity
by Marian-Silviu Poboroniuc, Zoltán Nochta, Martin Klepal, Nina Hunter, Danut-Constantin Irimia, Alina Georgiana Baciu, Kelaja Schert, Tim Piotrowski and Alexandru Mitocaru
Multimodal Technol. Interact. 2026, 10(4), 41; https://doi.org/10.3390/mti10040041 - 17 Apr 2026
Abstract
Flexible factories and adaptive fabrication lines offer a testbed for advanced multimodal interaction concepts that can support the inclusion of disabled workers in Industry 5.0 manufacturing systems. The study synthesizes interdisciplinary data from ergonomics, industrial automation, and EU regulatory frameworks to establish a conceptual model for human-machine interaction. Building on conceptual modeling and a structured literature analysis, the study proposes a six-step integration framework that links task demands, worker capabilities, and interaction modalities within human-in-the-loop manufacturing environments. Although no empirical case study was conducted in this phase, an exemplary application is presented for a semi-automated bike wheel manufacturing process. Detailed machine-based assembly line flows and simulated process data were utilized for illustrative purposes to depict the process and validate the proposed Capability–Task Matching Matrix. The results operationalize the human-centric vision of Industry 5.0 by providing a structured methodology for the inclusion of disabled workers within fabrication environments. The findings are organized into two primary components: the conceptual development of the Integration Approach and its practical application to a semi-automated industrial use-case. Finally, a particular focus is placed on Brain–Computer Interfaces (BCIs) as an emerging interaction channel that enables non-muscular control, attention monitoring, and neuroadaptive feedback, complementing conventional interfaces rather than replacing them. The framework is illustrated through application to the same semi-automated bicycle wheel assembly line, where BCI-supported interaction, augmented interfaces, and robotic assistance are mapped to specific production tasks and assessed in terms of feasibility and technological maturity. Drawing on the paper's results, an explanatory 10-year roadmap outlines the feasibility and phased deployment of BCI solutions. It aligns technological advances with European regulations and a vision for a fully inclusive manufacturing enterprise.
24 pages, 1651 KB  
Article
An Integrated Tunable-Focus Light Field Imaging System for 3D Seed Phenotyping: From Co-Optimized Optical Design to Computational Reconstruction
by Jingrui Yang, Qinglei Zhao, Shuai Liu, Meihua Xia, Jing Guo, Yinghong Yu, Chao Li, Xiao Tang, Shuxin Wang, Qinglong Hu, Fengwei Guan, Qiang Liu, Mingdong Zhu and Qi Song
Photonics 2026, 13(4), 385; https://doi.org/10.3390/photonics13040385 - 17 Apr 2026
Abstract
Three-dimensional seed phenotyping requires imaging systems capable of achieving micron-level resolution across a centimeter-level field of view (FOV), a goal constrained by the resolution–FOV trade-off in conventional light field architectures. This paper presents a hardware–software co-optimized framework that integrates a reconfigurable optical system with computational imaging pipelines to address this limitation. At the hardware level, we develop a tunable-focus lens module that enables flexible adjustment of the effective focal length, combined with a custom-designed microlens array (MLA). A mathematical model is established to analyze the interdependencies among FOV, lateral resolution, depth of field (DOF), and system configuration, guiding the design of individual optical components. On the computational side, we propose a hybrid aberration correction strategy: first, a co-calibration of lens and MLA aberrations based on line-feature detection; second, a conditional generative adversarial network (cGAN) with attention-guided residual learning to enhance sub-aperture images, achieving a PSNR of 34.63 dB and an SSIM of 0.9570 on seed datasets. Experimentally, the system achieves a resolution of 6.2 lp/mm at MTF50 over a 2–3 cm FOV, representing a 307% improvement over the initial configuration (1.52 lp/mm). The reconstruction pipeline combines epipolar plane image (EPI) analysis with multi-view consistency constraints to generate dense 3D point clouds at a density of approximately 1.5 × 10^4 points/cm^2 while preserving spectral and textural features. Validation on bitter melon and rice seeds demonstrates accurate 3D reconstruction and accurate extraction of morphological parameters across a large area. By integrating optical and computational design, this work establishes a reconfigurable imaging framework that overcomes the resolution–FOV limitations of conventional light field systems. The proposed architecture is also applicable to robotic vision and biomedical imaging.
(This article belongs to the Special Issue Optical Imaging and Measurements: 2nd Edition)
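The EPI-based reconstruction step relies on the fact that a scene point traces a line in an epipolar plane image whose slope encodes depth. A sketch of local slope estimation with a structure tensor, under assumed array conventions and not the paper's pipeline:

```python
# Hypothetical sketch: local EPI line orientation via a 2x2 structure
# tensor; the recovered slope maps to disparity per view step.
import numpy as np
from scipy import ndimage

def epi_orientation(epi: np.ndarray, sigma: float = 1.5) -> np.ndarray:
    """epi: 2D array indexed (view s, image coordinate u)."""
    gs = ndimage.gaussian_filter(epi, sigma, order=(1, 0))  # d/ds
    gu = ndimage.gaussian_filter(epi, sigma, order=(0, 1))  # d/du
    Jss = ndimage.gaussian_filter(gs * gs, sigma)
    Juu = ndimage.gaussian_filter(gu * gu, sigma)
    Jsu = ndimage.gaussian_filter(gs * gu, sigma)
    # Dominant local orientation; tan of this angle gives du/ds (disparity).
    return 0.5 * np.arctan2(2.0 * Jsu, Juu - Jss)
```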
39 pages, 6816 KB  
Article
Automatic Calibration of Robotic 3D Printer Swarms for Cooperative 3D Printing
by Swaleh Owais, Charith Oshadi Nanayakkara Ratnayake, Ali Ugur, Zhenghui Sha and Wenchao Zhou
Machines 2026, 14(4), 443; https://doi.org/10.3390/machines14040443 - 16 Apr 2026
Abstract
Cooperative 3D printing (C3DP) is an additive manufacturing paradigm where a swarm of robotic 3D printers work cooperatively in a shared environment to fabricate continuous parts. Reliable operation requires both accurate per-printer kinematic calibration and cross-printer spatial alignment. This paper presents an automatic vision-based XY calibration workflow for C3DP using ArUco fiducials and low-cost monocular cameras. The method performs intra-printer kinematic calibration and inter-printer alignment through peer-to-peer observations without fixed global infrastructure. In a two-printer Selective Compliance Assembly Robot Arm (SCARA) Fused Filament Fabrication (FFF) testbed, the automatic workflow reduced total calibration time from 157.19 min (manual) to 36.49 min while improving positional consistency and print accuracy. For individual-printer artifacts, the mean Euclidean error was 0.03 ± 0.02 mm, whereas cooperative artifacts exhibited a mean Euclidean error of 0.078 ± 0.002 mm. These results show that practical and repeatable C3DP calibration can be achieved with low-cost vision hardware.
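The peer-to-peer observations driving the alignment can be illustrated by recovering an ArUco marker pose with OpenCV. A hedged sketch assuming a calibrated camera, a 4x4 dictionary, and a 40 mm marker; none of these specifics come from the paper:

```python
# Hypothetical sketch: detect one ArUco marker and solve its pose in the
# camera frame (OpenCV >= 4.7 ArucoDetector API).
import cv2
import numpy as np

def marker_pose(frame, K, dist, marker_len_m: float = 0.04):
    d = cv2.aruco.getPredefinedDictionary(cv2.aruco.DICT_4X4_50)
    corners, ids, _ = cv2.aruco.ArucoDetector(d).detectMarkers(frame)
    if ids is None:
        return None
    half = marker_len_m / 2.0
    # Marker corners in its own frame: TL, TR, BR, BL.
    obj = np.array([[-half,  half, 0], [ half,  half, 0],
                    [ half, -half, 0], [-half, -half, 0]], dtype=np.float32)
    ok, rvec, tvec = cv2.solvePnP(obj, corners[0].reshape(4, 2), K, dist)
    return (rvec, tvec) if ok else None
```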
25 pages, 1722 KB  
Systematic Review
Deep Learning in the Architecture, Engineering, and Construction (AEC) Industry: Methods, Challenges, and Emerging Opportunities
by Muhammad Imran Khan, Abdul Waheed, Ehsan Harirchian and Bilal Manzoor
Buildings 2026, 16(8), 1546; https://doi.org/10.3390/buildings16081546 - 14 Apr 2026
Abstract
In recent years, deep learning (DL) has emerged as a transformative technology with significant potential to advance the Architecture, Engineering, and Construction (AEC) industry. DL enables automation, intelligent decision-making, and predictive analytics across various phases of construction, including design, site monitoring, safety management, and facility operations. Despite its growing adoption, research on the comprehensive methods, practical challenges and emerging opportunities of DL in the AEC industry remains limited. This study presents a state-of-the-art review of DL applications in the AEC industry by focusing on key methods, challenges, emerging opportunities and future research directions. A systematic literature review (SLR) was conducted in this study. Three major DL methods applied in the AEC industry were examined: (i) data-driven computer vision, (ii) natural language processing (NLP), and (iii) generative and simulation-based methods. Key challenges were identified: (i) data scarcity issues, (ii) high computational requirements, (iii) limited generalization across projects, (iv) human factors and resistance to adoption, and (v) lack of standardization and interoperability. Additionally, emerging opportunities and future research directions are highlighted: (i) advanced construction site monitoring and safety management, (ii) automated design and generative modeling, (iii) predictive maintenance and facility management, (iv) integration with robotics and autonomous construction systems, and (v) smart project management and decision support systems. This study advances a holistic understanding of DL in the AEC industry by systematically synthesizing current methods, challenges, and emerging trends. It establishes a structured foundation for future research to overcome technical, practical, and organizational challenges, thereby supporting the scalable, intelligent, and sustainable transformation of construction practices.
27 pages, 1046 KB  
Article
Hybrid Vision Navigation with Hierarchical VLM–LLM Decision Making
by Rihem Farkh, Ghislain Oudinet, Mohamed Adjou, Alaeddine Moussa and Yasser Fouad
Machines 2026, 14(4), 435; https://doi.org/10.3390/machines14040435 - 14 Apr 2026
Abstract
This paper presents a hybrid navigation architecture for mapless navigation based on monocular vision. The system combines perception-driven affordance control with event-triggered semantic reasoning within a unified decision framework. Navigation behavior is governed by interpretable perceptual signals and a vision-derived progress proxy that enables self-monitoring. A reactive control regime ensures real-time safety, while a semantic reasoning module is activated only under persistent navigation difficulty to provide structured guidance. Experimental results in simulation and real-world deployment demonstrate improved success rate, safety, and efficiency compared to reactive and continuously active semantic baselines, while maintaining real-time performance on embedded hardware.
(This article belongs to the Special Issue Intelligent Control for Autonomous and Unmanned Systems)
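The event-triggered structure, a cheap reactive policy that escalates to semantic reasoning only when a vision-derived progress proxy stalls, can be sketched as follows. Thresholds, names, and the affordance representation are all assumptions:

```python
# Hypothetical sketch: reactive control by default, semantic (VLM/LLM)
# guidance only under persistent lack of progress.
from collections import deque

class HybridNavigator:
    def __init__(self, window: int = 30, min_progress: float = 0.05):
        self.history = deque(maxlen=window)  # recent progress-proxy values
        self.min_progress = min_progress

    def step(self, progress_proxy: float, affordances: list) -> str:
        self.history.append(progress_proxy)
        stuck = (len(self.history) == self.history.maxlen and
                 sum(self.history) / len(self.history) < self.min_progress)
        if stuck:
            self.history.clear()
            return self.query_semantic_module(affordances)  # rare, expensive
        return self.reactive_policy(affordances)            # default, cheap

    def reactive_policy(self, affordances: list) -> str:
        # Steer toward the most open direction.
        return max(affordances, key=lambda a: a["openness"])["action"]

    def query_semantic_module(self, affordances: list) -> str:
        return "semantic_guidance"  # placeholder for structured VLM/LLM output
```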
28 pages, 3548 KB  
Article
Edge Computing Approach to AI-Based Gesture for Human–Robot Interaction and Control
by Nikola Ivačko, Ivan Ćirić and Miloš Simonović
Computers 2026, 15(4), 241; https://doi.org/10.3390/computers15040241 - 14 Apr 2026
Abstract
This paper presents an edge-deployable vision-based framework for human–robot interaction using an xArm collaborative robot, a single RGB camera mounted on the robot wrist, and lightweight AI-based perception modules. The system enables intuitive, contact-free control by combining hand understanding and object detection within a unified perception–decision–control pipeline. Hand landmarks are extracted using MediaPipe Hands, from which continuous hand trajectories, static gestures, and dynamic gestures are derived. Task objects are detected using a YOLO-based model, and both hand and object observations are mapped into the robot workspace using ArUco-based planar calibration. To ensure stable robot motion, the hand control signal is smoothed using low-pass and Kalman filtering, while dynamic gestures such as waving are recognized using a lightweight LSTM classifier. The complete pipeline runs locally on edge hardware, specifically an NVIDIA Jetson Orin Nano and a Raspberry Pi 5 with a Hailo AI accelerator. Experimental evaluation includes trajectory stability, gesture recognition reliability, and runtime performance on both platforms. Results show that filtering significantly reduces hand-tracking jitter, gesture recognition provides stable command states for control, and both edge devices support real-time operation, with the Jetson achieving consistently lower runtime than the Raspberry Pi. The proposed system demonstrates the feasibility of low-cost edge AI solutions for responsive and practical human–robot interaction in collaborative industrial environments.
(This article belongs to the Special Issue Intelligent Edge: When AI Meets Edge Computing)
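The jitter-reduction stage can be illustrated with a constant-velocity Kalman filter over a 2D landmark position. A minimal sketch with assumed noise parameters, not the paper's tuned filter:

```python
# Hypothetical sketch: constant-velocity Kalman filter smoothing one
# 2D hand landmark; state is [px, py, vx, vy], we observe position only.
import numpy as np

class KalmanSmoother2D:
    def __init__(self, dt: float = 1 / 30, q: float = 1e-3, r: float = 5e-3):
        self.x = np.zeros(4)
        self.P = np.eye(4)
        self.F = np.eye(4); self.F[0, 2] = self.F[1, 3] = dt
        self.H = np.eye(2, 4)
        self.Q, self.R = q * np.eye(4), r * np.eye(2)

    def update(self, z: np.ndarray) -> np.ndarray:
        self.x = self.F @ self.x                     # predict
        self.P = self.F @ self.P @ self.F.T + self.Q
        S = self.H @ self.P @ self.H.T + self.R      # correct
        K = self.P @ self.H.T @ np.linalg.inv(S)
        self.x = self.x + K @ (z - self.H @ self.x)
        self.P = (np.eye(4) - K @ self.H) @ self.P
        return self.x[:2]                            # smoothed position
```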
13 pages, 4062 KB  
Article
Robotic Harvesting of Apples Using ROS2
by Connor Ruybalid, Christian Salisbury and Duke M. Bulanon
Machines 2026, 14(4), 433; https://doi.org/10.3390/machines14040433 - 14 Apr 2026
Abstract
Rising global food demand, increasing labor costs, and farm labor shortages have created significant challenges for specialty crop production, particularly in labor-intensive tasks such as fruit harvesting. Robotic harvesting offers a promising long-term solution, yet its adoption in orchard environments remains limited due to unstructured conditions, variable lighting, and difficulties in fruit recognition and manipulation. This study presents an improved robotic fruit harvesting system, Orchard roBot (OrBot), developed by the Robotics Vision Lab at Northwest Nazarene University, with the goal of advancing autonomous apple harvesting applications. The updated OrBot platform integrates a dual-camera vision system consisting of an eye-to-hand stereo camera with a wide field of view for fruit detection and an eye-in-hand RGB-D camera for precise manipulation. The control architecture was redesigned using Robot Operating System 2 (ROS2) and Python, enabling modular subsystem development and coordination. Fruit detection was performed using a YOLOv5 deep learning model, and visual servoing was employed to guide the robotic manipulator toward the target fruit. System performance was evaluated through laboratory experiments using artificial trees and field tests conducted in a commercial apple orchard in Idaho. OrBot achieved a 100% harvesting success rate in indoor tests and a 75–80% success rate in outdoor orchard conditions. Experimental results demonstrate that the dual-camera approach significantly enhances fruit search and harvesting efficiency. Identified limitations include sensitivity to lighting conditions, end effector performance with varying fruit sizes, and depth estimation errors. Overall, the results indicate a positive potential toward effective robotic fruit harvesting and highlight key areas for future improvement in vision, manipulation, and system robustness.
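The visual-servoing loop, steering the manipulator so a detected fruit stays centered in the eye-in-hand camera, might look like the following ROS2 (rclpy) sketch. Topic names, gains, and message choices are assumptions, not OrBot's actual interfaces:

```python
# Hypothetical sketch: proportional image-based visual servoing node.
import rclpy
from rclpy.node import Node
from geometry_msgs.msg import Twist

class VisualServo(Node):
    def __init__(self):
        super().__init__("visual_servo")
        self.pub = self.create_publisher(Twist, "/arm/cartesian_vel", 10)
        self.gain = 0.5  # assumed proportional gain

    def on_detection(self, u_err: float, v_err: float, depth_m: float):
        """u_err, v_err: normalized pixel offset of the fruit from the
        image center; depth_m: RGB-D range to the fruit."""
        cmd = Twist()
        cmd.linear.y = -self.gain * u_err              # center horizontally
        cmd.linear.z = -self.gain * v_err              # center vertically
        cmd.linear.x = 0.1 if depth_m > 0.15 else 0.0  # creep in until close
        self.pub.publish(cmd)

def main():
    rclpy.init()
    rclpy.spin(VisualServo())
```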
40 pages, 4155 KB  
Review
Artificial Intelligence in Pulmonary Endoscopy: Current Evidence, Limitations, and Future Directions
by Sara Lopes, Miguel Mascarenhas, João Fonseca and Adelino F. Leite-Moreira
J. Imaging 2026, 12(4), 167; https://doi.org/10.3390/jimaging12040167 - 12 Apr 2026
Abstract
Background: Artificial intelligence (AI) is increasingly applied in pulmonary endoscopy, including diagnostic bronchoscopy, interventional pulmonology and endobronchial imaging. Advances in computer vision, machine learning and robotic systems have expanded the potential for automated lesion detection, navigation to peripheral pulmonary lesions, and real-time procedural support. However, the current evidence base remains heterogeneous, and translational challenges persist. Methods: This review summarizes current applications and developments of AI across white-light bronchoscopy (WLB), image-enhanced bronchoscopy (e.g., narrow-band imaging and autofluorescence imaging), endobronchial ultrasound (EBUS), virtual and robotic bronchoscopies, and workflow optimization and training. The authors also examine the methodological limitations, regulatory considerations, and implementation barriers that affect translation into routine practice. Results: Reported developments include deep learning-based models for mucosal abnormality detection, lymph-node characterization during EBUS-guided transbronchial needle aspiration (EBUS-TBNA), improved lesion localization, and reduction in operator-dependent variability. Additionally, AI-assisted simulation platforms and decision-support tools are reshaping training paradigms. Nevertheless, most studies remain retrospective or single-center, with limited external validation, dataset heterogeneity, unclear model explainability, and incomplete integration into clinical workflows. Conclusions: AI has the potential to support lesion detection, navigation, and training in pulmonary endoscopy. However, robust prospective validation, standardized datasets, transparent model reporting, robust data governance, multidisciplinary collaboration, and careful integration into clinical practice are required before widespread adoption.
(This article belongs to the Section AI in Imaging)
13 pages, 1462 KB  
Article
Interpretable Vision Transformers in Monocular Depth Estimation via SVDA
by Vasileios Arampatzakis, George Pavlidis, Nikolaos Mitianoudis and Nikos Papamarkos
Mathematics 2026, 14(8), 1272; https://doi.org/10.3390/math14081272 - 11 Apr 2026
Abstract
Monocular depth estimation is a central problem in computer vision with applications in robotics, augmented reality, and autonomous driving, yet the self-attention mechanisms used by modern Transformer architectures remain opaque. In this work, we integrate SVD-Inspired Attention (SVDA) into the Dense Prediction Transformer (DPT), introducing a spectrally structured attention formulation for dense prediction that decouples directional alignment from spectral modulation through a learnable diagonal matrix embedded in normalized query–key interactions. Experiments on KITTI and NYU-v2 show that SVDA preserves competitive predictive performance while enabling intrinsic interpretability: on KITTI, AbsRel improves from 0.058 to 0.056 and δ1 from 0.976 to 0.979, while on NYU-v2, AbsRel improves from 0.133 to 0.124 and δ1 from 0.865 to 0.872. This is achieved with only 0.01% additional parameters, at the cost of a measurable runtime overhead associated with the added normalization and spectral modulation. More importantly, SVDA enables six spectral indicators that quantify entropy, rank, sparsity, alignment, selectivity, and robustness, revealing consistent cross-dataset and depth-wise patterns in how attention organizes during training. These properties make the model easier to inspect and better suited to applications where transparency and reliability are important, such as robotics and autonomous navigation.
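From the abstract's description, normalized query–key interactions modulated by a learnable diagonal, the attention form can be sketched roughly as follows. This is an interpretation for illustration, not the authors' exact SVDA formulation:

```python
# Hypothetical sketch: row-normalized Q/K (directional alignment) scaled
# elementwise by a learnable diagonal (spectral modulation).
import torch
import torch.nn as nn
import torch.nn.functional as F

class SVDAttentionSketch(nn.Module):
    def __init__(self, dim: int):
        super().__init__()
        self.qkv = nn.Linear(dim, 3 * dim)
        self.lam = nn.Parameter(torch.ones(dim))  # learnable diagonal

    def forward(self, x: torch.Tensor) -> torch.Tensor:
        q, k, v = self.qkv(x).chunk(3, dim=-1)
        q = F.normalize(q, dim=-1)  # unit-norm query directions
        k = F.normalize(k, dim=-1)  # unit-norm key directions
        attn = torch.softmax((q * self.lam) @ k.transpose(-2, -1), dim=-1)
        return attn @ v
```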
45 pages, 6164 KB  
Systematic Review
Advances in Emerging Digital Technologies for Sustainable Agriculture: Applications and Future Perspectives
by Carlos Diego Rodríguez-Yparraguirre, Abel José Rodríguez-Yparraguirre, Cesar Moreno-Rojo, Wendy Akemmy Castañeda-Rodríguez, Janet Verónica Saavedra-Vera, Atilio Ruben Lopez-Carranza, Iván Martin Olivares-Espino, Andrés David Epifania-Huerta, Elías Guarniz-Vásquez and Wilson Arcenio Maco-Vasquez
Earth 2026, 7(2), 63; https://doi.org/10.3390/earth7020063 - 11 Apr 2026
Abstract
The agricultural sector is undergoing a profound digital transformation driven by artificial intelligence, the Internet of Things, remote sensing, robotics, blockchain, and edge computing, which are being integrated into crop monitoring, irrigation management, disease detection, and supply chain transparency systems. This study employs systematic evidence mapping to characterize the applications of emerging digital technologies in sustainable agriculture; it delineates technological trajectories, areas of application, implementation gaps, and opportunities for improvement. Adhering to the PRISMA 2020 reporting protocol, 101 peer-reviewed articles indexed in Scopus and Web of Science (2020–2025) were identified, screened, and subjected to integrated thematic and bibliometric synthesis, using RStudio (version 2026.01.1+403) and VOSviewer 1.6.20 for data mining on keywords and technological evolution patterns. Results show that deep learning and computer vision models achieved diagnostic accuracies of 90–99%, smart irrigation systems reduced water consumption by 10–30%, predictive yield models frequently reported R^2 values above 0.80, and greenhouse automation reduced energy consumption by approximately 20–30%. Blockchain-based architectures improved traceability and secure data transmission by 15–20%, while remote sensing integration enhanced spatial estimation accuracy up to R^2 = 0.92. The findings demonstrate a measurable transition toward data-driven, resource-efficient agricultural ecosystems supported by validated digital architectures. However, interoperability limitations, lack of standardized performance metrics, scalability challenges, and uneven geographical implementation—identified in nearly 40% of studies—highlight the need for harmonized evaluation frameworks, cross-platform integration standards, and long-term field validation to ensure sustainable and scalable digital transformation.
28 pages, 3527 KB  
Article
Autonomous Tomato Harvesting System Integrating AI-Controlled Robotics in Greenhouses
by Mihai Gabriel Matache, Florin Bogdan Marin, Catalin Ioan Persu, Robert Dorin Cristea, Florin Nenciu and Atanas Z. Atanasov
Agriculture 2026, 16(8), 847; https://doi.org/10.3390/agriculture16080847 - 11 Apr 2026
Abstract
Labor shortages and the need for increased productivity have accelerated the development of robotic harvesting systems for greenhouse crops; however, reliable operation under fruit occlusion and clustered arrangements remains a major challenge, particularly due to the limited integration between perception and motion planning modules. The paper presents the design and experimental validation of an autonomous robotic system for greenhouse tomato harvesting. The proposed platform integrates a rail-guided mobile base, a six-degrees-of-freedom robotic manipulator, and an adaptive end effector with a hybrid vision framework that combines convolutional neural networks and watershed-based segmentation to enable robust fruit detection and localization under occluded conditions. The proposed approach enables improved separation of overlapping fruits and provides accurate spatial localization through stereo vision combined with IMU-assisted camera-to-robot coordinate transformation. An occlusion-aware trajectory planning strategy was developed to generate collision-free manipulation paths in the presence of leaves and stems, enhancing harvesting safety and reliability. The system was trained and evaluated using a dataset of real greenhouse images supplemented with synthetic data augmentation. Experimental trials conducted under practical greenhouse conditions demonstrated a fruit detection precision of 96.9%, recall of 93.5%, and mean Intersection-over-Union of 79.2%. The robotic platform achieved an overall harvesting success rate of 78.5%, reaching 85% for unobstructed fruits, with an average cycle time of 15 s per fruit in direct harvesting scenarios. The rail-guided mobility significantly improved positioning stability and repeatability during manipulation compared with fully mobile platforms. The results confirm that integrating hybrid perception with occlusion-aware motion planning can substantially improve the functionality of robotic harvesting systems in protected cultivation environments. The proposed solution contributes to the advancement of automation technologies for greenhouse vegetable production and supports the transition toward more sustainable and labor-efficient agricultural practices.
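The watershed stage used to separate clustered fruits is a classical technique and can be sketched with OpenCV. A hedged example operating on a binary mask from the detector; the 0.6 peak threshold is an assumption:

```python
# Hypothetical sketch: marker-based watershed splitting touching fruits
# in a binary segmentation mask.
import cv2
import numpy as np

def split_overlapping(mask: np.ndarray, bgr: np.ndarray) -> np.ndarray:
    """mask: uint8 binary fruit mask; bgr: 8-bit color image.
    Returns a label image with one integer id per separated fruit."""
    dist = cv2.distanceTransform(mask, cv2.DIST_L2, 5)
    # Peaks of the distance transform seed one marker per fruit.
    _, sure_fg = cv2.threshold(dist, 0.6 * dist.max(), 255, cv2.THRESH_BINARY)
    _, markers = cv2.connectedComponents(sure_fg.astype(np.uint8))
    markers = markers + 1                      # 0 is reserved for "unknown"
    markers[(mask > 0) & (sure_fg == 0)] = 0   # band watershed must resolve
    return cv2.watershed(bgr, markers.astype(np.int32))
```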