Search Results (1,302)

Search Parameters:
Keywords = autonomous vision

13 pages, 1462 KB  
Article
Interpretable Vision Transformers in Monocular Depth Estimation via SVDA
by Vasileios Arampatzakis, George Pavlidis, Nikolaos Mitianoudis and Nikos Papamarkos
Mathematics 2026, 14(8), 1272; https://doi.org/10.3390/math14081272 (registering DOI) - 11 Apr 2026
Abstract
Monocular depth estimation is a central problem in computer vision with applications in robotics, augmented reality, and autonomous driving, yet the self-attention mechanisms used by modern Transformer architectures remain opaque. In this work, we integrate SVD-Inspired Attention (SVDA) into the Dense Prediction Transformer (DPT), introducing a spectrally structured attention formulation for dense prediction that decouples directional alignment from spectral modulation through a learnable diagonal matrix embedded in normalized query–key interactions. Experiments on KITTI and NYU-v2 show that SVDA preserves competitive predictive performance while enabling intrinsic interpretability: on KITTI, AbsRel improves from 0.058 to 0.056 and δ1 from 0.976 to 0.979, while on NYU-v2, AbsRel improves from 0.133 to 0.124 and δ1 from 0.865 to 0.872. This is achieved with only 0.01% additional parameters, at the cost of a measurable runtime overhead associated with the added normalization and spectral modulation. More importantly, SVDA enables six spectral indicators that quantify entropy, rank, sparsity, alignment, selectivity, and robustness, revealing consistent cross-dataset and depth-wise patterns in how attention organizes during training. These properties make the model easier to inspect and better suited to applications where transparency and reliability are important, such as robotics and autonomous navigation. Full article
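The core mechanism this abstract describes — unit-normalized query–key directions re-weighted per feature dimension by a learnable diagonal — can be sketched in a few lines. This is a toy illustration only, not the paper's SVDA implementation; all shapes, gain values, and function names here are invented for the sketch.

```python
import math

def normalize(v):
    # Unit-normalize a vector (small epsilon for numerical stability).
    n = math.sqrt(sum(x * x for x in v)) + 1e-12
    return [x / n for x in v]

def svda_scores(queries, keys, spectral_gains):
    """Toy SVDA-style scores: normalization isolates directional alignment,
    while the diagonal `spectral_gains` applies spectral modulation per
    feature dimension."""
    scores = []
    for q in queries:
        qn = normalize(q)
        row = []
        for k in keys:
            kn = normalize(k)
            row.append(sum(s * a * b for s, a, b in zip(spectral_gains, qn, kn)))
        scores.append(row)
    return scores

def softmax(row):
    m = max(row)
    exps = [math.exp(x - m) for x in row]
    z = sum(exps)
    return [e / z for e in exps]

# Two 2-D queries/keys; the diagonal boosts dimension 0 and damps dimension 1.
queries = [[1.0, 0.0], [0.0, 1.0]]
keys = [[1.0, 0.0], [0.0, 1.0]]
gains = [2.0, 0.5]  # learnable diagonal (fixed here for illustration)
attn = [softmax(r) for r in svda_scores(queries, keys, gains)]
```

Because the directions are normalized first, the gains act purely as a per-dimension spectral re-weighting, which is what makes the attention pattern inspectable.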
28 pages, 3527 KB  
Article
Autonomous Tomato Harvesting System Integrating AI-Controlled Robotics in Greenhouses
by Mihai Gabriel Matache, Florin Bogdan Marin, Catalin Ioan Persu, Robert Dorin Cristea, Florin Nenciu and Atanas Z. Atanasov
Agriculture 2026, 16(8), 847; https://doi.org/10.3390/agriculture16080847 (registering DOI) - 11 Apr 2026
Abstract
Labor shortages and the need for increased productivity have accelerated the development of robotic harvesting systems for greenhouse crops; however, reliable operation under fruit occlusion and clustered arrangements remains a major challenge, particularly due to the limited integration between perception and motion planning modules. The paper presents the design and experimental validation of an autonomous robotic system for greenhouse tomato harvesting. The proposed platform integrates a rail-guided mobile base, a six-degrees-of-freedom robotic manipulator, and an adaptive end effector with a hybrid vision framework that combines convolutional neural networks and watershed-based segmentation to enable robust fruit detection and localization under occluded conditions. The proposed approach enables improved separation of overlapping fruits and provides accurate spatial localization through stereo vision combined with IMU-assisted camera-to-robot coordinate transformation. An occlusion-aware trajectory planning strategy was developed to generate collision-free manipulation paths in the presence of leaves and stems, enhancing harvesting safety and reliability. The system was trained and evaluated using a dataset of real greenhouse images supplemented with synthetic data augmentation. Experimental trials conducted under practical greenhouse conditions demonstrated a fruit detection precision of 96.9%, recall of 93.5%, and mean Intersection-over-Union of 79.2%. The robotic platform achieved an overall harvesting success rate of 78.5%, reaching 85% for unobstructed fruits, with an average cycle time of 15 s per fruit in direct harvesting scenarios. The rail-guided mobility significantly improved positioning stability and repeatability during manipulation compared with fully mobile platforms. 
The results confirm that integrating hybrid perception with occlusion-aware motion planning can substantially improve the functionality of robotic harvesting systems in protected cultivation environments. The proposed solution contributes to the advancement of automation technologies for greenhouse vegetable production and supports the transition toward more sustainable and labor-efficient agricultural practices. Full article

30 pages, 20938 KB  
Review
Remote Sensing of Water: The Observation-to-Inference Arc Across Six Decades and Toward an AI-Native Future
by Daniel P. Ames
Remote Sens. 2026, 18(8), 1127; https://doi.org/10.3390/rs18081127 - 10 Apr 2026
Abstract
Over six decades, satellite remote sensing of water resources has evolved from manual interpretation of weather photographs to AI systems that learn hydrologic predictions directly from satellite imagery. This review traces that evolution through the observation-to-inference arc—a framework for the progressively tightening coupling between what satellites observe and what hydrologists infer. Using illustrative applications in precipitation, evapotranspiration, soil moisture, snow, surface water, and groundwater, we show how early observations (1960–1985) remained disconnected from operational hydrology; how calibrated retrieval algorithms (1985–2000) established a one-way pipeline from satellites to models; how operational infrastructure (2000–2015), anchored by MODIS, GRACE, GPM, and Sentinel, achieved assimilative coupling through computational feedback between models and observations; and how deep learning (2015–present) is beginning to collapse this pipeline. Multi-source data fusion has been a recurring enabler at each stage. We articulate a four-level AI vision and research trajectory, from AI-assisted interpretation through AI-native retrieval and AI-driven inference to autonomous Earth observation intelligence. Persistent challenges in mission continuity, calibration, equity of access, and translating satellite-derived information into operational water management decisions provide essential context for evaluating both the promise and limits of this trajectory. Full article
(This article belongs to the Special Issue Mapping the Blue: Remote Sensing in Water Resource Management)
23 pages, 5036 KB  
Article
Distilling Vision Foundation Models into LiDAR Networks via Manifold-Aware Topological Alignment
by Yuchuan Yang and Xiaosu Xu
Computers 2026, 15(4), 234; https://doi.org/10.3390/computers15040234 - 9 Apr 2026
Abstract
LiDAR point cloud semantic segmentation is essential for autonomous driving, yet LiDAR-only methods remain constrained by sparsity and limited texture cues. We propose Cross-Modal Collaborative Manifold Distillation (CMCMD), which transfers open-world semantic priors from the DINOv3 Vision Foundation Model to a LiDAR student network. The framework combines an Adaptive Relation Convolution (ARConv) backbone with geometry-conditioned aggregation, a Unified Bidirectional Mapping Module (UBMM) for explicit 2D–3D interaction, and Manifold-Aware Topological Distillation (MATD), which aligns inter-sample affinity structures in a shared latent manifold rather than enforcing pointwise feature matching. By preserving relational topology instead of absolute feature coordinates, CMCMD mitigates negative transfer across heterogeneous modalities. Experiments on SemanticKITTI and nuScenes yield mIoU values of 72.9% and 81.2%, respectively, surpassing the compared distillation baselines and approaching the performance of multimodal fusion methods at lower inference cost. Additional evaluation on real-world campus scenes further supports the cross-domain robustness of the proposed framework. Full article

37 pages, 1309 KB  
Systematic Review
Black Sea Planktonic Organisms as Bioindicators for Biological Early Warning Systems: A Systematic Review
by Iuliia Baiandina, Aleksandr Grekov and Elena Vyshkvarkova
Water 2026, 18(8), 899; https://doi.org/10.3390/w18080899 - 9 Apr 2026
Abstract
This is the first systematic review evaluating Black Sea plankton as biosensor organisms for Biological Early Warning Systems (BEWS)—real-time monitoring approaches that detect sublethal behavioral or physiological responses to pollutants before irreversible ecosystem damage occurs. The systematic literature review was guided by the PRISMA (Preferred Reporting Items for Systematic Reviews and Meta-Analyses) approach, ensuring methodological transparency and applicability. A total of 140 publications from databases (Web of Science Core Collection, Scopus, PubMed, and Google Scholar databases) were included in the final analysis. We assess nine native planktonic taxa as candidates for automated video-based water quality monitoring, using a multi-criteria framework encompassing biological sensitivity, technical detectability, and practical feasibility. Three species emerge as the most suitable candidates: Aurelia aurita as a universal indicator (sensitive to copper, surfactants, petroleum, and microplastics; its large size enables standard video detection); Acartia tonsa for trace contamination (reproductive toxicity at metal concentrations 4–33× below regulatory standards); and Mnemiopsis leidyi for metal-specific discrimination (bioluminescent responses: 650% Zn, 430% Cu, and 350% Hg at 0.001 mg/L). Analysis of 140 publications reveals critical gaps: 33% of species lack toxicological data, 95% of studies test single toxicants despite natural mixture exposure, and microplastic methodology varies 1000-fold in particle size. Threshold analysis suggests planktonic sublethal stress at “safe” concentrations under current standards, suggesting inadequate protection of marine food webs. A complementary monitoring approach integrating these species with computer vision algorithms offers autonomous early-warning capability for Black Sea environmental management. Full article
(This article belongs to the Section Biodiversity and Functionality of Aquatic Ecosystems)

30 pages, 14814 KB  
Article
The Intelligent Row-Following Method and System for Corn Harvesters Driven by “Visual-Gateway” Collaboration
by Shengjie Zhou, Songling Du, Xinping Zhang, Cheng Yang, Guoying Li, Qingyang Wang and Liqing Zhao
Agriculture 2026, 16(8), 832; https://doi.org/10.3390/agriculture16080832 - 9 Apr 2026
Abstract
To address the issues of corn harvester field operations relying on driver visual guidance for row alignment, high labor intensity, and unstable operation accuracy, this study innovatively proposes a “vision-dominant, gateway-enhanced” dual-mode collaborative row-alignment assistance architecture, and independently develops the R2DC-Mask R-CNN instance segmentation network and MCC-KF robust filtering algorithm to form a deeply coupled hardware–software-assisted driving system. The R2DC-Mask R-CNN network is autonomously designed for corn row-detection scenarios, achieving accurate perception in complex field environments; the MCC-KF algorithm innovatively solves the state estimation divergence problem during transient vision failures through a multi-criteria constraint mechanism, ensuring continuous navigation capability; the intelligent gateway and vision system form a confidence-driven master–slave switching mechanism that adaptively enhances system robustness when vision is restricted. Field experiments demonstrate that within the speed range of 0.5–5.0 km/h, the average lateral deviation in the row alignment assisted by the system is 3.82–5.30 cm, the proportion of deviations less than 10 cm exceeds 96%, and all sample deviations remain within 20 cm; at a speed of 3.5 km/h, the system reduces the average grain loss rate from 3.76% under manual operation to 2.65%, a decrease of 29.5%. This system effectively improves row alignment accuracy and harvest quality, providing a practical human–machine collaborative solution for intelligent harvester operations. Full article
(This article belongs to the Section Agricultural Technology)

14 pages, 871 KB  
Article
Validation of a Dermatology-Focused Multimodal Image-and-Data Assistant in Diagnosis and Management of Common Dermatologic Conditions
by Joshua Mijares, Emma J. Bisch, Eanna DeGuzman, Kanika Garg, David Pontes, Neil K. Jairath, Vignesh Ramachandran, George Jeha, Andjela Nemcevic and Syril Keena T. Que
Medicina 2026, 62(4), 715; https://doi.org/10.3390/medicina62040715 - 9 Apr 2026
Abstract
Background and Objectives: Shortages of dermatologists create significant barriers to care, particularly for inflammatory and history-dependent conditions where image-only artificial intelligence (AI) classifiers have limited applicability. Current teledermatology solutions largely focus on single-task, morphology-based neoplasm classifiers, leaving the vast majority of dermatologic presentations underserved. This study evaluated the diagnostic accuracy and management plan quality of Dermflow (Prava Medical, Delaware, USA), a proprietary dermatology-focused Multimodal Image-and-Data Assistant (MIDA) that autonomously gathers dermatology-specific history, integrates data with patient-submitted images, and outputs structured differential diagnoses and management summaries. Materials and Methods: Two AI systems, Dermflow and Claude Sonnet 4 (Claude, a leading vision–language model), analyzed 87 clinical images from the Skin Condition Image Network and Diverse Dermatology Images databases, representing 10 inflammatory dermatoses and 9 neoplastic conditions stratified across Fitzpatrick Skin Tone (FST) categories (I–II, III–IV, V–VI). For the diagnostic comparison, Dermflow received images and autonomously gathered clinical history, while Claude received identical images without history. For the management plan comparison, both systems received the correct diagnosis and the clinical histories gathered by Dermflow. The primary outcome was diagnostic accuracy. The secondary outcome was management plan quality, assessed by two blinded dermatologists across eight clinical dimensions using 5-point Likert scales. Chi-square tests compared diagnostic accuracy between models; t-tests and ANOVA compared management quality scores. Results: Dermflow achieved markedly superior diagnostic accuracy compared to Claude (86.2% vs. 24.1%, p < 0.001). 
Both models maintained consistent diagnostic performance across FST categories without significant within-model differences (Dermflow p = 0.924; Claude p = 0.828). Management plan quality showed no significant overall differences between models. However, composite management quality scores declined significantly for darker skin tones across both systems: Dermflow scored 4.20 (FST I–II), 3.99 (FST III–IV), and 3.47 (FST V–VI); Claude scored 4.35, 3.97, and 3.44, respectively (p < 0.001 for most pairwise FST comparisons within each model). Conclusions: Multimodal AI integrating targeted history with image analysis achieves substantially higher diagnostic accuracy than image-only approaches across both inflammatory and neoplastic dermatologic conditions. Autonomous history gathering addresses fundamental limitations of morphology-only classifiers and enables scalable, patient-facing triage across the full spectrum of dermatologic disease. However, both models demonstrated reduced management plan quality for darker skin tones despite receiving the correct diagnosis, suggesting persistent training data limitations that require targeted bias-mitigation strategies beyond domain-specific instruction. Full article

32 pages, 41104 KB  
Article
SCEW-YOLOv8 Detection Model and Camera-LiDAR Fusion Positioning System for Whole-Growth-Cycle Management of Cabbage
by Jiangyi Han, Deyuan Lyu and Changgao Xia
Appl. Sci. 2026, 16(7), 3510; https://doi.org/10.3390/app16073510 - 3 Apr 2026
Abstract
High-precision identification and three-dimensional (3D) positioning of cabbage plants across their entire growth cycle are fundamental prerequisites for automated agricultural management. To overcome field challenges like extreme morphological variations, severe leaf occlusion, and bounding box jitter, we introduce a camera-LiDAR fusion perception system. First, an advanced SCEW-YOLOv8 architecture is proposed, sequentially integrating SPD-Conv downsampling, a C2f-CX global feature enhancement module, an EMA cross-space attention mechanism, and the WIoU v3 loss function. Evaluated on a comprehensive whole-growth-cycle cabbage dataset, the model achieves 95.8% mAP@0.5 and 90.8% recall with a real-time inference speed of 64.2 FPS. Furthermore, a visual semantic-driven camera-LiDAR fusion ranging algorithm is developed. Through rigorous spatiotemporal synchronization and cascaded outlier filtering, the integrated system achieves millimeter-level 3D localization within the typical 1.0–2.0 m operating range of agricultural robots. It maintains a Mean Absolute Error (MAE) of only 1.45 mm in the longitudinal direction at a stable processing throughput of 20 FPS. Compared to traditional pure vision depth estimation, this heterogeneous fusion approach achieves a remarkable 96.3% reduction in spatial positioning error at extended distances, fundamentally eliminating depth degradation caused by complex illumination. Ultimately, this system provides a highly robust, full-cycle geometric perception framework for the autonomous management of open-field green cabbage. Full article
(This article belongs to the Section Agricultural Science and Technology)

28 pages, 5422 KB  
Article
Vision-Guided Dual-Loop Control of a Truck-Mounted Electric Water Cannon for Autonomous Fire Suppression
by Zhiyuan Chen and Chaofeng Liu
Appl. Sci. 2026, 16(7), 3469; https://doi.org/10.3390/app16073469 - 2 Apr 2026
Abstract
Fire trucks equipped with truck-mounted electric water cannons are key mobile firefighting assets for urban and industrial fire response. However, due to the inherent mechanical inertia of the cannon body, its low-frequency motion response cannot match high-frequency control commands, making the system prone to oscillations and control instability. To address this command–execution frequency mismatch, this paper proposes a decoupled dual closed-loop control architecture for truck-mounted electric water cannons on mobile fire trucks: the fast loop is used for fire-source tracking and rapid localization, while the slow loop is used for water-jet aiming alignment. In the fast loop, a 2-D quadrant positioning rule drives the pan–tilt unit to achieve rapid fire tracking and accurate centering. In the slow loop, Kalman-filter-based state estimation and delay-aligned prediction generate feedforward aiming commands; these commands are fused with error feedback and further processed through command limiting and trajectory optimization, ultimately producing smooth and executable angle references. The visual perception module ran at 58 FPS, satisfying the real-time requirement of the proposed system. In five repeated extinguishment tests under controlled open-site conditions, the proposed method successfully completed all trials and reduced the mean extinguishment time to 13.55 s, compared with 15.83 s for the incremental-PID baseline and 23.76 s for the coupled proportional baseline, while also showing smoother correction and less redundant oscillation. Full article
(This article belongs to the Section Mechanical Engineering)

16 pages, 1529 KB  
Article
Image Segmentation-Guided Visual Tracking on a Bio-Inspired Quadruped Robot
by Hewen Xiao, Guangfu Ma and Weiren Wu
Biomimetics 2026, 11(4), 234; https://doi.org/10.3390/biomimetics11040234 - 2 Apr 2026
Abstract
Bio-inspired quadrupedal robots exhibit superior adaptability and mobility in unstructured environments, making them suitable for complex task scenarios such as navigation, obstacle avoidance, and tracking in a variety of environments. Visual perception plays a critical role in enabling autonomous behavior, offering a cost-effective alternative to multi-sensor systems. This paper proposes an image segmentation-guided visual tracking framework to enhance both perception and motion control in quadruped robots. On the perception side, a cascaded convolutional neural network is introduced, integrating a global information guidance module to fuse low-level textures and high-level semantic features. This architecture effectively addresses limitations in single-scale feature extraction and improves segmentation accuracy under visually degraded conditions. On the control side, segmentation outputs are embedded into a biologically inspired central pattern generator (CPG), enabling coordinated generation of limb and spinal trajectories. This integration facilitates a closed-loop visual-motor system that adapts dynamically to environmental changes. Experimental evaluations on benchmark image segmentation datasets and robotic locomotion tasks demonstrate that the proposed framework achieves enhanced segmentation precision and motion flexibility, outperforming existing methods. The results highlight the effectiveness of vision-guided control strategies and their potential for deployment in real-time robotic navigation. Full article

21 pages, 2891 KB  
Article
Energy Emissions and Cost Impacts of Autonomous Battery Electric Vehicles in Riyadh
by Ali Louati, Hassen Louati and Elham Kariri
Batteries 2026, 12(4), 125; https://doi.org/10.3390/batteries12040125 - 1 Apr 2026
Abstract
Autonomous battery electric vehicles (BEVs) have the potential to reshape urban mobility systems, yet their sustainability impacts remain underexplored in Gulf-region cities where traffic dynamics, land-use structures, and environmental conditions differ substantially from Western contexts. This study introduces a Saudi-specific assessment framework that integrates monetised externalities with empirically calibrated traffic dynamics to evaluate how automation influences safety, congestion, land use, emissions, and noise. To the best of our knowledge, this is the first Riyadh-calibrated monetised external-cost evaluation of autonomous BEVs that couples externality valuation with simulation-validated time-varying traffic dynamics (SAR per vkm and SAR per pkm), enabling realistic peak-period sustainability assessment. The framework’s key contribution is linking external-cost modelling with spatiotemporal traffic behaviour derived from Riyadh’s 2023 mobility patterns, providing a more realistic basis for sustainability evaluation. Using national datasets from transport, energy, and statistical authorities, the model estimates substantial reductions in external costs when transitioning from human-driven to autonomous BEVs, driven primarily by lower crash exposure and smoother traffic flow. To validate these findings under real operating conditions, a dynamic analysis incorporating hourly and seasonal traffic variability was developed, revealing that automation delivers its strongest improvements during peak-demand periods where congestion externalities are highest. The integrated results demonstrate the relevance of autonomous BEVs for dense rapidly growing Saudi cities and provide actionable insights for future mobility planning. The study highlights the policy importance of coordinated transport, land-use, and energy strategies to ensure that automation contributes meaningfully to national sustainability goals under Vision 2030. Full article
(This article belongs to the Section Battery Modelling, Simulation, Management and Application)

33 pages, 16801 KB  
Article
A GNSS–Vision Integrated Autonomous Navigation System for Trellis Orchard Transportation Robots
by Huaiyang Liu, Haiyang Gu, Yong Wang, Tianjiao Zhong, Tong Tian and Changxing Geng
AI 2026, 7(4), 125; https://doi.org/10.3390/ai7040125 - 1 Apr 2026
Abstract
Autonomous navigation is essential for orchard transportation robots to support automated operations and precision orchard management. However, in trellis orchards, dense vegetation and complex canopy structures often degrade the stability of GNSS-based navigation in in-row environments. To address this issue, this study proposes a GNSS–vision integrated navigation framework for orchard transportation robots. The performance of GNSS-based navigation in out-of-row environments and vision-based navigation in in-row environments was experimentally evaluated under representative orchard operating conditions. In out-of-row areas, the robot employs GNSS-based path planning and trajectory tracking to achieve reliable navigation in relatively open, lightly occluded environments. During in-row navigation, a deep learning-based real-time object detection approach is used to detect tree trunks and trellis supporting structures. By integrating corner-point selection with temporal RANSAC-based line fitting, a stable orchard row structure is constructed to generate robust navigation references. The visual perception module serves as the front-end sensing component of the navigation system and is designed to be independent of specific object detection architectures, allowing flexible integration with different real-time detection models. Field experiments were conducted under various orchard layouts and growth stages. The average lateral deviation of GNSS-based navigation in out-of-row scenarios ranged from 0.093 to 0.221 m, while the average heading deviation of in-row visual navigation was approximately 5.23° at a robot speed of 0.6 m/s. These results indicate that the proposed perception and navigation methods can maintain stable navigation performance within their respective applicable scenarios in trellis orchard environments. 
The experimental findings provide a practical and engineering-oriented basis for future research on automatic navigation mode switching and system-level integration of orchard transportation robots. Full article

12 pages, 2073 KB  
Proceeding Paper
Binocular Stereo Vision Disparity Estimation Based on Distilled Internally Normalized Optimized Version 2 with Multi-Scale Attention Fusion
by Chang-Fu Hung, Tzu-Jung Tseng and Jian-Jiun Ding
Eng. Proc. 2026, 134(1), 20; https://doi.org/10.3390/engproc2026134020 - 31 Mar 2026
Abstract
A stereo vision framework is designed to improve disparity estimation in occluded and boundary regions, targeting autonomous driving scenarios. The proposed architecture combines frozen Distilled Internally Normalized Optimized Version 2 features with a modular three-stage attention fusion strategy, which consists of bottom-up semantic propagation, top-down detail enhancement, and cross-view attention mechanisms. These stages jointly enforce semantic consistency, structural integrity, and accurate correspondence modeling. The fused features are then processed by an Iterative Geometry Encoding and Volumetric regression-based disparity estimation module for multi-stage regression and iterative refinement. A three-phase training pipeline is employed, including pretraining on SceneFlow, fine-tuning on virtual Karlsruhe Institute of Technology and Toyota Technological Institute (KITTI) benchmarks, and adaptation to the KITTI and ETH Zurich 3D benchmark dataset. The model achieves an out-of-center, non-occluded pixel error of 7.45% on KITTI2012 and a D1-all error of 4.10% on KITTI2015. Beyond quantitative performance, the proposed method produces visually superior disparity maps. The enhancements of boundary sharpness, occlusion completion, and structural coherence demonstrate the strong potential of the proposed algorithm for real-world deployment in dynamic and complex environments. Full article

31 pages, 7864 KB  
Article
Development of a General-Purpose AI-Powered Robotic Platform for Strawberry Harvesting
by Muhammad Tufail, Jamshed Iqbal and Rafiq Ahmad
Agriculture 2026, 16(7), 769; https://doi.org/10.3390/agriculture16070769 - 31 Mar 2026
Abstract
The integration of emerging technologies such as robotics and artificial intelligence (AI) has the potential to transform agricultural harvesting by improving efficiency, reducing waste, lowering labor dependency, and enhancing produce quality. This paper presents the development of an intelligent robotic berry harvesting system that combines deep learning–based perception with autonomous robotic manipulation for real-time strawberry harvesting. A computer vision pipeline based on the YOLOv11 segmentation model was developed and integrated into a Smart Mobile Manipulator (SMM) equipped with autonomous navigation, a 6-degree-of-freedom (6-DoF) xArm 6 robotic arm, and ROS middleware to enable real-time operation. Using a publicly available strawberry dataset comprising 2,800 images collected under ridge-planted cultivation conditions, the proposed YOLOv11-small segmentation model achieved 84.41% mAP@0.5, outperforming YOLOv11 object detection, Faster R-CNN, and RT-DETR in segmentation quality while maintaining real-time performance at 10 FPS on an NVIDIA Jetson Orin Nano edge GPU. A PCA-based fruit orientation and geometric analysis method achieved 86.5% localization accuracy on 200 test images. Controlled indoor harvesting experiments using synthetic strawberries demonstrated an overall harvesting success rate of 72% across 50 trials. The proposed system provides a general-purpose platform for berry harvesting in controlled environments, offering a scalable and efficient solution for autonomous harvesting.
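The PCA-based fruit orientation analysis mentioned above is, in its generic form, an eigen-analysis of the segmentation-mask pixel coordinates. The NumPy sketch below illustrates that idea; the function name and the wrap-to-[0, π) convention are our assumptions, not the authors' code.

```python
import numpy as np

def principal_axis_angle(points: np.ndarray) -> float:
    """Estimate fruit orientation as the angle (radians, in [0, pi))
    of the first principal axis of 2-D mask pixel coordinates.

    `points` has shape (N, 2) with (x, y) pixel coordinates.
    """
    centered = points - points.mean(axis=0)
    cov = centered.T @ centered / len(points)
    # eigh returns eigenvalues in ascending order for symmetric input.
    _, eigvecs = np.linalg.eigh(cov)
    major = eigvecs[:, -1]  # axis of largest variance
    # Orientation is a line, not a direction, so fold into [0, pi).
    return float(np.arctan2(major[1], major[0]) % np.pi)

# Toy mask: pixels along a line at 45 degrees.
pts = np.array([[i, i] for i in range(10)], dtype=float)
angle = principal_axis_angle(pts)
```

In a harvesting pipeline, an angle like this would typically be combined with the mask centroid and depth to choose the gripper approach pose.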
(This article belongs to the Special Issue Advances in Robotic Systems for Precision Orchard Operations)

34 pages, 24153 KB  
Article
Forest Vegetation 3D Localization Using Deep Learning Object Detectors
by Paulo A. S. Mendes, António P. Coimbra and Aníbal T. de Almeida
Appl. Sci. 2026, 16(7), 3375; https://doi.org/10.3390/app16073375 - 31 Mar 2026
Abstract
Forest fires are becoming increasingly prevalent and destructive in many regions of the world, posing significant threats to biodiversity, ecosystems, human settlements, climate, and the economy. The United States of America (USA), Australia, Canada, Greece, and Portugal are five regions that have experienced enormous forest fires. One way to reduce the size and range of forest fires is to decrease the amount of flammable material in forests. This can be achieved using autonomous Unmanned Ground Vehicles (UGVs) specialized in vegetation cutting and equipped with Artificial Intelligence (AI) algorithms to identify and differentiate between vegetation that should be preserved and material that should be removed as potential fire fuel. In this paper, an innovative study of forest vegetation detection, classification, and 3D localization using ground vehicles' RGB and depth images is presented to support autonomous forest cleaning operations to prevent fires. The presented work, a continuation of previous research, describes a method for 3D object localization in the real world using Deep Learning Object Detection (DLOD) combined with an RGB-D camera. It presents and compares the results of eight recent high-performance DLOD architectures, YOLOv5, YOLOv7, YOLOv8, YOLO-NAS, YOLOv9, YOLOv10, YOLO11, and YOLOv12, for detecting and classifying forest vegetation into five classes: "Grass", "Live vegetation", "Cut vegetation", "Dead vegetation", and "Tree-trunk". For the training of the DLOD models, our custom dataset acquired in dense forests in Portugal is used. A methodology is presented that combines the best-performing DLOD model for vegetation detection and classification with an RGB-D camera for the 3D localization of the detected and classified objects in the real world. These methods are deployed on an Unmanned Ground Vehicle (UGV) to localize forest vegetation that needs to be thinned for fire prevention purposes. A key challenge for autonomous forest vegetation cleaning is the reliable discrimination of the objects that must be identified to achieve fire prevention with autonomous unmanned ground vehicles in dense forests. With the obtained results, forest vegetation is precisely detected, classified, and localized using the presented DL models and localization method. The fastest DLOD architecture to train is YOLOv5, and the fastest to infer are YOLOv7 and YOLOv12. The innovation presented is the real-time detection, classification, and 3D localization of vegetation using DLOD architectures, with a localization error of the real-world object in width, height, and depth under 21.4%, 20.7%, and 11%, respectively, using only a depth camera and a processing unit. The 3D-localized objects are represented as parallelepiped geometrical shapes. The methodology for vegetation detection, classification, and localization presented in this paper is highly suitable for future autonomous forest vegetation cleaning using specialized unmanned ground vehicles.
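Combining a 2-D detector with an RGB-D camera, as this abstract describes, typically means back-projecting the detection through the standard pinhole camera model. The NumPy sketch below illustrates that generic step; the function name, the median-depth choice, and the toy intrinsics are our assumptions, not the paper's implementation.

```python
import numpy as np

def backproject_bbox(bbox, depth, fx, fy, cx, cy):
    """Back-project a 2-D detection into a 3-D point.

    bbox  : (u_min, v_min, u_max, v_max) in pixels
    depth : (H, W) depth map in metres, aligned with the RGB image
    Returns the 3-D centre (X, Y, Z) in camera coordinates using the
    pinhole model and the median depth inside the box (the median is
    more robust to background pixels than the mean).
    """
    u0, v0, u1, v1 = bbox
    z = float(np.median(depth[v0:v1, u0:u1]))
    u_c, v_c = (u0 + u1) / 2.0, (v0 + v1) / 2.0
    x = (u_c - cx) * z / fx
    y = (v_c - cy) * z / fy
    return x, y, z

# Toy case: a detection centred on the principal point at 2 m depth.
d = np.full((480, 640), 2.0)
centre = backproject_bbox((310, 230, 330, 250), d,
                          fx=500.0, fy=500.0, cx=320.0, cy=240.0)
```

Back-projecting all four box corners the same way yields the axis-aligned parallelepiped representation mentioned in the abstract.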
