Search Results (33)

Search Parameters:
Keywords = outdoor real-world scenes

21 pages, 15478 KiB  
Review
Small Object Detection in Traffic Scenes for Mobile Robots: Challenges, Strategies, and Future Directions
by Zhe Wei, Yurong Zou, Haibo Xu and Sen Wang
Electronics 2025, 14(13), 2614; https://doi.org/10.3390/electronics14132614 - 28 Jun 2025
Viewed by 563
Abstract
Small object detection in traffic scenes presents unique challenges for mobile robots operating under constrained computational resources and highly dynamic environments. Unlike general object detection, small targets often suffer from low resolution, weak semantic cues, and frequent occlusion, especially in complex outdoor scenarios. This study systematically analyses the challenges, technical advances, and deployment strategies for small object detection tailored to mobile robotic platforms. We categorise existing approaches into three main strategies: feature enhancement (e.g., multi-scale fusion, attention mechanisms), network architecture optimisation (e.g., lightweight backbones, anchor-free heads), and data-driven techniques (e.g., augmentation, simulation, transfer learning). Furthermore, we examine deployment techniques on embedded devices such as Jetson Nano and Raspberry Pi, and we highlight multi-modal sensor fusion using Light Detection and Ranging (LiDAR), cameras, and Inertial Measurement Units (IMUs) for enhanced environmental perception. A comparative study of public datasets and evaluation metrics is provided to identify current limitations in real-world benchmarking. Finally, we discuss future directions, including robust detection under extreme conditions and human-in-the-loop incremental learning frameworks. This research aims to offer a comprehensive technical reference for researchers and practitioners developing small object detection systems for real-world robotic applications. Full article
(This article belongs to the Special Issue New Trends in Computer Vision and Image Processing)
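As a rough illustration of the "feature enhancement" strategies the review surveys (multi-scale fusion in particular), the sketch below fuses three backbone stages FPN-style in PyTorch. The channel widths, class name, and feature shapes are assumptions for illustration, not code from any surveyed method.

```python
# Minimal sketch of FPN-style multi-scale feature fusion, one of the feature
# enhancement strategies discussed in the review. Sizes are illustrative only.
import torch
import torch.nn as nn
import torch.nn.functional as F

class TinyFPNFusion(nn.Module):
    def __init__(self, in_channels=(256, 512, 1024), out_channels=128):
        super().__init__()
        # 1x1 convs project each backbone stage to a common channel width
        self.lateral = nn.ModuleList(nn.Conv2d(c, out_channels, 1) for c in in_channels)
        self.smooth = nn.ModuleList(nn.Conv2d(out_channels, out_channels, 3, padding=1)
                                    for _ in in_channels)

    def forward(self, feats):
        # feats: backbone features ordered from high to low spatial resolution
        laterals = [l(f) for l, f in zip(self.lateral, feats)]
        # top-down pathway: upsample coarse maps and add them to finer ones
        for i in range(len(laterals) - 1, 0, -1):
            laterals[i - 1] = laterals[i - 1] + F.interpolate(
                laterals[i], size=laterals[i - 1].shape[-2:], mode="nearest")
        return [s(l) for s, l in zip(self.smooth, laterals)]

feats = [torch.randn(1, 256, 80, 80), torch.randn(1, 512, 40, 40), torch.randn(1, 1024, 20, 20)]
fused = TinyFPNFusion()(feats)   # the finest fused map retains detail useful for small objects
```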

18 pages, 3132 KiB  
Article
ICAFormer: An Image Dehazing Transformer Based on Interactive Channel Attention
by Yanfei Chen, Tong Yue, Pei An, Hanyu Hong, Tao Liu, Yangkai Liu and Yihui Zhou
Sensors 2025, 25(12), 3750; https://doi.org/10.3390/s25123750 - 15 Jun 2025
Cited by 1 | Viewed by 614
Abstract
Single image dehazing is a fundamental task in computer vision, aiming to recover a clear scene from a hazy input image. To address the limitations of traditional dehazing algorithms—particularly in global feature association and local detail preservation—this study proposes a novel Transformer-based dehazing model enhanced by an interactive channel attention mechanism. The proposed architecture adopts a U-shaped encoder–decoder framework, incorporating key components such as a feature extraction module and a feature fusion module based on interactive attention. Specifically, the interactive channel attention mechanism facilitates cross-layer feature interaction, enabling the dynamic fusion of global contextual information and local texture details. The network architecture leverages a multi-scale feature pyramid to extract image information across different dimensions, while an improved cross-channel attention weighting mechanism enhances feature representation in regions with varying haze densities. Extensive experiments conducted on both synthetic and real-world datasets—including the RESIDE benchmark—demonstrate the superior performance of the proposed method. Quantitatively, it achieves PSNR gains of 0.53 dB for indoor scenes and 1.64 dB for outdoor scenes, alongside SSIM improvements of 1.4% and 1.7%, respectively, compared with the second-best performing method. Qualitative assessments further confirm that the proposed model excels in restoring fine structural details in dense haze regions while maintaining high color fidelity. These results validate the effectiveness of the proposed approach in enhancing both perceptual quality and quantitative accuracy in image dehazing tasks. Full article
(This article belongs to the Section Sensing and Imaging)
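Loosely in the spirit of the "interactive channel attention" the abstract describes, the following sketch gates a decoder feature map with channel weights computed jointly from encoder and decoder descriptors. Module names and sizes are assumptions, not the authors' ICAFormer code.

```python
# Illustrative channel-attention gate between two feature streams (PyTorch).
import torch
import torch.nn as nn

class ChannelInteractionGate(nn.Module):
    def __init__(self, channels: int, reduction: int = 8):
        super().__init__()
        self.pool = nn.AdaptiveAvgPool2d(1)
        self.mlp = nn.Sequential(
            nn.Linear(2 * channels, channels // reduction), nn.ReLU(inplace=True),
            nn.Linear(channels // reduction, channels), nn.Sigmoid())

    def forward(self, encoder_feat, decoder_feat):
        b, c, _, _ = encoder_feat.shape
        # squeeze both streams to channel descriptors and let them interact
        desc = torch.cat([self.pool(encoder_feat).flatten(1),
                          self.pool(decoder_feat).flatten(1)], dim=1)
        weights = self.mlp(desc).view(b, c, 1, 1)
        # reweight the decoder stream with attention informed by the encoder
        return decoder_feat * weights + encoder_feat

x_enc = torch.randn(2, 64, 32, 32)
x_dec = torch.randn(2, 64, 32, 32)
out = ChannelInteractionGate(64)(x_enc, x_dec)   # shape (2, 64, 32, 32)
```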

15 pages, 3167 KiB  
Article
Building a Realistic Virtual Luge Experience Using Photogrammetry
by Bernhard Hollaus, Jonas Kreiner, Maximilian Gallinat, Meggy Hayotte and Denny Yu
Sensors 2025, 25(8), 2568; https://doi.org/10.3390/s25082568 - 18 Apr 2025
Viewed by 505
Abstract
Virtual reality (VR) continues to evolve, offering immersive experiences across various domains, especially in virtual training scenarios. The aim of this study is to present the development of a VR simulator and to examine its realism, usability, and acceptance by luge experts after an experiment with a VR simulation. We present a novel photogrammetry-to-VR sensing pipeline for the sport of luge, designed to be as close to the real luge experience as possible and potentially enabling users to learn critical techniques safely prior to real-world trials. Key features of our application include realistic terrain created with photogrammetry and responsive sled dynamics. A consultation with experts from the Austrian Luge Federation led to several design improvements to the VR environment, especially regarding user experience aspects such as lifelike feedback and interface responsiveness. Furthermore, user interaction was optimized to enable precise steering and maneuvering, and two learning modes were developed to accommodate different experience levels (novice and expert). The results indicated a good level of realism for the VR luge simulator: participants reported scene, audience behavior, and sound realism scores ranging from 3/5 to 4/5. Our findings indicated adequate usability (system usability score: 72.7, SD = 13.9), while moderate scores were observed for the acceptance of VRodel. In conclusion, our virtual luge application offers a promising avenue for exploring the potential of VR technology in delivering authentic outdoor recreation experiences that could increase safety in the sport of luge. By integrating advanced sensing, simulations, and interactive features, we aim to push the boundaries of realism in virtual lugeing and pave the way for future advancements in immersive entertainment and simulation applications. Full article
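The reported usability figure (72.7) follows the standard System Usability Scale scoring rule. The sketch below shows that arithmetic; the ten example responses are invented purely to demonstrate the computation, not data from the study.

```python
# Standard SUS scoring: odd items contribute (score - 1), even items (5 - score),
# and the 0-40 raw sum is scaled to 0-100 by multiplying by 2.5.
def sus_score(responses):
    """responses: ten 1-5 Likert answers, item 1 first (odd items are positively worded)."""
    assert len(responses) == 10
    total = 0
    for i, r in enumerate(responses, start=1):
        total += (r - 1) if i % 2 == 1 else (5 - r)
    return total * 2.5

print(sus_score([4, 2, 4, 2, 4, 2, 4, 2, 4, 2]))  # -> 75.0 (made-up example responses)
```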

32 pages, 8687 KiB  
Article
Hybrid Deep Learning Methods for Human Activity Recognition and Localization in Outdoor Environments
by Yirga Yayeh Munaye, Metadel Addis, Yenework Belayneh, Atinkut Molla and Wasyihun Admass
Algorithms 2025, 18(4), 235; https://doi.org/10.3390/a18040235 - 18 Apr 2025
Viewed by 851
Abstract
Activity recognition and localization in outdoor environments involve identifying and tracking human movements using sensor data, computer vision, or deep learning techniques. This process is crucial for applications such as smart surveillance, autonomous systems, healthcare monitoring, and human–computer interaction. However, several challenges arise in outdoor settings, including varying lighting conditions, occlusions caused by obstacles, environmental noise, and the complexity of differentiating between similar activities. This study presents a hybrid deep learning approach that integrates human activity recognition and localization in outdoor environments using Wi-Fi signal data. The study focuses on applying the hybrid long short-term memory–bi-gated recurrent unit (LSTM-BIGRU) architecture, designed to enhance the accuracy of activity recognition and location estimation. Moreover, experiments were conducted using a real-world dataset collected with the PicoScene Wi-Fi sensing device, which captures both magnitude and phase information. The results demonstrated a significant improvement in accuracy for both activity recognition and localization tasks. To mitigate data scarcity, this study utilized the conditional tabular generative adversarial network (CTGAN) to generate synthetic channel state information (CSI) data. Additionally, carrier frequency offset (CFO) and cyclic shift delay (CSD) preprocessing techniques were implemented to mitigate phase fluctuations. The experiments were conducted in a line-of-sight (LoS) outdoor environment, where CSI data were collected using the PicoScene Wi-Fi sensor platform across four different activities at outdoor locations. Finally, a comparative analysis of the experimental results highlights the superior performance of the proposed hybrid LSTM-BIGRU model, achieving 99.81% and 98.93% accuracy for activity recognition and location prediction, respectively. Full article
(This article belongs to the Section Algorithms for Multidisciplinary Applications)
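The hybrid LSTM-BIGRU idea can be pictured as an LSTM stage feeding a bidirectional GRU, with two output heads for activity and location. The sketch below is a rough PyTorch stand-in; all layer sizes, the head layout, and the input dimensions are assumptions, not the authors' configuration.

```python
# Schematic hybrid LSTM + bidirectional GRU with activity and location heads.
import torch
import torch.nn as nn

class LSTMBiGRU(nn.Module):
    def __init__(self, csi_features=64, hidden=128, n_activities=4, n_locations=4):
        super().__init__()
        self.lstm = nn.LSTM(csi_features, hidden, batch_first=True)
        self.bigru = nn.GRU(hidden, hidden, batch_first=True, bidirectional=True)
        self.activity_head = nn.Linear(2 * hidden, n_activities)
        self.location_head = nn.Linear(2 * hidden, n_locations)

    def forward(self, csi_seq):            # csi_seq: (batch, time, csi_features)
        h, _ = self.lstm(csi_seq)
        h, _ = self.bigru(h)
        last = h[:, -1]                     # final time step summarises the window
        return self.activity_head(last), self.location_head(last)

x = torch.randn(8, 100, 64)                # 8 windows of 100 CSI samples (placeholder data)
act_logits, loc_logits = LSTMBiGRU()(x)
```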

19 pages, 15020 KiB  
Article
Discrete Diffusion-Based Generative Semantic Scene Completion
by Yiqi Wu, Xuan Huang, Boxiong Yang, Yong Chen, Fadi Aburaid and Dejun Zhang
Electronics 2025, 14(7), 1447; https://doi.org/10.3390/electronics14071447 - 3 Apr 2025
Viewed by 530
Abstract
Semantic scene completion through AI-driven content generation is a rapidly evolving field with crucial applications in 3D reconstruction and scene understanding. This task presents considerable challenges arising from the intrinsic sparsity and incompleteness of the input points generated by LiDAR. This paper proposes a generative semantic scene completion method based on a discrete denoising diffusion probabilistic model to tackle these issues. In the discrete diffusion phase, a weighted K-nearest neighbor uniform transition kernel is introduced, based on feature distance in the discretized voxel space, to control the category distribution transition process by capturing the local structure of the data, which is more in line with diffusion processes in the real world. Moreover, to mitigate the feature information loss during point cloud voxelization, the aggregated point features are integrated into the corresponding voxel space, thereby enhancing the granularity of the completion. Accordingly, a combined loss function is designed for network training that considers both the KL divergence for global completion and the cross-entropy for local details. Evaluation results on multiple public outdoor datasets demonstrate that the proposed method effectively accomplishes semantic scene completion. Full article
(This article belongs to the Section Artificial Intelligence)
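One plausible reading of a weighted K-nearest-neighbour transition kernel is shown below as a toy numpy example: instead of spreading corruption mass uniformly over all categories, most of the off-diagonal mass goes to the classes whose feature centroids are closest. The feature vectors, K, and beta are invented; this is not the paper's kernel.

```python
# Toy row-stochastic category transition matrix biased toward feature-space neighbours.
import numpy as np

def knn_transition_matrix(class_feats, k=3, beta=0.1):
    """class_feats: (C, D) one feature centroid per semantic class."""
    C = class_feats.shape[0]
    dist = np.linalg.norm(class_feats[:, None] - class_feats[None, :], axis=-1)
    Q = np.zeros((C, C))
    for c in range(C):
        nn_idx = np.argsort(dist[c])[1:k + 1]      # k nearest other classes
        w = np.exp(-dist[c, nn_idx])                # closer classes get more mass
        Q[c, nn_idx] = beta * w / w.sum()           # corruption mass -> neighbours
        Q[c, c] = 1.0 - beta                        # most mass stays on the class itself
    return Q                                        # each row sums to 1

Q = knn_transition_matrix(np.random.rand(10, 8))
assert np.allclose(Q.sum(axis=1), 1.0)
```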

15 pages, 3998 KiB  
Article
Large Bias in Matching Small Horizontal and Vertical Extents Separated in Depth in the Real World Is Similar for Upright and Supine Observers
by Frank H. Durgin, Chung Sze Kwok, Katelyn M. Becker and Ya Min Phyu
Vision 2025, 9(1), 11; https://doi.org/10.3390/vision9010011 - 3 Feb 2025
Viewed by 866
Abstract
The apparent sizes of horizontal and vertical lines show an anisotropy known as the horizontal vertical illusion (HVI) wherein vertical lines appear to be longer than their horizontal counterparts. Whereas a typical HVI comparing vertical and horizontal lines in a plane produces a 5–10% illusion, a much larger-scale illusion (15–25%) is often found for large objects in the real world, and this has been related to differential angular exaggerations in perceived elevation (vertical) and azimuthal (horizontal) direction. Recently supine observers in virtual environments were found to show larger exaggerations in perceived azimuth than upright observers. Here, 48 participants were tested in both supine and upright postures in an outdoor environment while matching fairly small physical extents in the real world. They adjusted the magnitude of the horizontal extent to perceptually match fairly small vertical poles (0.7–1.3 m tall) that were either presented at the same viewing distance as the matching extent or in a different depth plane, so that size at a distance had to be compared. Supine observers viewed the scene, as though upright, through a large mirror mounted overhead at 45° that was adjusted to approximate their normal eye height. When the matcher extent was at a different distance than the pole, horizontal extent matches typically exceeded the actual pole height by about 15% or more, whether the viewer was upright or supine. The average overestimation was only about 10% when the matching extent was at the same distance. Despite the similarity in performance across different postures for spatial matching, supine observers gave much higher explicit estimates of azimuthal direction than upright observers. However, although the observation of exaggeration in perceived azimuth for supine observers was replicated in a second study with 24 additional participants using a mirror with a smaller (more normal) aspect ratio, the magnitude of the exaggeration seemed to be greatly reduced when the field of view of the apparatus had a more typical aspect ratio. This suggests that the unusually large exaggeration of azimuth found in a previous report with supine observers may have been caused by the unusually large aspect ratio of the viewing apparatus used. Full article

18 pages, 36094 KiB  
Article
Arbitrary Optics for Gaussian Splatting Using Space Warping
by Jakob Nazarenus, Simin Kou, Fang-Lue Zhang and Reinhard Koch
J. Imaging 2024, 10(12), 330; https://doi.org/10.3390/jimaging10120330 - 22 Dec 2024
Viewed by 1714
Abstract
Due to recent advances in 3D reconstruction from RGB images, it is now possible to create photorealistic representations of real-world scenes that only require minutes to be reconstructed and can be rendered in real time. In particular, 3D Gaussian splatting shows promising results, outperforming preceding reconstruction methods while simultaneously reducing the overall computational requirements. The main success of 3D Gaussian splatting relies on the efficient use of a differentiable rasterizer to render the Gaussian scene representation. One major drawback of this method is its underlying pinhole camera model. In this paper, we propose an extension of the existing method that removes this constraint and enables scene reconstructions using arbitrary camera optics such as highly distorting fisheye lenses. Our method achieves this by applying a differentiable warping function to the Gaussian scene representation. Additionally, we reduce overfitting in outdoor scenes by utilizing a learnable skybox, reducing the presence of floating artifacts within the reconstructed scene. Based on synthetic and real-world image datasets, we show that our method is capable of creating an accurate scene reconstruction from highly distorted images and rendering photorealistic images from such reconstructions. Full article
(This article belongs to the Special Issue Geometry Reconstruction from Images (2nd Edition))
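To give a feel for the kind of warping function involved, the sketch below implements a differentiable equidistant fisheye projection in PyTorch. The focal length and principal point are placeholder values, and this is not the authors' code, only an example of a non-pinhole camera model through which gradients can flow.

```python
# Differentiable equidistant fisheye projection: image radius = f * theta.
import torch

def fisheye_project(points_cam, f=300.0, cx=512.0, cy=512.0, eps=1e-8):
    """points_cam: (N, 3) points in the camera frame, z pointing forward."""
    x, y, z = points_cam.unbind(dim=-1)
    r = torch.sqrt(x * x + y * y).clamp_min(eps)
    theta = torch.atan2(r, z)            # angle from the optical axis
    scale = f * theta / r                 # equidistant model
    u = cx + scale * x
    v = cy + scale * y
    return torch.stack([u, v], dim=-1)

pts = torch.randn(1000, 3)
pts[:, 2] = pts[:, 2].abs() + 1.0        # keep points in front of the camera
pts.requires_grad_(True)
uv = fisheye_project(pts)
uv.sum().backward()                       # gradients flow back to the 3D positions
```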

24 pages, 2450 KiB  
Article
Progressive Pruning of Light Dehaze Networks for Static Scenes
by Byeongseon Park, Heekwon Lee, Yong-Kab Kim and Sungkwan Youm
Appl. Sci. 2024, 14(23), 10820; https://doi.org/10.3390/app142310820 - 22 Nov 2024
Cited by 1 | Viewed by 962
Abstract
This paper introduces a progressive pruning method for Light DeHaze Networks, focusing on static scenes captured in fixed-camera environments. We develop a progressive pruning algorithm that aims to reduce computational complexity while maintaining dehazing quality within a specified threshold. Our key contributions include a fine-tuning strategy for specific scenes, channel importance analysis, and a progressive pruning approach that considers layer-wise sensitivity. Our experiments demonstrate the effectiveness of this method: targeting a specific PSNR (Peak Signal-to-Noise Ratio) threshold, the algorithm achieved optimal results at a certain pruning ratio, significantly reducing the number of channels in the target layer while keeping PSNR above the threshold and preserving good structural similarity, before automatically stopping when performance dropped below the target. This demonstrates the algorithm’s ability to find an optimal balance between model compression and performance maintenance. This research enables efficient deployment of high-quality dehazing algorithms in resource-constrained environments, applicable to traffic monitoring and outdoor surveillance. Our method paves the way for more accessible image dehazing systems, enhancing visibility in various real-world hazy conditions while optimizing computational resources for fixed camera setups. Full article
(This article belongs to the Special Issue Advances in Neural Networks and Deep Learning)
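The control loop described in the abstract (prune least important channels until the quality metric would fall below a target, then stop) can be sketched generically as below. The evaluate and importance stand-ins are toys, not the paper's network or pruning criteria.

```python
# Minimal progressive-pruning loop guarded by a PSNR threshold.
def progressive_prune(channel_importance, evaluate_psnr, target_psnr, step=1):
    """channel_importance: list of (channel_id, score); lowest score is pruned first."""
    order = [c for c, _ in sorted(channel_importance, key=lambda t: t[1])]
    pruned = []
    for i in range(0, len(order), step):
        candidate = pruned + order[i:i + step]
        psnr = evaluate_psnr(candidate)      # PSNR with these channels removed
        if psnr < target_psnr:               # quality fell below the threshold:
            break                            # stop and keep the previous state
        pruned = candidate
    return pruned

# Toy example: quality degrades slightly with every pruned channel.
importance = [(i, float(i)) for i in range(32)]
psnr_model = lambda removed: 34.0 - 0.2 * len(removed)
removable = progressive_prune(importance, psnr_model, target_psnr=30.0)
print(len(removable))  # 20 channels can be removed before PSNR would drop below 30
```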

17 pages, 2483 KiB  
Article
Fire and Smoke Detection in Complex Environments
by Furkat Safarov, Shakhnoza Muksimova, Misirov Kamoliddin and Young Im Cho
Fire 2024, 7(11), 389; https://doi.org/10.3390/fire7110389 - 29 Oct 2024
Cited by 14 | Viewed by 2456
Abstract
Fire detection is a critical task in environmental monitoring and disaster prevention, with traditional methods often limited in their ability to detect fire and smoke in real time over large areas. The rapid identification of fire and smoke in both indoor and outdoor environments is essential for minimizing damage and ensuring timely intervention. In this paper, we propose a novel approach to fire and smoke detection by integrating a vision transformer (ViT) with the YOLOv5s object detection model. Our modified model leverages the attention-based feature extraction capabilities of ViTs to improve detection accuracy, particularly in complex environments where fires may be occluded or distributed across large regions. By replacing the CSPDarknet53 backbone of YOLOv5s with ViT, the model is able to capture both local and global dependencies in images, resulting in more accurate detection of fire and smoke under challenging conditions. We evaluate the performance of the proposed model using a comprehensive Fire and Smoke Detection Dataset, which includes diverse real-world scenarios. The results demonstrate that our model outperforms baseline YOLOv5 variants in terms of precision, recall, and mean average precision (mAP), achieving a mAP@0.5 of 0.664 and a recall of 0.657. The modified YOLOv5s with ViT shows significant improvements in detecting fire and smoke, particularly in scenes with complex backgrounds and varying object scales. Our findings suggest that the integration of ViT as the backbone of YOLOv5s offers a promising approach for real-time fire detection in both urban and natural environments. Full article
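The core architectural idea, replacing a convolutional backbone with a transformer-style feature extractor inside a one-stage detector, is sketched schematically below in plain PyTorch. This is not the authors' YOLOv5s/ViT integration; all module sizes, the two-class head, and the patch settings are invented for illustration.

```python
# Schematic detector skeleton with a swappable transformer backbone.
import torch
import torch.nn as nn

class TinyViTBackbone(nn.Module):
    def __init__(self, dim=192, patch=16, img=640):
        super().__init__()
        self.patch_embed = nn.Conv2d(3, dim, kernel_size=patch, stride=patch)
        layer = nn.TransformerEncoderLayer(d_model=dim, nhead=4, batch_first=True)
        self.encoder = nn.TransformerEncoder(layer, num_layers=2)
        self.grid = img // patch

    def forward(self, x):                                          # x: (B, 3, 640, 640)
        tokens = self.patch_embed(x).flatten(2).transpose(1, 2)    # (B, N, dim)
        tokens = self.encoder(tokens)                              # global attention across patches
        # reshape tokens back to a spatial map a detection head can consume
        return tokens.transpose(1, 2).reshape(x.size(0), -1, self.grid, self.grid)

class TinyDetector(nn.Module):
    def __init__(self, backbone, num_outputs=3 * (5 + 2)):         # 3 anchors, 2 classes (fire, smoke)
        super().__init__()
        self.backbone = backbone
        self.head = nn.Conv2d(192, num_outputs, kernel_size=1)

    def forward(self, x):
        return self.head(self.backbone(x))

preds = TinyDetector(TinyViTBackbone())(torch.randn(1, 3, 640, 640))
print(preds.shape)   # (1, 21, 40, 40) grid of per-cell predictions
```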

22 pages, 20719 KiB  
Article
A Computationally Efficient Neuronal Model for Collision Detection with Contrast Polarity-Specific Feed-Forward Inhibition
by Guangxuan Gao, Renyuan Liu, Mengying Wang and Qinbing Fu
Biomimetics 2024, 9(11), 650; https://doi.org/10.3390/biomimetics9110650 - 22 Oct 2024
Cited by 1 | Viewed by 1653
Abstract
Animals utilize their well-evolved dynamic vision systems to perceive and evade collision threats. Driven by biological research, bio-inspired models based on lobula giant movement detectors (LGMDs) address certain gaps in constructing artificial collision-detecting vision systems with robust selectivity, offering reliable, low-cost, and miniaturized collision sensors across various scenes. Recent progress in neuroscience has revealed the energetic advantages of dendritic arrangements presynaptic to the LGMDs, which receive contrast polarity-specific signals on separate dendritic fields. Specifically, feed-forward inhibitory inputs arise from parallel ON/OFF pathways interacting with excitation. However, no previous research has investigated the evolution of a computational LGMD model with feed-forward inhibition (FFI) separated by opposite polarity. This study fills this gap by presenting an optimized neuronal model in which FFI is divided into ON/OFF channels, each with distinct synaptic connections. To align with the energy efficiency of biological systems, we introduce an activation function associated with the neural computation of FFI and the interactions between local excitation and lateral inhibition within ON/OFF channels, ignoring non-active signal processing. This approach significantly improves the time efficiency of the LGMD model, focusing only on substantial luminance changes in image streams. The proposed neuronal model not only accelerates visual processing in relatively stationary scenes but also maintains robust selectivity to ON/OFF-contrast looming stimuli. Additionally, it can suppress translational motion to a moderate extent. Comparative testing against state-of-the-art models based on ON/OFF channels was conducted systematically using a range of visual stimuli, including indoor structured and complex outdoor scenes. The results demonstrated significant time savings in silico while retaining the original collision selectivity. Furthermore, the optimized model was implemented in the embedded vision system of a micro-mobile robot, achieving the highest collision-avoidance success ratio of 97.51% while nearly halving the processing time compared with previous models. This highlights a robust and parsimonious collision-sensing mode that effectively addresses real-world challenges. Full article
(This article belongs to the Special Issue Bio-Inspired and Biomimetic Intelligence in Robotics: 2nd Edition)
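The polarity split and the "ignore non-active signals" idea can be illustrated with a few lines of numpy: frame differences are half-wave rectified into ON (brightening) and OFF (darkening) channels, and sub-threshold responses are zeroed so they need no further processing. The threshold value and array shapes are arbitrary; this is not the authors' LGMD implementation.

```python
# ON/OFF split of luminance changes with an activation threshold.
import numpy as np

def on_off_channels(prev_frame, curr_frame, activation_threshold=0.05):
    diff = curr_frame.astype(np.float32) - prev_frame.astype(np.float32)
    on = np.maximum(diff, 0.0)           # brightening edges -> ON pathway
    off = np.maximum(-diff, 0.0)         # darkening edges   -> OFF pathway
    # ignore weak luminance changes entirely, saving downstream computation
    on[on < activation_threshold] = 0.0
    off[off < activation_threshold] = 0.0
    return on, off

prev = np.random.rand(120, 160)
curr = np.clip(prev + 0.1 * np.random.randn(120, 160), 0, 1)
on, off = on_off_channels(prev, curr)
active_fraction = (np.count_nonzero(on) + np.count_nonzero(off)) / (2 * on.size)
```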

23 pages, 7974 KiB  
Article
Maize Phenotypic Parameters Based on the Constrained Region Point Cloud Phenotyping Algorithm as a Developed Method
by Qinzhe Zhu, Miaoyuan Bai and Ming Yu
Agronomy 2024, 14(10), 2446; https://doi.org/10.3390/agronomy14102446 - 21 Oct 2024
Cited by 2 | Viewed by 1189
Abstract
As one of the world’s most crucial food crops, maize plays a pivotal role in ensuring food security and driving economic growth. The diversification of maize variety breeding is significantly enhancing the cumulative benefits in these areas. Precise measurement of phenotypic data is pivotal for the selection and breeding of maize varieties in cultivation and production. However, in outdoor environments, conventional phenotyping methods, including point cloud processing techniques based on region growing algorithms and clustering segmentation, encounter significant challenges due to the low density and frequent loss of point cloud data. These issues substantially compromise measurement accuracy and computational efficiency. Consequently, this paper introduces a Constrained Region Point Cloud Phenotyping (CRPCP) algorithm that proficiently detects the phenotypic traits of multiple maize plants in sparse outdoor point cloud data. The CRPCP algorithm consists primarily of three core components: (1) a constrained region growth algorithm for effective segmentation of maize stem point clouds in complex backgrounds; (2) a radial basis interpolation technique to bridge gaps in point cloud data caused by environmental factors; and (3) a multi-level parallel decomposition strategy leveraging scene blocking and plant instances to enable high-throughput real-time computation. The results demonstrate that the CRPCP algorithm achieves a segmentation accuracy of 96.2%. When assessing maize plant height, the algorithm demonstrated a strong correlation with manual measurements, evidenced by a coefficient of determination R2 of 0.9534, a root mean square error (RMSE) of 0.4835 cm, and a mean absolute error (MAE) of 0.383 cm. In evaluating the diameter at breast height (DBH) of the plants, the algorithm yielded an R2 of 0.9407, an RMSE of 0.0368 cm, and an MAE of 0.031 cm. Compared to the PointNet point cloud segmentation method, the CRPCP algorithm reduced segmentation time by more than 44.7%. The CRPCP algorithm proposed in this paper enables efficient segmentation and precise phenotypic measurement of low-density maize multi-plant point cloud data in outdoor environments. This algorithm offers an automated, high-precision, and highly efficient solution for large-scale field phenotypic analysis, with broad applicability in precision breeding, agronomic management, and yield prediction. Full article
(This article belongs to the Section Precision and Digital Agriculture)
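The pipeline's second component, radial basis interpolation to bridge gaps caused by occlusion, can be sketched with scipy as below. The synthetic points, the hole geometry, and the thin-plate-spline kernel choice are assumptions for illustration; this is not the CRPCP implementation.

```python
# Filling an occlusion-caused hole in a sparse height field with RBF interpolation.
import numpy as np
from scipy.interpolate import RBFInterpolator

rng = np.random.default_rng(0)
xy = rng.uniform(0, 1, size=(300, 2))              # observed (x, y) positions
z = np.sin(3 * xy[:, 0]) + 0.5 * xy[:, 1]           # stand-in for measured heights

# drop a block of points to imitate occlusion-caused gaps in the point cloud
keep = ~((xy[:, 0] > 0.4) & (xy[:, 0] < 0.6) & (xy[:, 1] > 0.4) & (xy[:, 1] < 0.6))
interp = RBFInterpolator(xy[keep], z[keep], kernel="thin_plate_spline")

hole = np.array([[0.5, 0.5], [0.45, 0.55]])         # query points inside the gap
print(interp(hole))                                  # interpolated height estimates
```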

14 pages, 2331 KiB  
Article
Enhancing Weather Scene Identification Using Vision Transformer
by Christine Dewi, Muhammad Asad Arshed, Henoch Juli Christanto, Hafiz Abdul Rehman, Amgad Muneer and Shahzad Mumtaz
World Electr. Veh. J. 2024, 15(8), 373; https://doi.org/10.3390/wevj15080373 - 16 Aug 2024
Viewed by 2477
Abstract
The accuracy of weather scene recognition is critical in a world where weather affects every aspect of our everyday lives, particularly in areas such as intelligent transportation networks, autonomous vehicles, and outdoor vision systems. Manual identification techniques are outdated, unreliable, and time-consuming, while real-time local weather scene recognition demands far greater accuracy than they can provide. This work utilizes the capabilities of computer vision to address these issues. Specifically, we employ the Vision Transformer model to distinguish between 11 different weather scenarios. The resulting model achieves a remarkable accuracy of 93.54%, surpassing established models such as MobileNetV2 and VGG19. These findings extend computer vision techniques into new domains and pave the way for reliable weather scene recognition systems, promising extensive real-world applications across various industries. Full article
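A typical way to adapt a Vision Transformer to an 11-class weather task is to replace the classification head of a pretrained model and fine-tune, sketched below with torchvision's ViT-B/16 as a convenient stand-in. The abstract does not specify the authors' model variant or hyper-parameters, so every choice here is an assumption.

```python
# Fine-tuning a pretrained ViT-B/16 for 11 weather scene categories (illustrative only).
import torch
import torch.nn as nn
from torchvision.models import vit_b_16, ViT_B_16_Weights

model = vit_b_16(weights=ViT_B_16_Weights.IMAGENET1K_V1)   # downloads pretrained weights
in_features = model.heads.head.in_features                  # torchvision's ViT classification head
model.heads.head = nn.Linear(in_features, 11)                # 11 weather classes

optimizer = torch.optim.AdamW(model.parameters(), lr=3e-5)
criterion = nn.CrossEntropyLoss()

images = torch.randn(4, 3, 224, 224)                         # placeholder batch
labels = torch.randint(0, 11, (4,))
loss = criterion(model(images), labels)
loss.backward()
optimizer.step()
```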

29 pages, 7421 KiB  
Article
Continuous Online Semantic Implicit Representation for Autonomous Ground Robot Navigation in Unstructured Environments
by Quentin Serdel, Julien Marzat and Julien Moras
Robotics 2024, 13(7), 108; https://doi.org/10.3390/robotics13070108 - 18 Jul 2024
Viewed by 2095
Abstract
While mobile ground robots now have the physical capacity to travel in challenging unstructured environments such as extraterrestrial surfaces or devastated terrains, their safe and efficient autonomous navigation has yet to be improved before entrusting them with complex unsupervised missions in such conditions. Recent advances in machine learning applied to semantic scene understanding and environment representation, coupled with modern embedded computational means and sensors, hold promising potential in this regard. This paper therefore introduces the combination of semantic understanding, continuous implicit environment representation and smooth informed path-planning in a new method named COSMAu-Nav. It is specifically dedicated to autonomous ground robot navigation in unstructured environments and adaptable for embedded, real-time usage without requiring any form of telecommunication. Data clustering and Gaussian processes are employed to perform online regression of the environment topography, occupancy and terrain traversability from 3D semantic point clouds while providing uncertainty modelling. The continuous and differentiable properties of Gaussian processes allow gradient-based optimisation to be used for smooth local path-planning with respect to the terrain properties. The proposed pipeline has been evaluated and compared with two reference 3D semantic mapping methods in terms of quality of representation under localisation and semantic segmentation uncertainty using a Gazebo simulation derived from the 3DRMS dataset. Its computational requirements have been evaluated using the Rellis-3D real-world dataset. It has been implemented on a real ground robot and successfully employed for its autonomous navigation in a previously unknown outdoor environment. Full article
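The Gaussian-process idea behind the terrain regression can be illustrated compactly: fit elevation as a function of (x, y) and query both a mean and a predictive uncertainty that a planner can penalise. The kernel choice and synthetic terrain below are assumptions, not the COSMAu-Nav configuration.

```python
# GP regression of terrain elevation with per-query uncertainty (scikit-learn).
import numpy as np
from sklearn.gaussian_process import GaussianProcessRegressor
from sklearn.gaussian_process.kernels import RBF, WhiteKernel

rng = np.random.default_rng(1)
xy = rng.uniform(-5, 5, size=(200, 2))                        # sampled ground points
z = 0.3 * np.sin(xy[:, 0]) + 0.1 * xy[:, 1] + 0.02 * rng.standard_normal(200)

gp = GaussianProcessRegressor(kernel=RBF(length_scale=1.0) + WhiteKernel(1e-4),
                              normalize_y=True)
gp.fit(xy, z)

query = np.array([[0.0, 0.0], [4.5, -4.5]])
mean, std = gp.predict(query, return_std=True)
# A high predictive std flags poorly observed terrain the planner should avoid.
print(mean, std)
```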

22 pages, 9800 KiB  
Article
Point Cloud Denoising in Outdoor Real-World Scenes Based on Measurable Segmentation
by Lianchao Wang, Yijin Chen and Hanghang Xu
Remote Sens. 2024, 16(13), 2347; https://doi.org/10.3390/rs16132347 - 27 Jun 2024
Cited by 2 | Viewed by 1602
Abstract
With the continuous advancements in three-dimensional scanning technology, point clouds are fundamental data in various fields such as autonomous driving, 3D urban modeling, and the preservation of cultural heritage. However, inherent inaccuracies in instruments and external environmental interference often introduce noise and outliers into point cloud data, posing numerous challenges for advanced processing tasks such as registration, segmentation, classification, and 3D reconstruction. To effectively address these issues, this study proposes a hierarchical denoising strategy based on finite measurable segmentation in spherical space, taking into account the performance differences in horizontal and vertical resolutions of LiDAR systems. The effectiveness of this method was validated through a denoising experiment conducted on point cloud data collected from real outdoor environments. The experimental result indicates that this denoising strategy not only effectively eliminates noise but also more accurately preserves the original detail features of the point clouds, demonstrating significant advantages over conventional denoising techniques. Overall, this study introduces a novel and effective method for denoising point cloud data in outdoor real-world scenes. Full article
(This article belongs to the Section Engineering Remote Sensing)
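The underlying idea of segmenting the cloud into finite angular cells in spherical space, with different horizontal and vertical bin sizes to reflect LiDAR resolution differences, and filtering outliers per cell can be sketched in numpy as below. The bin sizes and the MAD threshold are invented for illustration; this is not the paper's hierarchical strategy.

```python
# Per-cell range-outlier removal after partitioning the cloud in spherical space.
import numpy as np

def spherical_cell_denoise(points, az_bin_deg=0.5, el_bin_deg=2.0, k=3.0):
    x, y, z = points.T
    r = np.linalg.norm(points, axis=1)
    az = np.degrees(np.arctan2(y, x))                         # horizontal angle
    el = np.degrees(np.arcsin(z / np.maximum(r, 1e-9)))       # vertical angle
    cells = np.stack([np.floor(az / az_bin_deg), np.floor(el / el_bin_deg)], axis=1)
    keep = np.zeros(len(points), dtype=bool)
    for cell in np.unique(cells, axis=0):
        idx = np.where((cells == cell).all(axis=1))[0]
        med = np.median(r[idx])
        mad = np.median(np.abs(r[idx] - med)) + 1e-9
        keep[idx] = np.abs(r[idx] - med) < k * mad            # drop range outliers in this cell
    return points[keep]

cloud = np.random.randn(5000, 3) * [10, 10, 2]                # placeholder point cloud
clean = spherical_cell_denoise(cloud)
```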

15 pages, 3624 KiB  
Article
A Multi-Modal Foundation Model to Assist People with Blindness and Low Vision in Environmental Interaction
by Yu Hao, Fan Yang, Hao Huang, Shuaihang Yuan, Sundeep Rangan, John-Ross Rizzo, Yao Wang and Yi Fang
J. Imaging 2024, 10(5), 103; https://doi.org/10.3390/jimaging10050103 - 26 Apr 2024
Cited by 6 | Viewed by 4327
Abstract
People with blindness and low vision (pBLV) encounter substantial challenges when it comes to comprehensive scene recognition and precise object identification in unfamiliar environments. Additionally, due to the vision loss, pBLV have difficulty in accessing and identifying potential tripping hazards independently. Previous assistive technologies for the visually impaired often struggle in real-world scenarios due to the need for constant training and lack of robustness, which limits their effectiveness, especially in dynamic and unfamiliar environments, where accurate and efficient perception is crucial. Therefore, we frame our research question in this paper as: How can we assist pBLV in recognizing scenes, identifying objects, and detecting potential tripping hazards in unfamiliar environments, where existing assistive technologies often falter due to their lack of robustness? We hypothesize that by leveraging large pretrained foundation models and prompt engineering, we can create a system that effectively addresses the challenges faced by pBLV in unfamiliar environments. Motivated by the prevalence of large pretrained foundation models, particularly in assistive robotics applications, due to their accurate perception and robust contextual understanding in real-world scenarios induced by extensive pretraining, we present a pioneering approach that leverages foundation models to enhance visual perception for pBLV, offering detailed and comprehensive descriptions of the surrounding environment and providing warnings about potential risks. Specifically, our method begins by leveraging a large-image tagging model (i.e., Recognize Anything Model (RAM)) to identify all common objects present in the captured images. The recognition results and user query are then integrated into a prompt, tailored specifically for pBLV, using prompt engineering. By combining the prompt and input image, a vision-language foundation model (i.e., InstructBLIP) generates detailed and comprehensive descriptions of the environment and identifies potential risks in the environment by analyzing environmental objects and scenic landmarks, relevant to the prompt. We evaluate our approach through experiments conducted on both indoor and outdoor datasets. Our results demonstrate that our method can recognize objects accurately and provide insightful descriptions and analysis of the environment for pBLV. Full article
(This article belongs to the Special Issue Image and Video Processing for Blind and Visually Impaired)
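The prompt-assembly step, folding object tags from an image-tagging model and the user's question into a single instruction for the vision-language model, can be sketched as below. The tag list and wording are invented, and the calls to RAM and InstructBLIP themselves are omitted; this only illustrates the prompt-engineering stage.

```python
# Building a pBLV-oriented prompt from recognised object tags and a user query.
def build_pblv_prompt(detected_tags, user_query):
    tag_list = ", ".join(detected_tags)
    return (
        "You are assisting a person with blindness or low vision.\n"
        f"Objects recognised in the current camera view: {tag_list}.\n"
        f"User question: {user_query}\n"
        "Describe the surroundings in detail, answer the question, and warn "
        "explicitly about any potential tripping hazards or other risks."
    )

prompt = build_pblv_prompt(
    ["sidewalk", "bicycle", "trash can", "curb", "crosswalk"],
    "Can I keep walking straight ahead safely?")
print(prompt)
```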
