Search Results (216)

Search Parameters:
Keywords = visual information elimination

10 pages, 408 KiB  
Article
Comparative Analysis of Descemet Membrane Endothelial Keratoplasty (DMEK) Versus Descemetorhexis Without Keratoplasty (DSO) in Patients with Fuchs Endothelial Corneal Dystrophy
by Vanesa Díaz-Mesa, Álvaro Sánchez-Ventosa, Timoteo González-Cruces, Alberto Membrillo, Marta Villalba-González, Alberto Villarrubia and Antonio Cano-Ortiz
J. Clin. Med. 2025, 14(14), 4857; https://doi.org/10.3390/jcm14144857 - 9 Jul 2025
Abstract
Background/Objectives: This retrospective observational study evaluates the efficacy of Descemetorhexis without Keratoplasty (DSO) compared to Descemet Membrane Endothelial Keratoplasty (DMEK) in the management of Fuchs Endothelial Corneal Dystrophy (FECD). The two surgical techniques were compared in terms of corneal anatomical changes, visual results, and complication rates. Methods: We conducted a retrospective, descriptive, observational study including 31 eyes from 26 patients who underwent either DSO (n = 16) or DMEK (n = 15) at the Department of Ophthalmology, Hospital Arruzafa. Patients were included if they had complete follow-up data at baseline, 6 months, and 1 year post-intervention. Their clinical information was collected from medical records and complementary tests, including the Snellen visual acuity test, Pentacam corneal tomography, and specular microscopy. Results: The average time to achieve best corrected distance visual acuity (CDVA) was significantly longer for DSO (7.44 ± 2.3 months) than for DMEK (5.73 ± 1.9 months, p = 0.004). Complication rates were higher in the DMEK group (26.7%), whereas the DSO group had no complications (p = 0.043). Corneal endothelial cell migration was confirmed in patients who underwent DSO, with a mean cell density of 817.17 ± 91.7 cells/mm² after one year. Conclusions: DSO effectively treated the selected patients with FECD who presented central guttata and corneal edema, achieving visual outcomes equivalent to those of DMEK while reducing complication rates. This technique eliminates the need for donor tissue and immunosuppressive medications, making it a viable alternative for specific cases.
(This article belongs to the Section Ophthalmology)
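
As an illustration of the group comparisons reported above, the sketch below runs Welch's t-test on time-to-CDVA and Fisher's exact test on complication counts with scipy. The arrays are synthetic stand-ins generated from the reported means and SDs, not the study's data, and the paper's actual statistical procedures may differ.

```python
# Hedged illustration of the reported comparisons; synthetic data, not the study's.
import numpy as np
from scipy import stats

rng = np.random.default_rng(0)
months_dso = rng.normal(7.44, 2.3, 16)   # DSO group, n = 16 (simulated)
months_dmek = rng.normal(5.73, 1.9, 15)  # DMEK group, n = 15 (simulated)

t, p_time = stats.ttest_ind(months_dso, months_dmek, equal_var=False)  # Welch's t-test
print(f"time to CDVA: t = {t:.2f}, p = {p_time:.3f}")

# Complications: 4/15 in DMEK (26.7%) vs. 0/16 in DSO.
table = [[4, 11], [0, 16]]
_, p_comp = stats.fisher_exact(table)
print(f"complication rates: p = {p_comp:.3f}")
```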

20 pages, 1993 KiB  
Article
AuxDepthNet: Real-Time Monocular 3D Object Detection with Depth-Sensitive Features
by Ruochen Zhang, Hyeung-Sik Choi, Dongwook Jung, Phan Huy Nam Anh, Sang-Ki Jeong and Zihao Zhu
Appl. Sci. 2025, 15(13), 7538; https://doi.org/10.3390/app15137538 - 4 Jul 2025
Abstract
Monocular 3D object detection is a challenging task in autonomous systems due to the lack of explicit depth information in single-view images. Existing methods often depend on external depth estimators or expensive sensors, which increase computational complexity and complicate integration into existing systems. To overcome these limitations, we propose AuxDepthNet, an efficient framework for real-time monocular 3D object detection that eliminates the reliance on external depth maps or pre-trained depth models. AuxDepthNet introduces two key components: the Auxiliary Depth Feature (ADF) module, which implicitly learns depth-sensitive features to improve spatial reasoning and computational efficiency, and the Depth Position Mapping (DPM) module, which embeds depth positional information directly into the detection process to enable accurate object localization and 3D bounding box regression. Leveraging the DepthFusion Transformer (DFT) architecture, AuxDepthNet globally integrates visual and depth-sensitive features through depth-guided interactions, ensuring robust and efficient detection. Extensive experiments on the KITTI dataset show that AuxDepthNet achieves state-of-the-art performance, with AP_3D scores of 24.72% (Easy), 18.63% (Moderate), and 15.31% (Hard), and AP_BEV scores of 34.11% (Easy), 25.18% (Moderate), and 21.90% (Hard) at an IoU threshold of 0.7.
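
A minimal PyTorch sketch of the idea behind depth-sensitive features follows: per-pixel soft assignment to depth bins, with learned bin embeddings injected back into the image features. Module and parameter names are illustrative assumptions, not AuxDepthNet's actual API.

```python
# Illustrative depth-position-mapping-style module; names and sizes are assumptions.
import torch
import torch.nn as nn

class DepthPositionMapping(nn.Module):
    def __init__(self, channels: int, num_depth_bins: int = 64):
        super().__init__()
        self.depth_logits = nn.Conv2d(channels, num_depth_bins, 1)  # per-pixel bin scores
        self.depth_embed = nn.Embedding(num_depth_bins, channels)   # one embedding per bin

    def forward(self, feat: torch.Tensor) -> torch.Tensor:
        # Soft-assign each pixel to depth bins, then mix bin embeddings accordingly.
        prob = self.depth_logits(feat).softmax(dim=1)                # (B, D, H, W)
        emb = self.depth_embed.weight                                # (D, C)
        depth_feat = torch.einsum("bdhw,dc->bchw", prob, emb)
        return feat + depth_feat                                     # depth-aware features

feat = torch.randn(2, 256, 48, 160)
print(DepthPositionMapping(256)(feat).shape)  # torch.Size([2, 256, 48, 160])
```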

18 pages, 10338 KiB  
Article
Visual Geolocalization for Aerial Vehicles via Fusion of Satellite Remote Sensing Imagery and Its Relative Depth Information
by Maoan Zhou, Dongfang Yang, Jieyu Liu, Weibo Xu, Xiong Qiu and Yongfei Li
Remote Sens. 2025, 17(13), 2291; https://doi.org/10.3390/rs17132291 - 4 Jul 2025
Abstract
Visual geolocalization for aerial vehicles based on an analysis of Earth observation imagery is an effective method in GNSS-denied environments. However, existing methods for geographic location estimation have limitations: one relies on high-precision geodetic elevation data, which is costly, and the other assumes a flat ground surface, ignoring elevation differences. This paper presents a novel aerial vehicle geolocalization method that integrates 2D information and relative depth information, both derived from Earth observation images. Firstly, the aerial and reference satellite remote sensing images are fed into a feature-matching network to extract pixel-level feature-matching pairs. Then, a depth estimation network is used to estimate the relative depth of the satellite remote sensing image, thereby obtaining the relative depth of the ground area within the field of view of the aerial image. Finally, high-confidence matching pairs with similar depth and uniform distribution are selected to estimate the geographic location of the aerial vehicle. Experimental results demonstrate that the proposed method outperforms existing ones in terms of geolocalization accuracy and stability. It eliminates reliance on elevation data or planar assumptions, thus providing a more adaptable and robust solution for aerial vehicle geolocalization in GNSS-denied environments.
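
The final selection step (keeping high-confidence matches with similar depth) can be pictured with the NumPy sketch below; the tolerance, array layout, and median-based test are assumptions for illustration, not the paper's exact criterion.

```python
# Keep matched pairs whose satellite-side relative depths agree before estimating location.
import numpy as np

def select_depth_consistent(matches_uv_sat, rel_depth, tol=0.05):
    """matches_uv_sat: (N, 2) satellite pixel coords (x, y) of matched pairs;
    rel_depth: (H, W) relative depth map of the satellite image."""
    d = rel_depth[matches_uv_sat[:, 1], matches_uv_sat[:, 0]]
    keep = np.abs(d - np.median(d)) < tol   # drop pairs off the dominant surface
    return keep

rel_depth = np.random.rand(512, 512)
uv = np.random.randint(0, 512, size=(200, 2))
print(select_depth_consistent(uv, rel_depth).sum(), "matches kept")
```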

19 pages, 1821 KiB  
Article
Underwater Image Enhancement Using a Diffusion Model with Adversarial Learning
by Xueyan Ding, Xiyu Chen, Yixin Sui, Yafei Wang and Jianxin Zhang
J. Imaging 2025, 11(7), 212; https://doi.org/10.3390/jimaging11070212 - 27 Jun 2025
Abstract
Due to the distinctive attributes of underwater environments, underwater images frequently encounter challenges such as low contrast, color distortion, and noise. Current underwater image enhancement techniques often suffer from limited generalization, preventing them from effectively adapting to a variety of underwater images taken in different underwater environments. To address these issues, we introduce a diffusion model-based underwater image enhancement method using an adversarial learning strategy, referred to as adversarial learning diffusion underwater image enhancement (ALDiff-UIE). The generator systematically eliminates noise through a diffusion model, progressively aligning the distribution of the degraded underwater image with that of a clear underwater image, while the discriminator helps the generator produce clear, high-quality underwater images by identifying discrepancies and pushing the generator to refine its outputs. Moreover, we propose a multi-scale dynamic-windowed attention mechanism to effectively fuse global and local features, optimizing the process of capturing and integrating information. Qualitative and quantitative experiments on four benchmark datasets—UIEB, U45, SUIM, and LSUI—demonstrate that ALDiff-UIE increases the average PCQI by approximately 12.8% and UIQM by about 15.6%. The results indicate that our method outperforms several mainstream approaches in terms of both visual quality and quantitative metrics, showcasing its effectiveness in enhancing underwater images.
(This article belongs to the Special Issue Underwater Imaging (2nd Edition))
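
A schematic PyTorch training step combining a diffusion denoising objective with an adversarial term is sketched below. The `denoiser` and `disc` stubs, the noise schedule, and the loss weighting are placeholders under generic DDPM assumptions, not ALDiff-UIE's architecture.

```python
# Generic diffusion + adversarial generator loss; all networks are stubs.
import torch
import torch.nn.functional as F

def train_step(denoiser, disc, degraded, clear, t, sched):
    noise = torch.randn_like(clear)
    alpha = sched[t].view(-1, 1, 1, 1)                    # cumulative signal fraction
    noisy = alpha.sqrt() * clear + (1 - alpha).sqrt() * noise
    pred_noise = denoiser(noisy, degraded, t)             # condition on the degraded image
    loss_diff = F.mse_loss(pred_noise, noise)             # standard diffusion objective
    restored = (noisy - (1 - alpha).sqrt() * pred_noise) / alpha.sqrt()
    loss_adv = -disc(restored).mean()                     # generator wants high critic scores
    return loss_diff + 0.1 * loss_adv                     # weighting is an assumption

denoiser = lambda x, cond, t: torch.zeros_like(x)         # stub network
disc = lambda img: img.mean(dim=(1, 2, 3))                # stub critic
sched = torch.linspace(0.99, 0.01, 1000)
deg, clr = torch.randn(4, 3, 64, 64), torch.randn(4, 3, 64, 64)
print(train_step(denoiser, disc, deg, clr, torch.randint(0, 1000, (4,)), sched))
```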

36 pages, 4653 KiB  
Article
A Novel Method for Traffic Parameter Extraction and Analysis Based on Vehicle Trajectory Data for Signal Control Optimization
by Yizhe Wang, Yangdong Liu and Xiaoguang Yang
Appl. Sci. 2025, 15(13), 7155; https://doi.org/10.3390/app15137155 - 25 Jun 2025
Abstract
As urban traffic systems become increasingly complex, traditional traffic data collection methods based on fixed detectors face challenges such as poor data quality and acquisition difficulties, and they cannot capture the complete vehicle path information essential for signal optimization. While vehicle trajectory data can provide rich spatiotemporal information, its sampling characteristics present new technical challenges for traffic parameter extraction. This study addresses the key issue of extracting traffic parameters suitable for signal timing optimization from sampled trajectory data by proposing a comprehensive method for traffic parameter extraction and analysis based on vehicle trajectory data. The method comprises five modules: data preprocessing, basic feature processing, exploratory data analysis, key feature extraction, and data visualization. An innovative algorithm is proposed to identify which intersections vehicles pass through, effectively solving the challenge of mapping GPS points to road network nodes. A dual calculation method based on instantaneous speed and time difference improves parameter estimation accuracy through multi-source data fusion, and a highly automated processing toolchain based on Python and MATLAB is developed. The method advances the state of the art through a novel polygon-based trajectory mapping algorithm and a systematic multi-source parameter extraction framework specifically designed for signal control optimization. Validation on actual trajectory data containing 2.48 million records eliminated 30.80% of the records as redundant and accurately identified complete paths for 7252 vehicles. The extracted multi-dimensional parameters, including link flow, average speed, travel time, and OD matrices, accurately reflect network operational status, identifying congestion hotspots, tidal traffic characteristics, and unstable road segments. The research outcomes provide a feasible technical solution for areas lacking traditional detection equipment. The extracted parameters can directly support signal optimization applications such as traffic signal coordination, timing optimization, and congestion management, providing crucial support for implementing data-driven intelligent traffic control. This research presents a theoretical framework validated with real-world data, providing a foundation for future implementation in operational signal control systems.
(This article belongs to the Special Issue Research and Estimation of Traffic Flow Characteristics)
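
The intersection-identification step can be illustrated with a point-in-polygon test, as in the shapely sketch below; the polygon names, shapes, and coordinates are made up, and the paper's polygon-based algorithm is more elaborate than this.

```python
# Which intersection polygons does a GPS trajectory pass through? Toy coordinates.
from shapely.geometry import Point, Polygon

intersections = {
    "J1": Polygon([(0, 0), (0, 50), (50, 50), (50, 0)]),
    "J2": Polygon([(200, 0), (200, 50), (250, 50), (250, 0)]),
}

def intersections_passed(trajectory):
    """trajectory: iterable of (x, y) GPS fixes in a projected CRS."""
    passed, last = [], None
    for x, y in trajectory:
        for name, poly in intersections.items():
            if poly.contains(Point(x, y)) and name != last:
                passed.append(name)   # record each intersection once per visit
                last = name
    return passed

print(intersections_passed([(10, 10), (20, 30), (220, 25)]))  # ['J1', 'J2']
```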

24 pages, 27167 KiB  
Article
ICT-Net: A Framework for Multi-Domain Cross-View Geo-Localization with Multi-Source Remote Sensing Fusion
by Min Wu, Sirui Xu, Ziwei Wang, Jin Dong, Gong Cheng, Xinlong Yu and Yang Liu
Remote Sens. 2025, 17(12), 1988; https://doi.org/10.3390/rs17121988 - 9 Jun 2025
Abstract
Traditional single neural network-based geo-localization methods for cross-view imagery primarily rely on polar coordinate transformations while suffering from limited global correlation modeling capabilities. To address these fundamental challenges of weak feature correlation and poor scene adaptation, we present a novel framework termed ICT-Net (Integrated CNN-Transformer Network) that synergistically combines convolutional neural networks with Transformer architectures. Our approach harnesses the complementary strengths of CNNs in capturing local geometric details and Transformers in establishing long-range dependencies, enabling comprehensive joint perception of both local and global visual patterns. Furthermore, capitalizing on the Transformer’s flexible input processing mechanism, we develop an attention-guided non-uniform cropping strategy that dynamically eliminates redundant image patches with minimal impact on localization accuracy, thereby achieving enhanced computational efficiency. To facilitate practical deployment, we propose a deep embedding clustering algorithm optimized for rapid parsing of geo-localization information. Extensive experiments demonstrate that ICT-Net establishes new state-of-the-art localization accuracy on the CVUSA benchmark, achieving a top-1 recall rate improvement of 8.6% over previous methods. Additional validation on a challenging real-world dataset collected at Beihang University (BUAA) further confirms the framework’s effectiveness and practical applicability in complex urban environments, particularly showing 23% higher robustness to vegetation variations.
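
The attention-guided non-uniform cropping strategy can be approximated as ranking patch tokens by the attention they receive from the [CLS] token and keeping the top fraction, as sketched below; the token layout and keep ratio are assumptions, not ICT-Net's exact procedure.

```python
# Drop low-attention patch tokens while preserving the CLS token and spatial order.
import torch

def prune_tokens(tokens: torch.Tensor, attn: torch.Tensor, keep: float = 0.6):
    """tokens: (B, 1+N, C) with CLS first; attn: (B, heads, 1+N, 1+N)."""
    cls_attn = attn[:, :, 0, 1:].mean(dim=1)           # (B, N) attention CLS -> patches
    k = int(cls_attn.shape[1] * keep)
    idx = cls_attn.topk(k, dim=1).indices + 1          # +1 to skip the CLS position
    idx, _ = idx.sort(dim=1)                           # keep spatial order
    batch = torch.arange(tokens.shape[0]).unsqueeze(1)
    kept = tokens[batch, idx]                          # (B, k, C)
    return torch.cat([tokens[:, :1], kept], dim=1)     # CLS + surviving patches

tokens = torch.randn(2, 1 + 196, 384)
attn = torch.rand(2, 6, 197, 197)
print(prune_tokens(tokens, attn).shape)  # torch.Size([2, 118, 384])
```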

29 pages, 2494 KiB  
Article
A Novel Framework for Natural Language Interaction with 4D BIM
by Larin Jaff, Sahej Garg and Gursans Guven
Buildings 2025, 15(11), 1840; https://doi.org/10.3390/buildings15111840 - 27 May 2025
Abstract
Natural language interfaces can transform the construction industry by enhancing accessibility and reducing administrative workload in the day-to-day operations of project teams. This paper introduces the Voice-Integrated Scheduling Assistant for 4D BIM (VISA4D), a tool that integrates speech recognition and Natural Language Processing (NLP) capabilities with Building Information Modeling (BIM) to streamline construction schedule updating and maintenance. It accepts voice and text inputs for schedule updates, facilitates real-time integration with Autodesk Navisworks, and eliminates the need for direct access to, or advanced knowledge of, BIM tools. It also provides visual progress tracking through colour-coded elements within the 4D BIM model for communicating task status updates within project teams. To demonstrate its capability to enhance schedule updating and maintenance efficiency, the VISA4D tool was implemented in an office building project in Canada and user testing was performed. The tool correctly classified 71 of 80 tested construction-specific commands (an overall accuracy of 89%), while user surveys indicated high usability, with 92% of participants finding VISA4D easy to use and reporting consistent command recognition accuracy. This study advances existing work on AI-enhanced construction management tools by tackling the challenges associated with their practical implementation in field operations.
(This article belongs to the Special Issue Data Analytics Applications for Architecture and Construction)
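
A toy version of the command-classification step might map transcribed utterances to schedule-update intents with keyword rules, as in the hypothetical sketch below; VISA4D's actual NLP pipeline, intents, and command set are not described in this listing.

```python
# Hypothetical keyword-rule intent parser for construction schedule commands.
import re

INTENTS = {
    "mark_complete": r"\b(complete[d]?|finish(ed)?|done)\b",
    "delay_task":    r"\b(delay(ed)?|postpone[d]?|push(ed)? back)\b",
    "start_task":    r"\b(start(ed)?|begin|commence[d]?)\b",
}

def parse_command(text: str):
    lowered = text.lower()
    intent = next((name for name, pat in INTENTS.items()
                   if re.search(pat, lowered)), "unknown")
    task = re.search(r"(?:task|activity)\s+([\w-]+)", lowered)  # crude task-ID grab
    return intent, task.group(1) if task else None

print(parse_command("Mark task L2-slab-pour as completed"))
# ('mark_complete', 'l2-slab-pour')
```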

20 pages, 3616 KiB  
Article
An RGB-D Camera-Based Wearable Device for Visually Impaired People: Enhanced Navigation with Reduced Social Stigma
by Zhiwen Li, Fred Han and Kangjie Zheng
Electronics 2025, 14(11), 2168; https://doi.org/10.3390/electronics14112168 - 27 May 2025
Abstract
This paper presents an intelligent navigation wearable device for visually impaired individuals. The system aims to improve their independent travel capabilities and reduce the negative emotional impacts associated with visible disability indicators in travel tools. It employs an RGB-D camera and an inertial measurement unit (IMU) sensor to facilitate real-time obstacle detection and recognition via advanced point cloud processing and YOLO-based target recognition techniques. An integrated intelligent interaction module identifies the core obstacle from the detected obstacles and translates this information into multidimensional auxiliary guidance. Users receive haptic feedback to navigate obstacles, indicating directional turns and distances, while auditory prompts convey the identity and distance of obstacles, enhancing spatial awareness. The intuitive vibrational guidance significantly enhances safety during obstacle avoidance, and the voice instructions promote a better understanding of the surrounding environment. The device adopts an arm-mounted design, departing from the traditional cane structure that reinforces disability labeling and social stigma. This lightweight mechanical design prioritizes user comfort and mobility, making it more user-friendly than traditional stick-type aids. Experimental results demonstrate that this system outperforms traditional white canes and ultrasonic devices in reducing collision rates, particularly for mid-air obstacles, thereby significantly improving safety in dynamic environments. Furthermore, the system’s ability to vocalize obstacle identities and distances in advance enhances spatial perception and interaction with the environment. By eliminating the cane structure, this innovative wearable design effectively minimizes social stigma, empowering visually impaired individuals to travel independently with increased confidence, ultimately contributing to an improved quality of life.
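
One plausible reading of the interaction module's core-obstacle step is sketched below: pick the closest detection within a forward cone and turn it into a guidance cue. The detection format, cone width, and distance threshold are assumptions, not the device's documented behavior.

```python
# Select the core obstacle and produce a simple avoidance cue; thresholds assumed.
def core_obstacle(detections):
    """detections: list of (label, distance_m, bearing_deg); bearing 0 = straight ahead."""
    ahead = [d for d in detections if abs(d[2]) < 30]      # within a 60-degree cone
    if not ahead:
        return None
    label, dist, bearing = min(ahead, key=lambda d: d[1])  # closest detection wins
    if dist >= 3.0:
        return None                                        # nothing close enough to warn about
    avoid = "left" if bearing > 0 else "right"             # steer away from the obstacle
    return f"{label} {dist:.1f} m ahead, veer {avoid}"

dets = [("chair", 1.8, 10), ("person", 4.0, -5), ("sign", 1.2, 55)]
print(core_obstacle(dets))  # chair 1.8 m ahead, veer left
```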

27 pages, 9977 KiB  
Article
Mergeable Probabilistic Voxel Mapping for LiDAR–Inertial–Visual Odometry
by Balong Wang, Nassim Bessaad, Huiying Xu, Xinzhong Zhu and Hongbo Li
Electronics 2025, 14(11), 2142; https://doi.org/10.3390/electronics14112142 - 24 May 2025
Cited by 1
Abstract
To address the limitations of existing LiDAR–visual fusion methods in adequately accounting for map uncertainties induced by LiDAR measurement noise, this paper introduces a LiDAR–inertial–visual odometry framework leveraging mergeable probabilistic voxel mapping. The method innovatively employs probabilistic voxel models to characterize uncertainties in environmental geometric plane features and optimizes computational efficiency through a voxel merging strategy. Additionally, it integrates color information from cameras to further enhance localization accuracy. Specifically, in the LiDAR–inertial odometry (LIO) subsystem, a probabilistic voxel plane model is constructed for LiDAR point clouds to explicitly represent measurement noise uncertainty, thereby improving the accuracy and robustness of point cloud registration. A voxel merging strategy based on the union-find algorithm is introduced to merge coplanar voxel planes, reducing computational load. In the visual–inertial odometry (VIO) subsystem, image tracking points are generated through a global map projection, and outlier points are eliminated using a random sample consensus algorithm based on a dynamic Bayesian network. Finally, state estimation accuracy is enhanced by jointly optimizing frame-to-frame reprojection errors and frame-to-map RGB color errors. Experimental results demonstrate that the proposed method achieves root mean square errors (RMSEs) of absolute trajectory error of 0.478 m and 0.185 m on the M2DGR and NTU-VIRAL datasets, respectively, while attaining real-time performance with an average processing time of 39.19 ms per frame on the NTU-VIRAL dataset. Compared to state-of-the-art approaches, our method exhibits significant improvements in both accuracy and computational efficiency.
(This article belongs to the Special Issue Advancements in Robotics: Perception, Manipulation, and Interaction)
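
The union-find-based merging of coplanar voxel planes can be sketched as below; the coplanarity test and thresholds are simplified assumptions, while the path-halving find/union logic is the standard algorithm.

```python
# Merge voxels whose fitted planes agree in normal and offset, via union-find.
import numpy as np

parent = {}

def find(i):
    parent.setdefault(i, i)
    while parent[i] != i:
        parent[i] = parent[parent[i]]  # path halving keeps trees shallow
        i = parent[i]
    return i

def union(i, j):
    parent[find(i)] = find(j)

def coplanar(p, q, ang_tol=0.05, d_tol=0.1):
    (n1, d1), (n2, d2) = p, q
    return np.dot(n1, n2) > 1 - ang_tol and abs(d1 - d2) < d_tol

# planes[i] = (unit normal, offset) fitted in voxel i; toy values.
planes = {0: (np.array([0, 0, 1.0]), 2.0),
          1: (np.array([0, 0.01, 1.0]) / np.linalg.norm([0, 0.01, 1.0]), 2.05),
          2: (np.array([1.0, 0, 0]), 0.5)}
for i in planes:
    for j in planes:
        if i < j and coplanar(planes[i], planes[j]):
            union(i, j)
print({i: find(i) for i in planes})  # voxels 0 and 1 merge; 2 stays alone
```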

12 pages, 1391 KiB  
Article
Speech Intelligibility in Virtual Avatars: Comparison Between Audio and Audio–Visual-Driven Facial Animation
by Federico Cioffi, Massimiliano Masullo, Aniello Pascale and Luigi Maffei
Acoustics 2025, 7(2), 30; https://doi.org/10.3390/acoustics7020030 - 23 May 2025
Abstract
Speech intelligibility (SI) is critical for effective communication across various settings, although it is often compromised by adverse acoustic conditions. In noisy environments, visual cues such as lip movements and facial expressions, when congruent with auditory information, can significantly enhance speech perception and reduce cognitive effort. With the ever-growing diffusion of virtual environments, communicating through virtual avatars is becoming increasingly prevalent, requiring a comprehensive understanding of these dynamics to ensure effective interactions. The present study used Unreal Engine’s MetaHuman technology to compare four methodologies for creating facial animation: MetaHuman Animator (MHA), MetaHuman LiveLink (MHLL), Audio-Driven MetaHuman (ADMH), and Synthetized Audio-Driven MetaHuman (SADMH). Thirty-six word pairs from the Diagnostic Rhyme Test (DRT) were used as input stimuli to create the animations and to compare them in terms of intelligibility. Moreover, to simulate a challenging background noise, the animations were mixed with babble noise at a signal-to-noise ratio of −13 dB (A). Participants assessed a total of 144 facial animations. Results showed the ADMH condition to be the most intelligible among the methodologies used, probably due to enhanced clarity and consistency in the generated facial animations, which eliminate distractions such as micro-expressions and natural variations in human articulation.
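
Mixing stimuli with babble at a fixed signal-to-noise ratio reduces to a gain computation, sketched below with plain RMS levels; the study's A-weighting is omitted here and the signals are placeholders.

```python
# Scale babble noise so the speech-to-noise ratio hits the target, then mix.
import numpy as np

def mix_at_snr(speech, noise, snr_db=-13.0):
    noise = noise[: len(speech)]
    rms_s = np.sqrt(np.mean(speech ** 2))
    rms_n = np.sqrt(np.mean(noise ** 2))
    gain = rms_s / (rms_n * 10 ** (snr_db / 20))  # noise gain for the target SNR
    return speech + gain * noise

sr = 16000
speech = np.sin(2 * np.pi * 220 * np.arange(sr) / sr)  # 1 s placeholder "speech"
noise = np.random.randn(sr)                            # placeholder babble
print(mix_at_snr(speech, noise).shape)
```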

21 pages, 7233 KiB  
Article
Advancing Traditional Dunhuang Regional Pattern Design with Diffusion Adapter Networks and Cross-Entropy
by Yihuan Tian, Tao Yu, Zuling Cheng and Sunjung Lee
Entropy 2025, 27(5), 546; https://doi.org/10.3390/e27050546 - 21 May 2025
Abstract
To promote the inheritance of traditional culture, a variety of emerging methods rooted in machine learning and deep learning have been introduced. Dunhuang patterns, an important part of traditional Chinese culture, are difficult to collect in large numbers due to their limited availability. Moreover, existing text-to-image methods are computationally intensive and struggle to capture fine details and complex semantic relationships in text and images. To address these challenges, this paper proposes the Diffusion Adapter Network (DANet). It employs a lightweight adapter module to extract visual structural information, enabling the diffusion model to generate Dunhuang patterns with high accuracy while eliminating the need for expensive fine-tuning of the original model. The attention adapter incorporates a multihead attention module (MHAM) to enhance image modality cues, allowing the model to focus more effectively on key information. A multiscale attention module (MSAM) is employed to capture features at different scales, thereby providing more precise generative guidance. In addition, an adaptive control mechanism (ACM) dynamically adjusts the guidance coefficients across feature layers to further enhance generation quality, and a cross-entropy loss function strengthens the model’s capability in semantic understanding and classification of Dunhuang patterns. DANet achieves state-of-the-art (SOTA) performance on the proposed Diversified Dunhuang Patterns Dataset (DDHP). Specifically, it attains a perceptual similarity score (LPIPS) of 0.498, an image–text matching score (CLIP score) of 0.533, and a feature similarity score (CLIP-I) of 0.772.
(This article belongs to the Special Issue Entropy in Machine Learning Applications, 2nd Edition)
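
A minimal sketch of an attention adapter, gating cross-attention from condition tokens into frozen diffusion features, is given below; the dimensions, zero-initialized gate, and names are illustrative rather than DANet's actual modules.

```python
# Trainable adapter injecting condition cues into frozen features via attention.
import torch
import torch.nn as nn

class AttentionAdapter(nn.Module):
    def __init__(self, dim=320, heads=8):
        super().__init__()
        self.attn = nn.MultiheadAttention(dim, heads, batch_first=True)
        self.gate = nn.Parameter(torch.zeros(1))   # zero-init: starts as identity

    def forward(self, feat, cond):
        # feat: (B, N, C) frozen U-Net tokens; cond: (B, M, C) condition tokens
        out, _ = self.attn(feat, cond, cond)       # queries from feat, keys/values from cond
        return feat + self.gate * out              # residual, gated injection

feat = torch.randn(2, 64, 320)
cond = torch.randn(2, 16, 320)
print(AttentionAdapter()(feat, cond).shape)  # torch.Size([2, 64, 320])
```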

17 pages, 10094 KiB  
Article
EMS-SLAM: Dynamic RGB-D SLAM with Semantic-Geometric Constraints for GNSS-Denied Environments
by Jinlong Fan, Yipeng Ning, Jian Wang, Xiang Jia, Dashuai Chai, Xiqi Wang and Ying Xu
Remote Sens. 2025, 17(10), 1691; https://doi.org/10.3390/rs17101691 - 12 May 2025
Abstract
Global navigation satellite systems (GNSSs) exhibit significant performance limitations in signal-deprived environments such as indoor and underground spaces. Although visual SLAM has emerged as a viable solution for ego-motion estimation in GNSS-denied areas, conventional approaches remain constrained by static-environment assumptions, resulting in a substantial degradation in accuracy when handling dynamic scenarios. The proposed EMS-SLAM framework combines geometric constraints with semantic information to provide a real-time solution to the challenges of robustness and accuracy in dynamic environments. To improve the accuracy of the initial pose, EMS-SLAM employs a feature-matching algorithm based on graph-cut RANSAC. In addition, a degeneracy-resistant geometric constraint method is proposed, which effectively addresses the degeneracy issues of purely epipolar approaches. Finally, EMS-SLAM combines semantic information with geometric constraints to maintain high accuracy while quickly eliminating dynamic feature points. Experiments were conducted on public datasets and on our own collected datasets. The results demonstrate that our method outperforms current SLAM algorithms in highly dynamic environments.
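
A plain epipolar-geometry version of the dynamic-point test is sketched below, with OpenCV's FM_RANSAC standing in for the paper's graph-cut RANSAC and degeneracy-resistant constraint; the points and pixel threshold are synthetic assumptions.

```python
# Flag matches far from their epipolar lines as dynamic candidates.
import cv2
import numpy as np

pts1 = (np.random.rand(100, 2) * 640).astype(np.float32)
pts2 = pts1 + np.random.randn(100, 2).astype(np.float32)  # mostly static motion

F, inlier_mask = cv2.findFundamentalMat(pts1, pts2, cv2.FM_RANSAC, 1.0, 0.99)
lines = cv2.computeCorrespondEpilines(pts1.reshape(-1, 1, 2), 1, F).reshape(-1, 3)
# Point-to-epipolar-line distance in the second image.
d = np.abs(lines[:, 0] * pts2[:, 0] + lines[:, 1] * pts2[:, 1] + lines[:, 2])
d /= np.sqrt(lines[:, 0] ** 2 + lines[:, 1] ** 2)
dynamic = d > 2.0   # pixels; threshold is an assumption
print(int(dynamic.sum()), "points flagged dynamic")
```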

16 pages, 3685 KiB  
Article
Enhanced Simultaneous Localization and Mapping Algorithm Based on Deep Learning for Highly Dynamic Environment
by Yin Lu, Haibo Wang, Jin Sun and J. Andrew Zhang
Sensors 2025, 25(8), 2539; https://doi.org/10.3390/s25082539 - 17 Apr 2025
Cited by 1
Abstract
Visual simultaneous localization and mapping (SLAM) is a critical technology for autonomous navigation in dynamic environments. However, traditional SLAM algorithms often struggle to maintain accuracy in highly dynamic environments, where elements undergo significant, rapid, and unpredictable changes, leading to asymmetric information acquisition. To improve SLAM accuracy in such environments, a dynamic SLAM algorithm based on deep learning is proposed. First, YOLOv10n is used to improve the front end of the system, adding semantic information to each image frame. Then, ORB-SLAM2 extracts feature points in each region of each frame and retrieves the corresponding semantic information from YOLOv10n. Finally, in the map construction thread, the feature points on dynamic objects detected by YOLOv10n are eliminated and a static map is constructed. The experimental results show that the accuracy of the proposed algorithm is improved by more than 96% compared with the state-of-the-art ORB-SLAM2 in a highly dynamic environment. Compared with other dynamic SLAM algorithms, the proposed algorithm improves both accuracy and runtime.
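
The elimination step can be pictured as discarding ORB keypoints that fall inside detector boxes labeled as dynamic, as in the sketch below; the box format and dynamic-class list are assumptions, and the paper uses YOLOv10n rather than the stub detections shown here.

```python
# Keep only keypoints outside boxes of dynamic classes.
import cv2
import numpy as np

DYNAMIC = {"person", "car", "bicycle"}

def static_keypoints(keypoints, detections):
    """detections: list of (label, x1, y1, x2, y2) from the detector."""
    boxes = [(x1, y1, x2, y2) for label, x1, y1, x2, y2 in detections
             if label in DYNAMIC]
    keep = []
    for kp in keypoints:
        x, y = kp.pt
        if not any(x1 <= x <= x2 and y1 <= y <= y2 for x1, y1, x2, y2 in boxes):
            keep.append(kp)
    return keep

img = np.random.randint(0, 255, (480, 640), np.uint8)
kps = cv2.ORB_create(500).detect(img, None)
dets = [("person", 100, 50, 300, 400)]          # stub detection
print(len(kps), "->", len(static_keypoints(kps, dets)), "static keypoints")
```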

28 pages, 26590 KiB  
Article
Geometry-Constrained Learning-Based Visual Servoing with Projective Homography-Derived Error Vector
by Yueyuan Zhang, Arpan Ghosh, Yechan An, Kyeongjin Joo, SangMin Kim and Taeyong Kuc
Sensors 2025, 25(8), 2514; https://doi.org/10.3390/s25082514 - 16 Apr 2025
Abstract
We propose a novel geometry-constrained learning-based method for camera-in-hand visual servoing systems that eliminates the need for camera intrinsic parameters, depth information, and the robot’s kinematic model. Our method uses a cerebellar model articulation controller (CMAC) to execute online Jacobian estimation within the control framework. Specifically, we introduce a fixed-dimension, uniform-magnitude error function based on the projective homography matrix. The fixed-dimension error function ensures a constant Jacobian size regardless of the number of feature points, thereby reducing computational complexity. By not relying on individual feature points, the approach maintains robustness even when some features are occluded. The uniform magnitude of the error vector elements simplifies neural network input normalization, thereby enhancing online training efficiency. Furthermore, we incorporate geometric constraints between feature points (such as collinearity preservation) into the network update process, ensuring that model predictions conform to the fundamental principles of projective geometry and eliminating physically impossible control outputs. Experimental and simulation results demonstrate that our approach achieves superior robustness and faster learning rates compared to other model-free image-based visual servoing methods.
(This article belongs to the Section Intelligent Sensors)
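
The fixed-dimension, homography-derived error can be illustrated as below: the projective homography between the current and desired views approaches the identity (up to scale) at the goal, so the deviation of the normalized H from I yields a constant-size 8-vector regardless of the number of feature points. The exact error parameterization in the paper differs; this is schematic.

```python
# Form a fixed-size error vector from the current-to-desired homography.
import cv2
import numpy as np

cur = np.float32([[100, 100], [400, 90], [410, 380], [90, 390]])  # current features
des = np.float32([[120, 110], [420, 100], [430, 390], [110, 400]])  # desired features

H, _ = cv2.findHomography(cur, des)
H = H / H[2, 2]                      # fix the projective scale ambiguity
e = (H - np.eye(3)).ravel()[:8]      # 8-vector: constant size for any number of features
print(np.round(e, 3))                # zero vector exactly at the goal configuration
```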

17 pages, 8625 KiB  
Article
Research on the Visual SLAM Algorithm for Unmanned Surface Vehicles in Nearshore Dynamic Scenarios
by Yanran Zhang, Lan Zhang, Qiang Yu and Bowen Xing
J. Mar. Sci. Eng. 2025, 13(4), 679; https://doi.org/10.3390/jmse13040679 - 27 Mar 2025
Cited by 1
Abstract
To address the challenges of visual SLAM algorithms in unmanned surface vehicles (USVs) during nearshore navigation or docking, where dynamic feature points degrade localization accuracy and dynamic objects impede static dense mapping, this study proposes an improved visual SLAM algorithm that removes dynamic feature points. Building upon the ORB-SLAM3 framework, the improved algorithm integrates a shore segmentation module and a dynamic region elimination module, while enabling static dense point cloud mapping. The system first implements shore segmentation based on Otsu’s method to generate masks covering water and sky regions, ensuring the SLAM system avoids extracting interfering feature points from these areas. Second, the deep learning network YOLOv8n-seg detects a priori dynamic objects, while a motion consistency check identifies non-a-priori dynamic feature points; together these remove all dynamic feature points. Additionally, the ELAS algorithm computes disparity maps, integrating depth information and dynamic object information to construct a static dense map. Experimental results demonstrate that, compared to the original ORB-SLAM3, the improved algorithm achieves superior localization accuracy in dynamic nearshore environments, significantly reduces the impact of dynamic objects on pose estimation, and successfully constructs ghosting-free static dense point cloud maps.
(This article belongs to the Section Ocean Engineering)
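
The shore-segmentation idea rests on Otsu's global threshold, as in the OpenCV sketch below; the synthetic two-band frame stands in for a real nearshore image, and the real module's mask generation is more involved.

```python
# Otsu's threshold separates the bright water/sky band from the darker shore.
import cv2
import numpy as np

frame = np.vstack([np.full((240, 640), 200, np.uint8),   # bright sky/water band
                   np.full((240, 640), 60, np.uint8)])   # darker shore band
frame = cv2.GaussianBlur(frame, (5, 5), 0)

t, mask = cv2.threshold(frame, 0, 255, cv2.THRESH_BINARY + cv2.THRESH_OTSU)
print(f"Otsu threshold = {t:.0f}; masked pixels = {int((mask > 0).sum())}")
# Feature extraction would then ignore pixels where mask > 0.
```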
