
Search Results (1,952)

Search Parameters:
Keywords = feature map visualization

25 pages, 1597 KB  
Article
CD-Mosaic: A Context-Aware and Domain-Consistent Data Augmentation Method for PCB Micro-Defect Detection
by Sifan Lai, Shuangchao Ge, Xiaoting Guo, Jie Li and Kaiqiang Feng
Electronics 2026, 15(4), 767; https://doi.org/10.3390/electronics15040767 (registering DOI) - 11 Feb 2026
Abstract
Detecting minute defects, such as spurs on the surface of a Printed Circuit Board (PCB), is extremely challenging due to their small size (average size < 20 pixels), sparse features, and high dependence on circuit topology context. The original Mosaic data augmentation method faces significant challenges with semantic adaptability when dealing with such tasks. Its unrestricted random cropping mechanism easily disrupts the topological structure of minute defects attached to the circuits, leading to the loss of key features. Moreover, a splicing strategy without domain constraints struggles to simulate real texture interference in industrial settings, making it difficult for the model to adapt to the complex and variable industrial inspection environment. To address these issues, this paper proposes a Context-aware and Domain-consistent Mosaic (CD-Mosaic) augmentation algorithm. This algorithm abandons pure randomness and constructs an adaptive augmentation framework that synergizes feature fidelity, geometric generalization, and texture perturbation. Geometrically, an intelligent sampling and dynamic integrity verification mechanism, driven by “utilization-centrality”, is designed to establish a controlled sample quality distribution. This prioritizes the preservation of the topological semantics of dominant samples to guide feature convergence. Meanwhile, an appropriate number of edge-truncated samples are strategically retained as geometric hard examples to enhance the model’s robustness against local occlusion. For texture, a dual-granularity visual perturbation strategy is proposed. Using a homologous texture library, a hard mask is generated in the background area to simulate foreign object interference, and a local transparency soft mask is applied in the defect area to simulate low signal-to-noise-ratio imaging. This strategy synthesizes visual hard examples while maintaining photometric consistency. Experiments on an industrial-grade PCB dataset containing 2331 images demonstrate that the YOLOv11m model equipped with CD-Mosaic achieves a significant performance improvement. Compared with the native Mosaic baseline, the core metrics mAP@0.5 and Recall reach 0.923 and 86.1%, respectively, net increases of 8.3% and 8.8%; mAP@0.5:0.95 and APsmall, which characterize high-precision localization and small-target detection capabilities, improve to 0.529 (+3.0%) and 0.534 (+3.3%), respectively; and the comprehensive F1-score rises to 0.903 (+6.2%). The experiments show that this method effectively reduces missed detections of minute industrial defects by balancing sample quality and detection difficulty. Moreover, the inference speed of 84.9 FPS fully meets the requirements of industrial real-time detection. Full article
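
The unconstrained baseline being improved here is compact enough to sketch. Below is a minimal NumPy version of classic four-image Mosaic stitching with purely random crops, i.e., the behavior CD-Mosaic replaces with context-aware sampling; the canvas size, split-point range, and crop policy are illustrative assumptions, not the paper's implementation.

```python
import numpy as np

def mosaic4(images, out_size=640, seed=None):
    """Classic Mosaic: paste random crops of four images into the four
    regions around a random center point. Assumes each source image is
    at least out_size x out_size. The unconstrained crop below is the
    step that can sever small-defect topology, which CD-Mosaic constrains."""
    rng = np.random.default_rng(seed)
    canvas = np.zeros((out_size, out_size, 3), dtype=np.uint8)
    cx = int(rng.uniform(0.25, 0.75) * out_size)  # random split point
    cy = int(rng.uniform(0.25, 0.75) * out_size)
    regions = [(0, 0, cx, cy), (cx, 0, out_size, cy),
               (0, cy, cx, out_size), (cx, cy, out_size, out_size)]
    for img, (x1, y1, x2, y2) in zip(images, regions):
        h, w = y2 - y1, x2 - x1
        top = rng.integers(0, img.shape[0] - h + 1)   # purely random crop
        left = rng.integers(0, img.shape[1] - w + 1)
        canvas[y1:y2, x1:x2] = img[top:top + h, left:left + w]
    return canvas
```
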
17 pages, 3941 KB  
Article
Machine Learning-Based Prediction of Heavy Metal Contamination and Ecological Risk in Karst Agricultural Soils
by Zhe Liu, Juan Wu, Jie Li, Guodong Zheng, Jianxun Qin, Wenbo Gu and Jiacai Li
Land 2026, 15(2), 304; https://doi.org/10.3390/land15020304 (registering DOI) - 11 Feb 2026
Abstract
Investigating multiple source apportionment methods and quantitatively characterizing heavy metal contamination in soils are of critical importance for effective pollution control and prevention. This study systematically investigates multiple source apportionment methods for soil heavy metals, with quantitative characterization of contamination features crucial for effective pollution control. Taking Jingxi City in Guangxi, China, as a case study, we conducted a comprehensive analysis of 8816 soil samples using multi-source big data integration. By synergistically applying machine learning algorithms, the potential ecological risk index, and bivariate local Moran’s index, we achieved dual objectives: quantitative inversion of eight heavy metal concentrations and simultaneous ecological risk assessment with pollution source identification. Through comparative model evaluation, the XGBoost algorithm demonstrated optimal predictive performance. Contribution analyses revealed that soil properties (Fe2O3, Al2O3, and phosphorus content), road distribution, and elevation significantly regulate heavy metal accumulation. Spatial risk mapping identified cadmium, mercury, and arsenic contamination hotspots as critical environmental threat zones. The bivariate local Moran’s index model elucidated spatial coupling characteristics between ecological risks and environmental drivers, providing spatially explicit decision-making support for precision environmental management. Our multidimensional analytical framework incorporates spatial visualization of heavy metal distribution, hierarchical ecological risk assessment, and pollution source contribution analysis, ultimately establishing a scientific decision-making system for land safety utilization and pollution risk management. This integrated approach offers methodological references for regional heavy metal pollution control in karst environments. Full article
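
The inversion step is standard supervised regression, so it maps directly onto the xgboost API. The sketch below uses synthetic stand-ins for the drivers the study reports (Fe2O3, Al2O3, phosphorus, road distance, elevation); the data, target element, and hyperparameters are assumptions for illustration, not the study's configuration.

```python
import numpy as np
import xgboost as xgb
from sklearn.model_selection import train_test_split
from sklearn.metrics import r2_score

rng = np.random.default_rng(0)
# Hypothetical covariates: Fe2O3, Al2O3, P, road distance, elevation.
X = rng.random((1000, 5))
y = 0.5 * X[:, 0] + 0.3 * X[:, 3] + 0.1 * rng.random(1000)  # stand-in Cd level

X_tr, X_te, y_tr, y_te = train_test_split(X, y, random_state=0)
model = xgb.XGBRegressor(n_estimators=300, max_depth=6, learning_rate=0.05)
model.fit(X_tr, y_tr)
print("R2:", r2_score(y_te, model.predict(X_te)))
print("importances:", model.feature_importances_)  # basis for contribution analysis
```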

22 pages, 6854 KB  
Article
Vision-Based Detection of Large Coal Fragments in Fully Mechanized Mining Faces Using Adaptive Weighted Attention and Transfer Learning
by Yuan Wang, Jian Lei, Leping Li, Zhengxiong Lu, Lele Xu and Shuanfeng Zhao
Sensors 2026, 26(4), 1167; https://doi.org/10.3390/s26041167 - 11 Feb 2026
Abstract
The unloading port of a scraper conveyor is a critical component in fully mechanized mining operations and is prone to blockages caused by large coal fragments. These blockages primarily result from the limited accuracy and insufficient real-time performance of existing visual perception methods used by crushing robots to identify large coal pieces in complex mining environments. To address this issue, this paper proposes a visual inspection method for coal mine crushing robots based on transfer learning and an adaptive weighted attention mechanism, termed LCDet. First, a lightweight backbone network incorporating grouped convolution is designed to enhance feature representation while significantly reducing model complexity, thereby meeting deployment requirements. Second, an adaptive weighted attention mechanism is introduced to suppress background interference and emphasize regions containing large coal fragments, particularly enhancing blurred edge textures. In addition, a transfer learning-based training strategy is adopted to improve generalization performance and reduce dependence on large-scale training data. The experimental results on the public DsLMF+ dataset demonstrate that LCDet achieves accuracy, recall, mAP50, and mAP50–95 values of 79.3%, 75.1%, 84.5%, and 56.2%, respectively, achieving a favorable balance between detection accuracy and model complexity. On a self-constructed large coal dataset, LCDet attains accuracy, recall, mAP50, and mAP50–95 of 90.4%, 91.3%, 96.5%, and 69.3%, respectively, outperforming the baseline YOLOv8n model. Compared with other detection methods, LCDet exhibits superior performance while maintaining a relatively low parameter count. These results indicate that LCDet enables lightweight and accurate detection of large coal fragments, supporting real-time deployment on crushing robots in fully mechanized mining environments. Full article
(This article belongs to the Special Issue New Trends in Robot Vision Sensors and System)
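
The abstract does not give the exact form of the adaptive weighted attention, so as a point of reference here is a generic squeeze-and-excitation-style channel attention block in PyTorch. It performs the same kind of learned per-channel reweighting used to suppress background and emphasize coal-fragment responses; treat it as a stand-in, not LCDet's module.

```python
import torch
import torch.nn as nn

class ChannelAttention(nn.Module):
    """SE-style channel attention: pool each channel to a scalar, pass it
    through a small bottleneck MLP, and rescale channels by the result."""
    def __init__(self, channels, reduction=16):
        super().__init__()
        self.pool = nn.AdaptiveAvgPool2d(1)
        self.fc = nn.Sequential(
            nn.Linear(channels, channels // reduction),
            nn.ReLU(inplace=True),
            nn.Linear(channels // reduction, channels),
            nn.Sigmoid(),
        )

    def forward(self, x):
        b, c, _, _ = x.shape
        w = self.fc(self.pool(x).view(b, c)).view(b, c, 1, 1)
        return x * w  # adaptive per-channel weights in [0, 1]

feats = torch.randn(2, 64, 32, 32)
print(ChannelAttention(64)(feats).shape)  # torch.Size([2, 64, 32, 32])
```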

11 pages, 1663 KB  
Article
From Plastics to Micro- and Nano-Plastics: Mapping Agricultural Pollution Risk in a Mediterranean Region of Italy
by Ali Hachem, Evelia Schettini, Fabiana Convertino and Giuliano Vox
AgriEngineering 2026, 8(2), 63; https://doi.org/10.3390/agriengineering8020063 - 11 Feb 2026
Abstract
Agricultural plastic waste (APW) is an emerging source of soil pollution and potential micro- and nano-plastic (MNP) contamination in agroecosystems. This study focuses on the Apulia region in southern Italy, a key horticultural and viticultural area with intensive plastic use. Annual APW was estimated for each agricultural feature using a detailed 1:5000 land use map, crop distribution data, and validated plastic waste indices for several plastic application types. The analysis was integrated within a Geographic Information System (GIS) and combined with relative risk indices (RRIs) to compute and map the agricultural plastic pollution risk index (APPRI), a semi-quantitative indicator designed to estimate the potential release of MNPs from agricultural plastics. The APPRI is obtained by multiplying the APW estimates by the RRIs. The results show a clear spatial heterogeneity in plastic waste generation, with the highest APPRI values in vineyards, orchards, olive groves, and greenhouse systems, particularly in the provinces of Foggia and Bari. Cereal-based cropping systems exhibited the lowest risk values. The study proposes an innovative approach, combining land use, APW, and related potential risk into a single mapping tool. This allows for effectively identifying regional hotspots where management and recycling strategies should be prioritized. This GIS-based tool for assessing and visualizing agricultural plastic pollution risk can support evidence-based decision-making and sustainable waste management in agricultural landscapes. Full article
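
The index itself is simple arithmetic: APPRI = APW x RRI per land-use feature, exactly as stated in the abstract. A toy pandas sketch with made-up APW and RRI values:

```python
import pandas as pd

# Hypothetical land-use features; APW in kg/ha/yr and RRI values are
# invented for illustration, not taken from the study.
df = pd.DataFrame({
    "feature": ["vineyard", "greenhouse", "olive grove", "cereal field"],
    "apw_kg_ha_yr": [120.0, 310.0, 85.0, 6.0],
    "rri": [0.8, 0.9, 0.7, 0.2],
})
df["appri"] = df["apw_kg_ha_yr"] * df["rri"]  # APPRI = APW x RRI
print(df.sort_values("appri", ascending=False))
```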

18 pages, 3215 KB  
Article
YOLO-Night: Lighting the Path for Autonomous Vehicles with Robust Nighttime Perception
by Jinxin Tian, Muhammad Arslan Ghaffar and Zhaokai Li
Sensors 2026, 26(4), 1138; https://doi.org/10.3390/s26041138 - 10 Feb 2026
Abstract
Despite substantial progress in visual perception, object detection systems for autonomous driving still exhibit pronounced performance degradation in nighttime and low-light conditions, where reduced signal-to-noise ratio, blurred object boundaries, and scale ambiguity challenge reliable recognition. Existing YOLO-based detectors, primarily optimized for daytime imagery, struggle to maintain robustness under such adverse illumination. To address these issues, we propose YOLO-Night, a nighttime-oriented object detection framework that enhances the YOLO11 architecture through a structured integration of contrast enhancement, adaptive receptive field modeling, and multi-scale feature fusion. The framework incorporates a feature-level enhancement mechanism to improve low-contrast representations, employs depthwise switchable atrous convolution to dynamically adapt receptive fields for blurred and small objects, and introduces a multi-scale convolutional block to strengthen feature extraction under severe illumination degradation. In addition, a staged feature fusion strategy with an auxiliary low-level detection head was adopted to mitigate semantic misalignment across feature scales. Extensive experiments on the NightCity dataset demonstrated that YOLO-Night consistently outperformed the YOLO11n baseline, achieving improvements of +14.3% precision, +12.4% recall, and +10.4% mAP@50 under nighttime conditions while maintaining real-time inference capability. These results indicate that targeted architectural adaptations can substantially improve object detection robustness in low-light driving scenarios. Full article
(This article belongs to the Section Vehicular Sensing)
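
One plausible reading of "depthwise switchable atrous convolution" is a set of parallel depthwise branches with different dilation rates, mixed by a learned gate so the receptive field adapts per input. The PyTorch sketch below follows that reading; the branch count, dilation rates, and gating scheme are assumptions, since the abstract does not specify the switching policy.

```python
import torch
import torch.nn as nn

class DepthwiseSwitchableAtrous(nn.Module):
    """Parallel depthwise 3x3 branches at several dilation rates; a
    global-pooling gate softmax-weights the branches per input."""
    def __init__(self, channels, dilations=(1, 2, 3)):
        super().__init__()
        self.branches = nn.ModuleList(
            nn.Conv2d(channels, channels, 3, padding=d, dilation=d, groups=channels)
            for d in dilations
        )
        self.gate = nn.Sequential(
            nn.AdaptiveAvgPool2d(1),
            nn.Conv2d(channels, len(dilations), 1),
            nn.Softmax(dim=1),
        )

    def forward(self, x):
        w = self.gate(x)                                          # (B, n, 1, 1)
        outs = torch.stack([b(x) for b in self.branches], dim=1)  # (B, n, C, H, W)
        return (outs * w.unsqueeze(2)).sum(dim=1)                 # gated mix

x = torch.randn(1, 32, 64, 64)
print(DepthwiseSwitchableAtrous(32)(x).shape)  # torch.Size([1, 32, 64, 64])
```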

27 pages, 6570 KB  
Article
LiDAR–Inertial–Visual Odometry Based on Elastic Registration and Dynamic Feature Removal
by Qiang Ma, Fuhong Qin, Peng Xiao, Meng Wei, Sihong Chen, Wenbo Xu, Xingrui Yue, Ruicheng Xu and Zheng He
Electronics 2026, 15(4), 741; https://doi.org/10.3390/electronics15040741 - 9 Feb 2026
Abstract
Simultaneous Localization and Mapping (SLAM) is a fundamental capability for autonomous robots. However, in highly dynamic scenes, conventional SLAM systems often suffer from degraded accuracy due to LiDAR motion distortion and interference from moving objects. To address these challenges, this paper proposes a LiDAR–Inertial–Visual odometry framework based on elastic registration and dynamic feature removal, with the aim of enhancing system robustness through detailed algorithmic refinements. In the LiDAR odometry module, an elastic registration-based de-skewing method is introduced by modeling second-order motion, enabling accurate point cloud correction under non-uniform motion. In the visual odometry module, a multi-strategy dynamic feature suppression mechanism is developed, combining IMU-assisted motion consistency verification with a lightweight YOLOv5-based detection network to effectively filter out dynamic interference with low computational overhead. Furthermore, depth information for visual key points is recovered using LiDAR assistance to enable tightly coupled pose estimation. Extensive experiments on the TUM and M2DGR datasets demonstrate that the proposed method achieves a 96.3% reduction in absolute trajectory error (ATE) compared with ORB-SLAM2 in highly dynamic scenarios. Real-world deployment on an embedded computing device further confirms the framework’s real-time performance and practical applicability in complex environments. Full article
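
The detection-based half of the dynamic-feature suppression is easy to sketch: drop visual keypoints that fall inside boxes the detector (YOLOv5 in the paper) assigns to dynamic classes. The helper below is a hypothetical illustration of that one step; the paper additionally cross-checks candidates with IMU-assisted motion consistency.

```python
import numpy as np

def filter_dynamic_keypoints(keypoints, dynamic_boxes):
    """keypoints: (N, 2) array of (x, y); dynamic_boxes: iterable of
    (x1, y1, x2, y2) boxes for detected dynamic objects. Returns the
    keypoints lying outside every dynamic box."""
    keep = np.ones(len(keypoints), dtype=bool)
    for x1, y1, x2, y2 in dynamic_boxes:
        inside = ((keypoints[:, 0] >= x1) & (keypoints[:, 0] <= x2) &
                  (keypoints[:, 1] >= y1) & (keypoints[:, 1] <= y2))
        keep &= ~inside
    return keypoints[keep]

kps = np.array([[10.0, 10.0], [50.0, 60.0], [200.0, 120.0]])
print(filter_dynamic_keypoints(kps, [(40, 40, 80, 80)]))  # drops (50, 60)
```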

31 pages, 17707 KB  
Article
Explainable Machine Learning for Tower-Radar Monitoring of Wind Turbine Blades: Fine-Grained Blade Recognition Under Changing Operational Conditions
by Sercan Alipek, Christian Kexel and Jochen Moll
Sensors 2026, 26(4), 1083; https://doi.org/10.3390/s26041083 - 7 Feb 2026
Abstract
This paper evaluates a data-driven classification approach of operational wind turbine blades based on consecutive tower-radar measurements that are each compressed in a two-dimensional slow-time to range representation (radargram). Like many real-world machine learning systems, installed tower-radar systems face some key challenges: (i) transferability to new operational contexts, (ii) impediments due to evolving environmental and operational conditions (EOCs), and (iii) limited explainability of their deep neural decisions. These challenges are addressed here with a set of structured machine learning studies. The unique field data comes from a sensor box equipped with a frequency-modulated continuous wave (FMCW) radar (33.4–36 GHz frequency range). Relevant parts of the radargram that contribute to a decision of the used convolutional neural networks were identified by a class-sensitive visualization technique named GuidedGradCAM (Guided Gradient-weighted Class Activation Mapping). The following main contributions are provided to the field of tower-radar monitoring (TRM) in the context of wind energy applications: (i) every individual rotor blade holds a number of characteristic structural features revealed by the radar sensor, which can be used to discriminate rotor blades from the same turbine via neural networks; (ii) those unique features are not agnostic to changing EOCs; and (iii) pixel-level distortions reveal the necessity of low-level information for a precise rotor blade classification. Full article
(This article belongs to the Section Industrial Sensors)
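
For readers unfamiliar with the visualization used here: GuidedGradCAM multiplies a Guided Backpropagation saliency map by a Grad-CAM map. The Grad-CAM half is compact enough to show; the PyTorch sketch below is the textbook formulation (gradient-weighted sum of a layer's feature maps), not the authors' code, and the usage names at the end are assumptions.

```python
import torch
import torch.nn.functional as F

def grad_cam(model, layer, x, class_idx):
    """Plain Grad-CAM: weight the target layer's feature maps by the
    spatial mean of the class-score gradient, sum, and ReLU.
    x: (1, C, H, W) input; returns a normalized (H', W') relevance map."""
    acts, grads = [], []
    h1 = layer.register_forward_hook(lambda m, i, o: acts.append(o))
    h2 = layer.register_full_backward_hook(lambda m, gi, go: grads.append(go[0]))
    score = model(x)[0, class_idx]
    model.zero_grad()
    score.backward()
    h1.remove(); h2.remove()
    w = grads[0].mean(dim=(2, 3), keepdim=True)         # per-channel weights
    cam = F.relu((w * acts[0]).sum(dim=1)).squeeze(0)   # weighted sum + ReLU
    return cam / (cam.max() + 1e-8)

# Usage (hypothetical names): cam = grad_cam(cnn, cnn.layer4, radargram, blade_idx)
```
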
20 pages, 8790 KB  
Article
Small Object Detection with Efficient Multi-Scale Collaborative Attention and Depth Feature Fusion Based on Detection Transformer
by Boran Song, Xizhen Zhu, Guiyuan Yuan, Haixin Wang and Cong Liu
Appl. Sci. 2026, 16(4), 1673; https://doi.org/10.3390/app16041673 - 7 Feb 2026
Abstract
Existing DEtection TRansformer-based (DETR) object detection methods have been widely applied to standard object detection tasks, but still face numerous challenges in detecting small objects. These methods frequently miss the fine details of small objects and fail to preserve global context, particularly under scale variation or occlusion. The resulting feature maps lack sufficient spatial and structural information. Moreover, some DETR-based models specifically designed for small object detection often have poor generalization capabilities and are difficult to adapt to datasets with diverse object scales and complex backgrounds. To address these issues, this paper proposes a novel object detection model—small object detection with efficient multi-scale collaborative attention and depth feature fusion based on DETR (ED-DETR)—which consists of three core modules: an efficient multi-scale collaborative attention mechanism (EMCA), DepthPro, a zero-shot metric monocular depth estimation model, and an adaptive feature fusion module for depth maps and feature maps. Specifically, EMCA extends the single-space attention mechanism in efficient multi-scale attention (EMA) to a composite structure of parallel spatial and channel attention, enhancing ED-DETR’s ability to express features collaboratively in both spatial and channel dimensions. DepthPro generates depth maps to extract depth information. The adaptive feature fusion module integrates depth information with RGB visual features, improving ED-DETR’s ability to perceive object position, scale, and occlusion. The experimental results show that ED-DETR achieves the current best 33.6% mAP on the AI-TOD-V2 dataset, which predominantly contains tiny objects, outperforming previous CNN-based and DETR-based methods, and shows excellent generalization performance on the VisDrone and COCO datasets. Full article
(This article belongs to the Section Computing and Artificial Intelligence)
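
A common way to realize adaptive fusion of depth maps and RGB features is a learned per-pixel gate; the PyTorch sketch below shows that pattern under assumed tensor shapes. It illustrates the general idea only and is not ED-DETR's actual fusion module.

```python
import torch
import torch.nn as nn

class GatedDepthRGBFusion(nn.Module):
    """Per-pixel gate a in (0, 1) decides how much depth feature to mix
    into the RGB feature at each spatial location."""
    def __init__(self, channels):
        super().__init__()
        self.gate = nn.Sequential(nn.Conv2d(2 * channels, 1, 1), nn.Sigmoid())

    def forward(self, rgb_feat, depth_feat):
        a = self.gate(torch.cat([rgb_feat, depth_feat], dim=1))  # (B, 1, H, W)
        return a * rgb_feat + (1 - a) * depth_feat

f_rgb = torch.randn(1, 64, 40, 40)   # RGB feature map (assumed shape)
f_d = torch.randn(1, 64, 40, 40)     # depth feature projected to 64 channels
print(GatedDepthRGBFusion(64)(f_rgb, f_d).shape)
```
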
21 pages, 2169 KB  
Article
Enhancing Early Detection of Alzheimer’s Disease via Vision Transformer Machine Learning Architecture Using MRI Images
by Wided Hechkel, Marco Leo, Pierluigi Carcagnì, Marco Del-Coco and Abdelhamid Helali
Information 2026, 17(2), 163; https://doi.org/10.3390/info17020163 - 6 Feb 2026
Abstract
Computer-aided diagnosis (CAD) systems based on deep learning have shown significant potential for Alzheimer’s disease (AD) stage classification from Magnetic Resonance Imaging (MRI). Nevertheless, challenges such as class imbalance, small sample sizes, and the presence of multiple slices per subject may lead to biased evaluation and statistically unreliable performance, particularly for minority classes. In this study, a Vision Transformer (ViT)-based framework is proposed for multi-class AD classification using a Kaggle dataset containing 6400 MRI slices across four cognitive stages. A subject-wise data-splitting strategy is employed to prevent information leakage between the training and testing sets, and the statistical unreliability of near-perfect scores in underrepresented classes is critically examined. An ablation study is conducted to assess the contribution of key architectural components, demonstrating the effectiveness of self-attention and patch embedding in capturing discriminative features. Furthermore, attention-based visualization maps are incorporated to highlight brain regions influencing the model’s decisions and to illustrate subtle anatomical differences between MildDemented and VeryMildDemented cases. The proposed approach achieves a test accuracy of 97.98%, outperforming existing methods on the same dataset while providing improved interpretability. It supports early and accurate AD stage identification. Full article
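
The subject-wise split the paper emphasizes maps directly onto scikit-learn's group-aware splitters: grouping by subject ID guarantees that no subject contributes slices to both train and test, which is exactly the leakage being guarded against. A minimal sketch with hypothetical subject counts:

```python
import numpy as np
from sklearn.model_selection import GroupShuffleSplit

# Hypothetical: 20 MRI slices from each of 10 subjects.
subject_ids = np.repeat(np.arange(10), 20)
X = np.random.rand(len(subject_ids), 128)       # stand-in slice features
y = np.random.randint(0, 4, len(subject_ids))   # four cognitive stages

splitter = GroupShuffleSplit(n_splits=1, test_size=0.2, random_state=0)
train_idx, test_idx = next(splitter.split(X, y, groups=subject_ids))
# No subject appears on both sides of the split.
assert not set(subject_ids[train_idx]) & set(subject_ids[test_idx])
```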

22 pages, 6723 KB  
Article
An Enhanced SegNeXt with Adaptive ROI for a Robust Navigation Line Extraction in Multi-Growth-Stage Maize Fields
by Yuting Zhai, Zongmei Gao, Jian Li, Yang Zhou and Yanlei Xu
Agriculture 2026, 16(3), 367; https://doi.org/10.3390/agriculture16030367 - 4 Feb 2026
Abstract
Navigation line extraction is essential for visual navigation in agricultural machinery, yet existing methods often perform poorly in complex environments due to challenges such as weed interference, broken crop rows, and leaf adhesion. To enhance the accuracy and robustness of crop row centerline identification, this study proposes an improved segmentation model based on SegNeXt with integrated adaptive region of interest (ROI) extraction for multi-growth-stage maize row perception. Improvements include constructing a Local module via pooling layers to refine contour features of seedling rows and enhance complementary information across feature maps. A multi-scale fusion attention (MFA) module is also designed for adaptive weighted fusion during decoding, improving detail representation and generalization. Additionally, Focal Loss is introduced to mitigate background dominance and strengthen learning from sparse positive samples. An adaptive ROI extraction method was also developed to dynamically focus on navigable regions, thereby improving efficiency and localization accuracy. The results show that the proposed model achieves a segmentation accuracy of 95.13% and an IoU of 93.86%, with a processing speed of 27 frames per second (fps) on GPU and 16.8 fps on an embedded Jetson TX2 platform. This performance meets the real-time requirements for agricultural machinery operations. This study offers an efficient and reliable perception solution for vision-based navigation in maize fields. Full article
(This article belongs to the Section Artificial Intelligence and Digital Agriculture)
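
Focal Loss, used here against background dominance, has a standard closed form, FL = -alpha_t (1 - p_t)^gamma log(p_t), which down-weights easy background pixels so the sparse crop-row pixels dominate the gradient. A reference binary implementation in PyTorch follows; the alpha and gamma values are the usual defaults, not necessarily the paper's settings.

```python
import torch
import torch.nn.functional as F

def binary_focal_loss(logits, targets, alpha=0.25, gamma=2.0):
    """logits and targets share a shape; targets are 0/1 masks."""
    p = torch.sigmoid(logits)
    ce = F.binary_cross_entropy_with_logits(logits, targets, reduction="none")
    p_t = p * targets + (1 - p) * (1 - targets)    # probability of the true class
    alpha_t = alpha * targets + (1 - alpha) * (1 - targets)
    return (alpha_t * (1 - p_t) ** gamma * ce).mean()

logits = torch.randn(2, 1, 8, 8)
targets = torch.randint(0, 2, (2, 1, 8, 8)).float()
print(binary_focal_loss(logits, targets))
```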

27 pages, 11971 KB  
Article
An Application Study on Digital Image Classification and Recognition of Yunnan Jiama Based on a YOLO-GAM Deep Learning Framework
by Nan Ji, Fei Ju and Qiang Wang
Appl. Sci. 2026, 16(3), 1551; https://doi.org/10.3390/app16031551 - 3 Feb 2026
Abstract
Yunnan Jiama (paper horse prints), a representative form of intangible cultural heritage in southwest China, is characterized by subtle inter-class differences, complex woodblock textures, and heterogeneous preservation conditions, which collectively pose significant challenges for digital preservation and automatic image classification. To address these challenges and improve the computational analysis of Jiama images, this study proposes an enhanced object detection framework based on YOLOv8 integrated with a Global Attention Mechanism (GAM), referred to as YOLOv8-GAM. In the proposed framework, the GAM module is embedded into the high-level semantic feature extraction and multi-scale feature fusion stages of YOLOv8, thereby strengthening global channel–spatial interactions and improving the representation of discriminative cultural visual features. In addition, image augmentation strategies, including brightness adjustment, salt-and-pepper noise, and Gaussian noise, are employed to simulate real-world image acquisition and degradation conditions, which enhances the robustness of the model. Experiments conducted on a manually annotated Yunnan Jiama image dataset demonstrate that the proposed model achieves a mean average precision (mAP) of 96.5% at an IoU threshold of 0.5 and 82.13% under the mAP@0.5:0.95 metric, with an F1-score of 94.0%, outperforming the baseline YOLOv8 model. These results indicate that incorporating global attention mechanisms into object detection networks can effectively enhance fine-grained classification performance for traditional folk print images, thereby providing a practical and scalable technical solution for the digital preservation and computational analysis of intangible cultural heritage. Full article
(This article belongs to the Section Computing and Artificial Intelligence)
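
The three degradation augmentations named in the abstract (brightness adjustment, salt-and-pepper noise, Gaussian noise) are straightforward to reproduce. A NumPy sketch follows; the parameter ranges are illustrative, not the paper's settings.

```python
import numpy as np

def degrade(img, rng=None):
    """img: uint8 HxWx3 image. Applies a random brightness shift,
    additive Gaussian noise, and salt-and-pepper corruption."""
    if rng is None:
        rng = np.random.default_rng()
    out = img.astype(np.float32)
    out += rng.uniform(-30, 30)             # brightness adjustment
    out += rng.normal(0, 10, out.shape)     # Gaussian noise
    mask = rng.random(out.shape[:2])
    out[mask < 0.01] = 0                    # pepper
    out[mask > 0.99] = 255                  # salt
    return np.clip(out, 0, 255).astype(np.uint8)
```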

15 pages, 884 KB  
Article
AI-Driven Typography: A Human-Centered Framework for Generative Font Design Using Large Language Models
by Yuexi Dong and Mingyong Gao
Information 2026, 17(2), 150; https://doi.org/10.3390/info17020150 - 3 Feb 2026
Abstract
This paper presents a human-centered, AI-driven framework for font design that reimagines typography generation as a collaborative process between humans and large language models (LLMs). Unlike conventional pixel- or vector-based approaches, our method introduces a Continuous Style Projector that maps visual features from a pre-trained ResNet encoder into the LLM’s latent space, enabling zero-shot style interpolation and fine-grained control of stroke and serif attributes. To model handwriting trajectories more effectively, we employ a Mixture Density Network (MDN) head, allowing the system to capture multi-modal stroke distributions beyond deterministic regression. Experimental results show that users can interactively explore, mix, and generate new typefaces in real time, making the system accessible for both experts and non-experts. The approach reduces reliance on commercial font licenses and supports a wide range of applications in education, design, and digital communication. Overall, this work demonstrates how LLM-based generative models can enhance creativity, personalization, and cultural expression in typography, contributing to the broader field of AI-assisted design. Full article
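
A Mixture Density Network head of the kind described replaces a single regressed stroke target with K Gaussian components, so one pen state can have several plausible continuations. The PyTorch sketch below follows the standard MDN recipe; the component count, output dimensionality, and NLL loss are illustrative, not this system's configuration.

```python
import torch
import torch.nn as nn

class MDNHead(nn.Module):
    """Predicts mixture weights, means, and scales for K Gaussians."""
    def __init__(self, in_dim, out_dim=2, k=5):
        super().__init__()
        self.k, self.out_dim = k, out_dim
        self.pi = nn.Linear(in_dim, k)
        self.mu = nn.Linear(in_dim, k * out_dim)
        self.log_sigma = nn.Linear(in_dim, k * out_dim)

    def forward(self, h):
        b = h.shape[0]
        log_pi = torch.log_softmax(self.pi(h), dim=-1)
        mu = self.mu(h).view(b, self.k, self.out_dim)
        sigma = self.log_sigma(h).view(b, self.k, self.out_dim).exp()
        return log_pi, mu, sigma

def mdn_nll(log_pi, mu, sigma, y):
    """Negative log-likelihood of targets y (B, out_dim) under the mixture."""
    log_p = torch.distributions.Normal(mu, sigma).log_prob(y.unsqueeze(1)).sum(-1)
    return -torch.logsumexp(log_pi + log_p, dim=-1).mean()

head = MDNHead(in_dim=256)
log_pi, mu, sigma = head(torch.randn(4, 256))
print(mdn_nll(log_pi, mu, sigma, torch.randn(4, 2)))
```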

45 pages, 5418 KB  
Review
Visual and Visual–Inertial SLAM for UGV Navigation in Unstructured Natural Environments: A Survey of Challenges and Deep Learning Advances
by Tiago Pereira, Carlos Viegas, Salviano Soares and Nuno Ferreira
Robotics 2026, 15(2), 35; https://doi.org/10.3390/robotics15020035 - 2 Feb 2026
Abstract
Localization and mapping remain critical challenges for Unmanned Ground Vehicles (UGVs) operating in unstructured natural environments, such as forests and agricultural fields. While Visual SLAM (VSLAM) and Visual–Inertial SLAM (VI-SLAM) have matured significantly in structured and urban scenarios, their extension to outdoor natural domains introduces severe challenges, including dynamic vegetation, illumination variations, a lack of distinctive features, and degraded GNSS availability. Recent advances in Deep Learning have brought promising developments to VSLAM- and VI-SLAM-based pipelines, ranging from learned feature extraction and matching to self-supervised monocular depth prediction and differentiable end-to-end SLAM frameworks. Furthermore, emerging methods for adaptive sensor fusion, leveraging attention mechanisms and reinforcement learning, open new opportunities to improve robustness by dynamically weighting the contributions of camera and IMU measurements. This review provides a comprehensive overview of Visual and Visual–Inertial SLAM for UGVs in unstructured environments, highlighting the challenges posed by natural contexts and the limitations of current pipelines. Classic VI-SLAM frameworks and recent Deep-Learning-based approaches were systematically reviewed. Special attention is given to field robotics applications in agriculture and forestry, where low-cost sensors and robustness against environmental variability are essential. Finally, open research directions are discussed, including self-supervised representation learning, adaptive sensor confidence models, and scalable low-cost alternatives. By identifying key gaps and opportunities, this work aims to guide future research toward resilient, adaptive, and economically viable VSLAM and VI-SLAM pipelines, tailored for UGV navigation in unstructured natural environments. Full article
(This article belongs to the Special Issue Localization and 3D Mapping of Intelligent Robotics)

20 pages, 1202 KB  
Article
Adaptive ORB Accelerator on FPGA: High Throughput, Power Consumption, and More Efficient Vision for UAVs
by Hussam Rostum and József Vásárhelyi
Signals 2026, 7(1), 13; https://doi.org/10.3390/signals7010013 - 2 Feb 2026
Abstract
Feature extraction and description are fundamental components of visual perception systems used in applications such as visual odometry, Simultaneous Localization and Mapping (SLAM), and autonomous navigation. In resource-constrained platforms, such as Unmanned Aerial Vehicles (UAVs), achieving real-time hardware acceleration on Field-Programmable Gate Arrays (FPGAs) is challenging. This work demonstrates an FPGA-based implementation of an adaptive ORB (Oriented FAST and Rotated BRIEF) feature extraction pipeline designed for high-throughput and energy-efficient embedded vision. The proposed architecture is a completely new design for the main algorithmic blocks of ORB, including the FAST (Features from Accelerated Segment Test) feature detector, Gaussian image filtering, moment computation, and descriptor generation. Adaptive mechanisms are introduced to dynamically adjust thresholds and filtering behavior, improving robustness under varying illumination conditions. The design is developed using a High-Level Synthesis (HLS) approach, where all processing modules are implemented as reusable hardware IP cores and integrated at the system level. The architecture is deployed and evaluated on two FPGA platforms, PYNQ-Z2 and KRIA KR260, and its performance is compared against CPU and GPU implementations using a dedicated C++ testbench based on OpenCV. Experimental results demonstrate significant improvements in throughput and energy efficiency while maintaining stable and scalable performance, making the proposed solution suitable for real-time embedded vision applications on UAVs and similar platforms. Notably, the FPGA implementation increases DSP utilization from 11% to 29% compared to the previous designs implemented by other researchers, effectively offloading computational tasks from general purpose logic (LUTs and FFs), reducing LUT usage by 6% and FF usage by 13%, while maintaining overall design stability, scalability, and acceptable thermal margins at 2.387 W. This work establishes a robust foundation for integrating the optimized ORB pipeline into larger drone systems and opens the door for future system-level enhancements. Full article
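
The CPU reference against which such accelerators are benchmarked (the paper uses an OpenCV-based C++ testbench) takes only a few lines in OpenCV's Python bindings. The frame below is a random stand-in and the parameter values are illustrative.

```python
import numpy as np
import cv2

img = (np.random.rand(480, 640) * 255).astype(np.uint8)  # stand-in frame
orb = cv2.ORB_create(nfeatures=500, fastThreshold=20)
keypoints, descriptors = orb.detectAndCompute(img, None)
# Each ORB descriptor is a 32-byte (256-bit) rotated-BRIEF string.
print(len(keypoints), None if descriptors is None else descriptors.shape)
```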

27 pages, 9162 KB  
Article
Multi-Domain Incremental Learning for Semantic Segmentation via Visual Domain Prompt in Remote Sensing Data
by Junxi Li, Zhiyuan Yan, Wenhui Diao, Yidan Zhang, Zicong Zhu, Yichen Tian and Xian Sun
Remote Sens. 2026, 18(3), 464; https://doi.org/10.3390/rs18030464 - 1 Feb 2026
Abstract
Domain incremental learning for semantic segmentation has gained considerable attention due to its importance for many fields, including urban planning and autonomous driving. The catastrophic forgetting problem caused by domain shift has been alleviated by structural expansion of the model or by data rehearsal. However, these methods ignore the similar contextual knowledge shared between the new and the old data domains and assume that new and old knowledge are completely mutually exclusive, which drives the model toward a suboptimal training direction. Motivated by prompt learning, we propose a new domain incremental learning framework named RS-VDP. The key innovation of RS-VDP is to use a visual domain prompt to steer the optimization direction from both the input data space and the feature space. First, we design a domain prompt based on a dynamic location module, which applies a visual domain prompt according to a local entropy map to update the distribution of the input images. Second, to filter feature vectors with high confidence, a representation feature alignment module based on an entropy map is proposed. This module ensures the accuracy and stability of the feature vectors involved in the regularization loss, alleviating the problem of semantic drift. Finally, we introduce a new evaluation metric to measure the overall performance of incremental learning models, addressing the fact that the traditional metric is skewed by single-task accuracy. Comprehensive experiments demonstrate the effectiveness of the proposed method, which significantly reduces the degree of catastrophic forgetting. Full article
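
The local entropy map that drives prompt placement in RS-VDP can be approximated with off-the-shelf tools: per-pixel entropy over a sliding window is low in homogeneous regions and high in ambiguous ones. A scikit-image sketch on a stand-in tile; the window radius is an assumption.

```python
import numpy as np
from skimage.filters.rank import entropy
from skimage.morphology import disk

tile = (np.random.rand(128, 128) * 255).astype(np.uint8)  # stand-in image tile
ent = entropy(tile, disk(5))   # per-pixel entropy over a radius-5 window
print(ent.min(), ent.max())    # candidate map for prompt placement
```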
