Search Results (2,862)

Search Parameters:
Keywords = scene generation

21 pages, 1572 KB  
Article
Efficient Glare Suppression Network for Nighttime Images with Lightweight Parallel Attention and Ghost Convolution
by Ruoyu Yang, Huaixin Chen, Sijie Luo and Zhixi Wang
Sensors 2026, 26(12), 3773; https://doi.org/10.3390/s26123773 (registering DOI) - 12 Jun 2026
Abstract
Aiming at the problems of glare interference, local overexposure and detail loss caused by artificial light sources such as vehicle lamps and street lamps in nighttime road scenes, as well as the challenges of existing glare suppression models with large parameters, high computational complexity and difficulty in deploying on edge devices, this paper proposes a lightweight glare suppression network (LGSNet) based on ghost depthwise separable convolution and Lightweight Parallel Attention. Based on the U-Net architecture, the network introduces ghost depthwise separable convolution blocks (GhostDSC) in the encoder and decoder, which generates ghost features through cheap linear transformations by exploiting feature map redundancy, significantly reducing model parameters and computational costs while maintaining feature representation ability. Meanwhile, a Lightweight Parallel Attention (LPA) module is designed in the decoder stage, which integrates channel attention and pixel attention in parallel, enhancing the network’s attention to glare regions and edge details with extremely low parameter increment to improve detail recovery accuracy. In addition, a joint loss function consisting of background loss, glare loss and reconstruction loss is constructed to collaboratively optimize glare suppression and detail preservation. Experimental results on the public Flare7K++ dataset and the self-built nighttime road glare dataset NRGD show that the proposed method has only 7.45 M parameters, much lower than standard U-Net and Uformer. It achieves competitive results on full-reference metrics such as PSNR, SSIM, LPIPS and no-reference metrics such as NIQE, BRISQUE, PIQE, and can effectively suppress various types of glare interference and restore obscured scene details. 
It achieves a superior trade-off between model complexity and enhancement performance, significantly reducing the parameter count and computational overhead compared to heavy baselines, thereby offering a highly efficient solution for resource-aware glare suppression tasks.
(This article belongs to the Section Intelligent Sensors)
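The parameter saving that a ghost convolution obtains from "cheap linear transformations" can be illustrated with a small count. The sketch below assumes a generic GhostNet-style ghost module (a primary convolution followed by depthwise cheap operations); it is not the GhostDSC block of LGSNet, whose exact layout the abstract does not give. The function names, layer sizes, ratio, and kernel sizes are hypothetical and chosen only for illustration.

```python
def standard_conv_params(c_in, c_out, k):
    """Weights in a plain k x k convolution (bias ignored)."""
    return c_in * c_out * k * k


def ghost_module_params(c_in, c_out, k, ratio=2, cheap_k=3):
    """Weights in a generic GhostNet-style module (bias ignored).

    A primary k x k convolution produces c_out / ratio intrinsic maps.
    Each intrinsic map then yields (ratio - 1) ghost maps through a
    depthwise cheap_k x cheap_k filter (the "cheap linear transformation").
    """
    if c_out % ratio != 0:
        raise ValueError("c_out must be divisible by ratio in this sketch")
    intrinsic = c_out // ratio
    primary = c_in * intrinsic * k * k
    cheap = intrinsic * (ratio - 1) * cheap_k * cheap_k
    return primary + cheap


# Hypothetical layer: 64 input channels, 128 output channels, 3 x 3 kernel.
std = standard_conv_params(64, 128, 3)
ghost = ghost_module_params(64, 128, 3)
print(std, ghost, round(std / ghost, 2))
```

With a ratio of 2, the count is roughly halved for this hypothetical layer, because only half of the output maps are produced by the full convolution.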
42 pages, 6382 KB  
Article
Multi-Task Directional Field Learning for Geometry-Aware Building Extraction and Simplified Vector Reconstruction in High-Resolution Remote Sensing
by Junjie Xu, Zhengsheng Chen, Qinghua Zhang and Mulei Zhu
Remote Sens. 2026, 18(12), 1955; https://doi.org/10.3390/rs18121955 (registering DOI) - 12 Jun 2026
Abstract
This paper addresses the problem that high pixel-level segmentation accuracy does not necessarily lead to geometrically compact building boundaries in vectorized outputs. A multi-task directional field learning framework is proposed based on U-Net with a ResNet-50 encoder. The framework introduces directional field supervision and a mask-field alignment loss to jointly optimize building region prediction and local boundary orientation consistency. In addition, a mild topological simplification procedure with a fixed small tolerance is applied to reduce residual staircase-like artifacts during vectorization. Experiments on the WHU building dataset at 0.2 m and 0.3 m spatial resolutions show that the proposed framework produces compact vector representations while maintaining high overlap relative to the raster reference annotations. In the 0.2 m setting, directional field learning improves Boundary IoU compared with the Baseline U-Net, whereas the complete pipeline slightly reduces Mask IoU and F1-score due to the additional simplification step. In the 0.3 m setting, the complete method does not consistently outperform several baselines in conventional pixel-level metrics, but it shows a favorable trade-off between polygon compactness and vector overlap under raster-reference evaluation. These results indicate that the proposed method is more suitable for geometry-aware vector reconstruction and vector simplification than for maximizing general semantic segmentation accuracy. In particular, the average number of polygon vertices is substantially reduced while Vector IoU remains approximately 90–92%. To further address the limitation of evaluating only on the WHU dataset, an additional in-domain validation experiment was conducted on the JAX dataset, which contains more complex building appearances and scene variations. 
The results show that the proposed Directional Field + Mild DP pipeline consistently reduces polygon complexity on the JAX dataset while maintaining competitive vector overlap. The central objective of the proposed framework is not only to improve mask-level building extraction, but also to enhance boundary-oriented vector reconstruction by learning local boundary-direction consistency and reducing raster-induced polygonal redundancy.
(This article belongs to the Special Issue High-Resolution Remote Sensing Image Processing and Applications)
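The "Mild DP" step in this abstract presumably refers to Douglas-Peucker simplification with a small fixed tolerance; that reading is an assumption, as is everything concrete below. The sketch shows how such a step removes staircase-like vertices from a rasterized edge. The function names, the sample coordinates, and the tolerance values are hypothetical and are not taken from the paper.

```python
import math


def point_segment_distance(p, a, b):
    """Distance from point p to the segment a-b."""
    (px, py), (ax, ay), (bx, by) = p, a, b
    dx, dy = bx - ax, by - ay
    seg_len_sq = dx * dx + dy * dy
    if seg_len_sq == 0.0:
        return math.hypot(px - ax, py - ay)
    # Project p onto the segment and clamp to its end points.
    t = ((px - ax) * dx + (py - ay) * dy) / seg_len_sq
    t = max(0.0, min(1.0, t))
    return math.hypot(px - (ax + t * dx), py - (ay + t * dy))


def douglas_peucker(points, tolerance):
    """Simplify an open polyline; keeps both end points."""
    if len(points) < 3:
        return list(points)
    first, last = points[0], points[-1]
    max_dist, max_idx = 0.0, 0
    for i in range(1, len(points) - 1):
        d = point_segment_distance(points[i], first, last)
        if d > max_dist:
            max_dist, max_idx = d, i
    if max_dist <= tolerance:
        # Every interior vertex is within tolerance: drop them all.
        return [first, last]
    left = douglas_peucker(points[: max_idx + 1], tolerance)
    right = douglas_peucker(points[max_idx:], tolerance)
    return left[:-1] + right  # the split vertex appears once


# Hypothetical staircase edge with half-pixel jitter.
stairs = [(0, 0), (1, 0), (1, 0.5), (2, 0.5), (2, 0),
          (3, 0), (3, 0.5), (4, 0.5), (4, 0)]
print(douglas_peucker(stairs, tolerance=1.0))
```

A larger tolerance removes more vertices but lets the simplified boundary drift further from the mask, which matches the trade-off between polygon compactness and vector overlap that the abstract describes.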
33 pages, 9216 KB  
Article
From Optical Design to NIIRS and Object Detection: An Integrated Framework for Spatial Image Quality Assessment of Micro-Satellite Constellations
by Jisang Yoon, Junchan Lee, Suwon Lee, Gilsun Jang, Jueon Park, Woojin Jeon, Sang-Hyun Lee, Chol Lee, Cheol-Woo Lim, Chi-Wook Oh, Se-Yon Kim and Seong-Ook Park
Remote Sens. 2026, 18(12), 1943; https://doi.org/10.3390/rs18121943 - 11 Jun 2026
Abstract
For micro-satellite constellations, frequent Earth observation alone does not guarantee archive usability; the archive is operationally useful only when the spatial image quality remains adequate for downstream exploitation. This study presents an integrated framework for assessing spatial image quality using NEONSAT-1 imagery by linking optical design analysis, image simulation, GIQE-based NIIRS estimation, and YOLOv8-based object detection within a single workflow. NEONSAT-1 panchromatic (PAN), pan-sharpened (PS), and multispectral (MS) products were analyzed together with controlled simulations of system MTF, altitude-dependent GSD variation, and super-resolution processing. Among the native products, PS imagery showed the highest NIIRS and overall detection performance. In the controlled experiments, higher system MTF increased RER and NIIRS, while lower simulated altitude generally produced finer GSD and higher NIIRS for both PS and PAN products. However, detection performance varied by scene, product type, and target class and did not increase in direct proportion to NIIRS. In the super-resolution case study, ×2 SR provided the most consistent NIIRS improvement, whereas detection responses at higher SR scales were target class dependent. These results suggest that spatial image quality should be evaluated not only through interpretability metrics such as NIIRS but also in relation to practical downstream performance. The proposed framework provides a baseline for future constellation-scale image quality assessment.
(This article belongs to the Section Remote Sensing Image Processing)
27 pages, 2501 KB  
Article
Improving the Robustness of Scene-Aware Neuro-Symbolic Solving for Arithmetic Word Problems Under Input Perturbations
by Rao Peng, Litian Huang, Lingzi Zhu and Xinguo Yu
Symmetry 2026, 18(6), 1007; https://doi.org/10.3390/sym18061007 - 11 Jun 2026
Abstract
Robust Arithmetic Word Problem (AWP) solving is important for applying mathematical reasoning systems in educational scenarios, where problem statements may contain changed numerical values, paraphrased descriptions, or irrelevant distracting information. Although Large Language Models (LLMs) have shown strong potential in solving AWPs, their reasoning processes may still be sensitive to surface-form variations and perturbation-induced noise. To address this issue, this paper proposes a Scene-Aware Neuro-Symbolic solver designed to improve the robustness of AWP solving under perturbations. The proposed method extends the existing scene-aware framework by introducing perturbation-oriented mechanisms at the scene, relation, and symbolic-solving levels. A Chain-of-Scene (CoS) prompting strategy first generates candidate scenes, after which goal-guided filtering retains target-related and bridge scenes while removing distractor-induced scenes. The retained scenes are then processed by the Scene-Aware Syntax-Semantics (S2) method to extract explicit and implicit relations, and relation consistency checking is applied to remove locally plausible but globally irrelevant relations. Finally, the symbolic solver performs iterative equation-based reasoning over the filtered relation sets, with fallback recovery activated when standard solving does not produce a target-compatible answer. Experiments on AGG, MAWPS, and GSM8K show an average accuracy of 92.8% on clean datasets. On GSM-Perturb and AWP-Perturb, the solver achieves perturbed accuracies of 80.8% and 87.5%, with robustness drops of 8.3% and 6.8%, respectively. Ablation results show that scene filtering and relation consistency checking are the main contributors to reducing perturbation-induced errors. 
These findings suggest that combining LLM-based scene understanding with symbolic relation reasoning is a promising direction for improving the robustness and interpretability of AWP solvers in the evaluated perturbation settings.
(This article belongs to the Special Issue Symmetry and Asymmetry in Human-Computer Interaction)
23 pages, 32934 KB  
Article
AirplaneGen: Skeleton-Guided Generation of Remote Sensing Images with Multi-Instance Airplanes
by Lingxuan Zhu, Yanze Ma, Jiaji Wu, Yanbo Fan, Xiaobing Wang and Mingzhou Tan
Remote Sens. 2026, 18(12), 1940; https://doi.org/10.3390/rs18121940 - 11 Jun 2026
Abstract
Generating realistic and controllable aerial images is important for building and evaluating remote sensing recognition systems, especially when real samples of rare aircraft types or dense airport layouts are limited. However, airplane synthesis remains challenging for generic generative models. Aircraft have rigid and symmetric structures, and airport scenes often contain many closely spaced instances; as a result, existing models tend to produce distorted wings and fuselages or merge adjacent airplanes into ambiguous shapes. To address these issues, we propose AirplaneGen, a skeleton-guided latent diffusion framework for multi-airplane remote sensing image generation. AirplaneGen represents each airplane with an editable eight-keypoint skeleton and uses skeleton-derived soft masks to separate instance-level refinement from background-context modeling during denoising. To support this task, we construct MARS20, a benchmark with 2778 high-resolution aerial scenes and 16,673 airplane instances annotated with skeletons, categories, and contextual descriptions. Experiments on MARS20 show that AirplaneGen improves image fidelity, geometric consistency, and instance separation over representative controllable generation methods.
34 pages, 1396 KB  
Article
From Detection Toward Decision Support: A Hierarchical Visual–Sensor Framework for Zamioculcas Monitoring in Indoor Environments
by Raikhan Amanova, Baurzhan Belgibayev, Yersaiyn Mailybayev, Gulnur Kazbekova, Zhadyra Akanova, Galiya Mamankyzy, Marzhana Amanova, Artem Bykov, Periuza Pirniyazova and Nurzhigit Smailov
Computers 2026, 15(6), 382; https://doi.org/10.3390/computers15060382 - 11 Jun 2026
Abstract
This paper proposes a prototype-level hierarchical visual–sensor framework for monitoring the Zamioculcas houseplant in complex indoor environments and supporting adaptive care-mode selection. The proposed framework combines a two-level visual pipeline, consisting of YOLO-based target plant detection and MobileViT-S-based leaf-condition classification, with a Plant Health Index (PHI) and a rule-based decision-support module for integrating visual and IoT-derived indicators. For the detection task, YOLOv8, YOLO12, and YOLO26 were compared, with YOLO26 showing the most balanced performance among the evaluated implementations. To improve robustness in real indoor scenes, negative training samples were added; this reduced the image-level false alarm rate on an independent negative-scene test set from 50.7% to 10.0% and increased specificity from 49.3% to 90.0%. For the second visual level, MobileViT-S achieved an accuracy of 0.9857 and an F1-score of 0.9857 on the independent cropped leaf test subset. To reduce the dependence of this result on a single data split, an additional 5-fold cross-validation experiment was conducted on the full cropped leaf dataset of 847 images, resulting in an accuracy of 0.9858 ± 0.0068 and an F1-score of 0.9853 ± 0.0070. To further address plant-level generalization, an additional unseen-plant validation subset of 60 newly collected cropped leaf images was evaluated, and MobileViT-S achieved an accuracy of 0.9500 and an F1-score of 0.9499. These results support the stability of the leaf-condition classifier within the available data, although larger external validation with strict plant-level and session-level separation remains necessary. In addition, an Arduino-based module-level validation was conducted using a capacitive soil-moisture sensor to verify the proposed sensor-based and Vision–IoT decision rules. 
The experiment demonstrated that the rule-based layer can distinguish dry, normal, and wet soil states and select conservative care actions depending on both soil moisture and visual-condition input. A brief real-time camera–sensor communication test further confirmed that live camera input, Arduino-based soil-moisture sensing, PHI computation, and care-mode selection can be connected within one decision-support pipeline. The proposed PHI and care-mode selection module are therefore presented as a formalized decision-support layer rather than as a fully validated autonomous irrigation system. Further calibration, actuator integration, and closed-loop validation remain necessary before practical autonomous deployment.
(This article belongs to the Section Internet of Things (IoT) and Industrial IoT)
26 pages, 6700 KB  
Article
YOLO-RCM: An Improved Tomato Maturity Detection Model for Complex Greenhouse Environments
by Dehua Chen, Hao Teng, Yuchen Lu, Yuxuan Zhang and Haorong Wu
Agronomy 2026, 16(12), 1146; https://doi.org/10.3390/agronomy16121146 - 11 Jun 2026
Abstract
To reduce confusion between adjacent maturity categories, as well as false detections and low detection accuracy caused by complex backgrounds in tomato object detection, this study develops an improved YOLOv7-based model, named YOLO-RCM (Reduce classes misjudgment). First, a stability-enhanced ECANet channel attention module is embedded into the feature pyramid network (FPN) to strengthen discriminative channel responses. Second, a DCNv2-based deformable convolution enhancement module, namely DCNConv with adaptive magnitude constraints, is incorporated into the backbone network to alleviate feature misalignment caused by shape variation, partial occlusion, and fine-grained appearance differences in tomato maturity detection. Third, the WIoU v3 loss function is adopted to refine bounding box regression stability. The model was evaluated on the public Laboro Tomato dataset and TomatOD dataset. Experimental results indicate that YOLO-RCM obtains 83.7% Precision and 89.6% mAP@0.5, exceeding the baseline by 3.3 and 1.2 percentage points, respectively. Its Recall is 80.5%, with a decrease of 0.8 percentage points, whereas GFLOPs are reduced to 96.9, 6.3 lower than the baseline. These results indicate that the proposed method improves detection accuracy and computational efficiency while maintaining an almost unchanged model scale. The confusion matrix and PR curves further show that YOLO-RCM can effectively mitigate misdetections associated with adjacent maturity stages and complex scenes. In the external-dataset robustness test, Precision and mAP@0.5 are improved by 5.8 and 4.0 percentage points over the baseline, respectively, confirming the generalization ability of the proposed model. The main contribution of this study lies in improving tomato maturity detection from three complementary aspects: channel feature discrimination, local geometric perception, and bounding box regression stability. 
The study offers a practical technical reference for intelligent tomato harvesting systems in complex agricultural environments.
(This article belongs to the Special Issue Digital Twins in Precision Agriculture)
30 pages, 6616 KB  
Article
One-Shot Box-Centric Teaching for Persistent Robotic Sorting-and-Filling with Relative Pose Constraints
by Wei Du and Jianhua Wu
Sensors 2026, 26(12), 3703; https://doi.org/10.3390/s26123703 - 10 Jun 2026
Viewed by 142
Abstract
Robotic sorting-and-filling tasks in flexible manufacturing require robots to reproduce specified in-box arrangements while adapting to variations in container poses, object availability, sensing conditions, and external interventions. This paper proposes a box-centric one-shot teaching framework for robotic packing tasks with relative pose constraints. In the teaching stage, a human operator demonstrates the desired packing layout only once. The system uses reference-prompted SAM-based contour refinement to extract box and in-box object contours, object categories, quantities, and relative position and orientation constraints. These constraints are then converted from pixel-plane measurements into box-local pose constraints, forming a reusable box-centric packing template that preserves both translational and angular layout information. During execution, the recorded template is transferred to detected box instances with different global poses, and executable pick-and-place commands are generated through a task-level perception-to-command pipeline. A mechanism for continuous assignment and state updates is further introduced to maintain residual target slots, update object-to-slot allocation, and report missing or redundant objects across execution rounds. Single-box template transfer experiments achieved mean placement errors of 7.16 mm and 7.57 mm for two recorded templates, while representative post-execution images further showed that the relative object orientations were visually preserved with respect to the taught template footprints. Multi-box experiments demonstrated that unfinished residual slots could be preserved and completed after scene updates without re-teaching. Additional validation with different container types and object shapes showed the feasibility of extending the framework beyond cube-only cases. 
Ablation tests under nine exposure settings further showed that SAM refinement improved template-acquisition robustness compared with the previous recognition method. These results verify that the proposed framework enables one-shot template acquisition, box-centric layout transfer, relative pose preservation, and persistent task-level execution for constrained robotic packing tasks.
(This article belongs to the Topic Robot Manipulation Learning and Interaction Control)
29 pages, 10114 KB  
Article
A Unified Explainable Autonomous Driving Framework via Cross-Attention Scene Selection and Semantic–Object Fusion
by Habib Dhahri, Fahad Alotaibi, Awais Mahmood and Mousa Jari
Machines 2026, 14(6), 677; https://doi.org/10.3390/machines14060677 - 10 Jun 2026
Viewed by 86
Abstract
Intelligent autonomous driving systems must not only predict the appropriate driving manoeuvre but also provide human-interpretable evidence that justifies the decision. However, existing methods typically address these objectives separately, leading to three practical limitations: multi-stage perception-to-language pipelines can propagate upstream perception errors into downstream explanations; post hoc saliency methods often produce pixel-level highlights that are difficult to interpret semantically; and decoupled decision and explanation modules cannot guarantee that the explanation reflects the same scene evidence used for behaviour prediction. In this paper, we propose a unified framework that jointly performs vehicle behaviour prediction and human-centric interpretation from a shared visual backbone. Specifically, a hierarchical Swin Transformer encodes the driving scene into a sequence of spatial tokens, which are processed by two complementary branches. The first branch, termed the Object Selection Module (OSM), learns a compact scene-level semantic representation through query-guided cross-attention, while the second branch extracts a small set of class-agnostic object-centric tokens without requiring bounding-box or segmentation supervision. These two representations are subsequently integrated by a Semantic–Object Fusion (SOF) module based on scaled dot-product attention, residual connections, and a feed-forward network. The behaviour prediction head operates on the fused representation, whereas the interpretation head leverages the semantic representation through a skip connection to preserve decision-relevant context. For surround-view perception, learnable per-camera embeddings are introduced to maintain viewpoint identity with negligible additional parameter cost. Furthermore, a compact language model fine-tuned via Low-Rank Adaptation (LoRA) generates fluent, label-conditioned natural-language justifications. 
Extensive experiments on two public benchmarks, BDD-OIA and nu-AD, demonstrate that the proposed framework consistently delivers superior performance and provides effective, human-readable interpretations of driving decisions.
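The query-guided cross-attention and the scaled dot-product attention named in this abstract both rest on the same basic operation, sketched below in plain Python. This is the textbook formulation only, without the learned projections, multiple heads, residual connections, or feed-forward network that the abstract mentions, and it is not the authors' OSM or SOF module. The function names and the toy vectors are hypothetical.

```python
import math


def softmax(scores):
    """Numerically stable softmax over a list of floats."""
    m = max(scores)
    exps = [math.exp(s - m) for s in scores]
    total = sum(exps)
    return [e / total for e in exps]


def cross_attention(queries, keys, values):
    """Scaled dot-product attention of queries over key/value tokens.

    Returns (outputs, weights): one output vector and one weight
    distribution over the tokens per query.
    """
    d = len(keys[0])
    outputs, weights = [], []
    for q in queries:
        scores = [sum(qi * ki for qi, ki in zip(q, k)) / math.sqrt(d)
                  for k in keys]
        w = softmax(scores)
        out = [sum(wj * v[c] for wj, v in zip(w, values))
               for c in range(len(values[0]))]
        outputs.append(out)
        weights.append(w)
    return outputs, weights


# Toy example: one query over three spatial tokens of dimension 2.
tokens = [[1.0, 0.0], [0.0, 1.0], [1.0, 1.0]]
outputs, weights = cross_attention([[2.0, 0.0]], tokens, tokens)
print(weights[0])
```

The weight vector for each query sums to one, so it can be read as a soft selection over scene tokens; that property is what makes attention maps a candidate source of interpretable evidence.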
16 pages, 20727 KB  
Article
Cross-Media Narrative Transformations of the “Hunter Catches Birds” Tradition in Indo-Persian and Malay Worlds
by Siaw Hung Ng
Arts 2026, 15(6), 137; https://doi.org/10.3390/arts15060137 - 9 Jun 2026
Viewed by 121
Abstract
The tale commonly known as “Hunter Catches Birds” circulates widely across South Asia, the Islamicate world, and insular Southeast Asia. Despite linguistic, religious, and cultural differences, the narrative architecture of the Hunter Catches Birds tale displays remarkable continuities across Buddhist, Persian, Malay, Indonesian, and Javanese traditions. Its persistence across radically different religious and cultural settings raises a broader question of how narrative meaning remains recognizable through continual reinterpretation. In early Malay renderings, particularly within the Hikayat Bayan Budiman tradition, oral materials are reorganized into framed and nested literary structures. These forms enable both textual and visual interplay while supporting ethical instruction alongside aesthetic elaboration. Frequently positioned as an introductory episode in parrot-cycle literature, the story integrates motifs such as collective escape, feigned death, interspecies conflict, and the tension between loyalty and betrayal. These narrative elements remain open to reinterpretation in different moral and cultural settings. Drawing upon Sanskrit, Persian, Uyghur, Malay, Indonesian, and Javanese materials, this study examines how the tale moved across oral, manuscript, and visual traditions. Rather than treating the narrative as a fixed folktale type, the article approaches it as a flexible modular structure whose ethical meanings were continually reshaped across changing religious and social environments. These interactions generate layered systems of meaning in which image and text jointly shape narrative tension, vulnerability, and strategic judgment. In Persian miniature traditions, scenes of entrapment, sacrifice, and escape are organized through sequential composition and spatial tension, allowing conflict, vulnerability, and narrative causality to be experienced visually as well as textually. 
By tracing these transformations, this study argues that the enduring vitality of the Hunter Catches Birds tradition may lie less in narrative stability than in the sustained reinterpretation of repeated narrative structures across textual and visual cultures.
29 pages, 4305 KB  
Article
BAFNet: A Few-Shot Segmentation Algorithm Based on Two-Stage Backbone Network Adaptation Fine-Tuning via Meta-Learning
by Yujie Zhang, Yuan Sui, Yubo Wang, Ying Wei and Gang Yang
Mathematics 2026, 14(12), 2059; https://doi.org/10.3390/math14122059 - 9 Jun 2026
Viewed by 74
Abstract
The objective of few-shot segmentation is to segment novel categories given only a few annotated support images. Current FSS methods typically rely on pretrained backbone networks while often overlooking the inherent discrepancy between pretraining tasks and downstream segmentation tasks. This oversight renders the models susceptible to noise interference and hinders rapid generalization to novel categories. To address these limitations, we propose BAFNet, a novel few-shot segmentation algorithm based on two-stage backbone adaptive fine-tuning. Our approach incorporates a Feature Activation Adapter module into the backbone network, which operates through similarity feature enhancement and low-dimensional adaptive learning. Building upon this foundation, we develop an adapter-based fine-tuning strategy for the training phase that enhances the backbone network’s capacity for extracting category-relevant features while optimizing similarity representation of the extracted features. Additionally, we introduce a support set-driven, in-episode, online fine-tuning strategy for the testing phase, which leverages data augmentation to generate pseudo-query sets for supervised fine-tuning optimization. Through comprehensive quantitative and qualitative experiments conducted on PASCAL-5i, COCO-20i, and the industrial MT Defect Dataset, our results demonstrate that the proposed BAFNet model achieves state-of-the-art few-shot segmentation performance while utilizing the minimal number of trainable parameters. Our method obtains superior performance for both the mean intersection over union and foreground-background intersection over union evaluation metrics, exhibiting remarkable applicability for both general images in complex scenes and industrial defect segmentation under few-shot conditions.
(This article belongs to the Section E1: Mathematics and Computer Science)
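The two evaluation metrics named in this abstract can be sketched on flattened binary masks. The version below follows one common reading: per-class IoU averaged over classes for mean IoU, and the average of foreground and background IoU for foreground-background IoU. The paper's exact evaluation protocol (for example, how intersections and unions are accumulated over episodes) is not stated in the abstract, so the function names and the toy masks are assumptions for illustration.

```python
def class_iou(pred, target, label):
    """IoU of one label between two flattened masks of equal length."""
    inter = sum(1 for p, t in zip(pred, target) if p == label and t == label)
    union = sum(1 for p, t in zip(pred, target) if p == label or t == label)
    return inter / union if union else float("nan")


def fb_iou(pred, target):
    """Foreground-background IoU: mean of the IoU for labels 1 and 0."""
    return (class_iou(pred, target, 1) + class_iou(pred, target, 0)) / 2


def mean_iou(per_class_ious):
    """Mean IoU over a list of per-class foreground IoU values."""
    return sum(per_class_ious) / len(per_class_ious)


# Toy 4-pixel masks: 1 = foreground, 0 = background.
pred = [1, 1, 0, 0]
target = [1, 0, 0, 0]
print(class_iou(pred, target, 1), fb_iou(pred, target))
```

Because the background usually dominates the image, foreground-background IoU tends to sit higher than foreground IoU alone, which is why the two metrics are normally reported side by side.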
22 pages, 4690 KB  
Article
A Human-Centered Multimodal Framework for Characterizing Safety-Relevant Driver Functional Domains: An Exploratory Study of Professional Bus Drivers
by Ting-An Kuo, Chiuhsiang Joe Lin and Po-Hsiang Liu
Sensors 2026, 26(12), 3664; https://doi.org/10.3390/s26123664 - 8 Jun 2026
Viewed by 209
Abstract
This study proposes a human-centered multimodal framework for characterizing safety-relevant driver functional domains in professional bus drivers. Unlike conventional approaches that rely on isolated psychological or physical assessments, the proposed framework integrates self-perception, psychomotor performance, and cognitive–perceptual assessment to provide an exploratory, structured characterization of driver-related functional capacities. Eighteen professional bus drivers participated in this study. Self-perception data were obtained from all 18 participants, whereas psychomotor and cognitive–perceptual assessments were completed by 16 participants. These measurements were used to examine multiple domains relevant to driving safety, including behavioral awareness, motor coordination, attention, visual tracking, and hazard-perception-related processing. Given the modest sample size, the study should be regarded as an exploratory pilot investigation. Data were analyzed using a laboratory-based cross-sectional between-subjects design to examine age- and gender-related differences across the assessed domains. The findings suggested that selected age- and gender-related differences and descriptive tendencies were observable across multiple domains. Male drivers descriptively showed higher self-rating scores, female drivers showed different performance tendencies in selected psychomotor tasks, and male drivers demonstrated substantially greater grip strength. Older drivers showed slower and less efficient performance in several cognitive–perceptual measures, with the clearest age-related effect observed in the tachistoscopic traffic test, where older participants showed a higher error tendency under time-constrained traffic-scene processing conditions. 
The constructs and measures proposed in this study are intended as general laboratory-based assessments of driver-related capabilities rather than direct measures of actual driving performance, real-time driver-state indicators, or validated sensor-based monitoring indicators. As candidate human-factor constructs, they may inform future driver monitoring research by helping to clarify how driver-related signals or behaviors could eventually be linked to their underlying functional and safety-related meaning in intelligent transportation environments.

36 pages, 5240 KB  
Article
Single-View Scene Completion via Candidate Model Retrieval and Scale-Aware Registration
by Di Zhao, Yuxing Wang, Ziheng Shi and Junhan Shao
Appl. Sci. 2026, 16(12), 5778; https://doi.org/10.3390/app16125778 - 8 Jun 2026
Viewed by 84
Abstract
Single-view RGB-D observations are often affected by occlusion and restricted viewpoints, leading to incomplete object geometry and underestimated obstacle extents in indoor robot perception. This paper proposes a single-view scene completion framework that integrates candidate model retrieval and scale-aware registration. The framework first generates local RGB crops and partial point clouds through automatic instance segmentation; then retrieves complete candidate models by matching the local crops with multi-view rendered CAD images; and finally estimates candidate-to-observation rotation, translation, and scale to insert the selected aligned model into the original scene coordinate system. Experiments show that the retrieval module achieves Recall@1/Recall@5 of 80%/89%. The registration module reaches a success rate of 56.61%, outperforming the second-best method by 12.28 percentage points. More importantly, scene-level evaluation shows that the proposed method improves occupancy F1 from 0.445 to 0.523 and reduces boundary error from 0.202 m to 0.146 m compared with DiffCAD. These results indicate that the proposed framework improves navigation-oriented occupancy and obstacle-boundary recovery under CAD-library-based and segmentation-dependent single-view scene completion settings.
(This article belongs to the Section Robotics and Automation)
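The abstract above describes estimating rotation, translation, and scale between a candidate model and the observation. As a rough illustration of what such an estimate involves, the sketch below uses the classical closed-form Umeyama least-squares solution. It assumes point correspondences are already known, which a partial single-view point cloud does not provide directly, so it is a swapped-in textbook technique and not the registration method of the paper. The function name `estimate_similarity` is hypothetical.

```python
import numpy as np


def estimate_similarity(src, dst):
    """Closed-form (Umeyama) fit of dst ~ scale * R @ src + t.

    src, dst: (N, 3) arrays of corresponding points (correspondences assumed).
    Returns (scale, R, t).
    """
    src = np.asarray(src, dtype=float)
    dst = np.asarray(dst, dtype=float)
    mu_s, mu_d = src.mean(axis=0), dst.mean(axis=0)
    sc, dc = src - mu_s, dst - mu_d

    # Cross-covariance between the centered target and source points.
    cov = dc.T @ sc / len(src)
    U, D, Vt = np.linalg.svd(cov)

    # Flip the last axis if needed so R is a proper rotation, not a reflection.
    S = np.eye(3)
    if np.linalg.det(U) * np.linalg.det(Vt) < 0:
        S[2, 2] = -1.0

    R = U @ S @ Vt
    var_s = (sc ** 2).sum() / len(src)      # variance of the source points
    scale = np.trace(np.diag(D) @ S) / var_s
    t = mu_d - scale * R @ mu_s
    return scale, R, t
```

Solving for scale jointly with the rigid transform matters when CAD models are stored at arbitrary sizes; a rigid-only fit would leave a size mismatch between the inserted model and the observed object.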

32 pages, 2810 KB  
Article
3D Geometry-Aware Efficient Feature Matching for Weakly Textured Scenes
by Libo Sun, Yidong Yan, Wenqi Yang and Wenhu Qin
J. Imaging 2026, 12(6), 253; https://doi.org/10.3390/jimaging12060253 - 7 Jun 2026
Viewed by 112
Abstract
Local feature matching plays a critical role in robotic SLAM and visual localization. However, in weakly textured indoor industrial environments, lightweight appearance-based methods often struggle to learn discriminative and stable local features. To address this challenge, this paper proposes GAEFeat, short for Geometry-Aware Efficient Feature, a lightweight vision–geometric feature learning network. To overcome the scarcity of specialized training data, we integrate robotic arm pose priors with depth information to automatically generate cross-view supervision signals and surface-normal labels. Based on this strategy, we construct two complementary datasets, one simulated and one real-world, to support feature learning and evaluation in weakly textured indoor industrial environments. For feature extraction, we design a dual enhancement mechanism consisting of a geometric auxiliary branch and a geometry-aware enhancement (GAE) module. The former guides the network to perceive local surface structures through surface-normal supervision, while the latter uses a gating mechanism to achieve deep fusion between geometric priors and 2D texture descriptors. Experimental results demonstrate that GAEFeat achieves strong robustness and high inference efficiency in relative pose estimation, homography estimation, and visual localization tasks, with particularly notable advantages in near-field, weakly textured industrial scenes. The framework achieves an inference latency of only 3.9 ms on the NVIDIA Jetson AGX Orin edge platform, demonstrating its real-time capability and practical potential for deployment in edge computing environments.
(This article belongs to the Section Computer Vision and Pattern Recognition)
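The abstract above mentions a gating mechanism that fuses geometric priors with 2D texture descriptors but does not give its form. One common pattern is a learned sigmoid gate that blends the two feature vectors per channel. The sketch below shows that generic pattern only; the function `gated_fusion`, the parameters `W` and `b`, and the convex-blend form are all assumptions for illustration and should not be read as the GAE module itself.

```python
import numpy as np


def sigmoid(x):
    return 1.0 / (1.0 + np.exp(-x))


def gated_fusion(tex, geo, W, b):
    """Blend texture and geometry descriptors with a per-channel sigmoid gate.

    tex, geo: (N, C) descriptors; W: (2C, C) and b: (C,) are hypothetical
    learned parameters of the gate.
    """
    # The gate looks at both inputs and outputs a weight in (0, 1) per channel.
    gate = sigmoid(np.concatenate([tex, geo], axis=-1) @ W + b)
    # Convex combination: gate near 1 keeps texture, near 0 keeps geometry.
    return gate * tex + (1.0 - gate) * geo
```

In weakly textured regions such a gate could in principle learn to lean on the geometric channel, which is the intuition the abstract points to.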

38 pages, 38359 KB  
Article
A Decoupled Separation, Enhancement, and Purification Framework for Infrared Moving Target Detection in Low-Altitude Remote Sensing
by Dongming Lu, Zuchao Bao, Zechen Tian, Yifan Zhai, Tingting Chen and Jianpo Gao
Remote Sens. 2026, 18(12), 1881; https://doi.org/10.3390/rs18121881 - 7 Jun 2026
Viewed by 129
Abstract
Detecting infrared moving targets in low-altitude remote sensing scenes remains challenging due to strong clutter, scale inconsistency, and residual interference. Because these factors are often coupled in complex scenes and cannot be handled effectively by a single operation, a three-stage progressive Decoupled Separation, Enhancement, and Purification (DSEP) framework is proposed. The method integrates edge-preserving background decoupling, scale-consistent spatial screening, and residual-response purification into a non-iterative feedforward pipeline. Experiments on six representative self-collected infrared sequences and six selected scenes from the public SIRST dataset suggest that DSEP produces relatively compact and spatially continuous target responses while suppressing background interference. On the self-collected dataset, the method achieves SCRG and BSF values of up to 10.61 and 7.38, respectively, with a processing time of 0.009–0.016 s per frame. Compared with representative spatial filtering, local contrast, and low-rank decomposition methods, DSEP shows a relatively favorable balance between detection performance and low-latency processing efficiency. Although the performance gain is smaller in some SIRST scenes, the proposed method still shows generally stable detection performance across the evaluated scenes.
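The SCRG and BSF figures quoted above are signal-to-clutter ratio gain and background suppression factor. A minimal sketch of the definitions commonly used in infrared small-target work follows: SCR = |mu_t - mu_b| / sigma_b, SCRG = SCR_out / SCR_in, and BSF = sigma_in / sigma_out. It is assumed here, not confirmed by the abstract, that the paper uses these forms; the function names and the mask-based region selection are hypothetical.

```python
import numpy as np


def scr(img, target_mask, background_mask):
    """Signal-to-clutter ratio: target/background contrast over clutter std."""
    mu_t = img[target_mask].mean()
    mu_b = img[background_mask].mean()
    sigma_b = img[background_mask].std()
    return abs(mu_t - mu_b) / sigma_b


def scrg_bsf(inp, out, target_mask, background_mask):
    """Return (SCRG, BSF) comparing a processed image `out` with its input."""
    scrg = scr(out, target_mask, background_mask) / scr(inp, target_mask, background_mask)
    # BSF compares background clutter (std) before and after processing.
    bsf = inp[background_mask].std() / out[background_mask].std()
    return scrg, bsf
```

Both values exceed 1 when processing makes the target stand out more and the background flatter; a perfectly flat output background would make both ratios unbounded, so reported values depend on how the background region is chosen.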