Search Results (378)

Search Parameters:
Keywords = visual sensing module

21 pages, 1456 KB  
Article
A Camera-Based Multimodal Defect Sensing Framework for Substation Equipment Monitoring via Cross-Modal Feature Mapping
by Ziquan Liu, Hai Xue, Chengbo Hu, Chao Wei and Can Zhang
Sensors 2026, 26(12), 3935; https://doi.org/10.3390/s26123935 (registering DOI) - 21 Jun 2026
Viewed by 126
Abstract
To address the limitations of vision-only defect detection, image–semantic misalignment, and spatial-logic conflicts in complex substation inspection scenarios, this paper proposes a camera-sensor-based multimodal defect sensing framework with cross-modal feature mapping for substation equipment monitoring. The proposed framework integrates field inspection images acquired by camera sensors, defect textual descriptions, and equipment topology knowledge within a unified workflow of domain-adaptive pre-training, bidirectional cross-modal mapping, and hierarchical reasoning. First, a Contrastive Language–Image Pre-training (CLIP)-based domain-adaptive pre-training strategy is developed to enhance the representation of equipment categories, defect attributes, and inspection-scene semantics. Second, a bidirectional cross-modal feature mapping network is constructed to model fine-grained interactions between candidate visual regions and textual semantics, with uncertainty-aware fusion and prototype constraints introduced to improve semantic alignment and defect discrimination. Third, a hierarchical neuro-symbolic reasoning module incorporates equipment topology and spatial rules for posterior verification, logical consistency checking, and false-positive suppression. Experiments on a substation inspection image dataset show that the proposed method achieves 90.8% mAP@0.5, 68.7% mAP@0.5:0.95, and an 89.4% F1-score, outperforming mainstream and recent detection models.
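The abstract names CLIP-based domain-adaptive pre-training but does not give its objective. As a reference point only, the sketch below computes the standard symmetric contrastive (InfoNCE) loss from the original CLIP recipe on toy embeddings; the 2-D vectors and the temperature of 0.07 are illustrative assumptions, and the authors' domain-adaptive variant may differ.

```python
import math

def dot(u, v):
    return sum(a * b for a, b in zip(u, v))

def l2_normalize(v):
    norm = math.sqrt(dot(v, v))
    return [x / norm for x in v]

def cross_entropy(logits, target):
    # Numerically stable -log(softmax(logits)[target]).
    m = max(logits)
    log_z = m + math.log(sum(math.exp(z - m) for z in logits))
    return log_z - logits[target]

def clip_contrastive_loss(image_emb, text_emb, temperature=0.07):
    """Symmetric InfoNCE: image i and text i are the only positive pair in row/column i."""
    imgs = [l2_normalize(v) for v in image_emb]
    txts = [l2_normalize(v) for v in text_emb]
    n = len(imgs)
    # logits[i][j]: temperature-scaled cosine similarity of image i and text j.
    logits = [[dot(imgs[i], txts[j]) / temperature for j in range(n)] for i in range(n)]
    image_to_text = sum(cross_entropy(logits[i], i) for i in range(n)) / n
    text_to_image = sum(cross_entropy([logits[i][j] for i in range(n)], j) for j in range(n)) / n
    return (image_to_text + text_to_image) / 2

# Hypothetical embeddings: matched captions vs. deliberately swapped captions.
aligned = clip_contrastive_loss([[1.0, 0.0], [0.0, 1.0]], [[0.9, 0.1], [0.1, 0.9]])
swapped = clip_contrastive_loss([[1.0, 0.0], [0.0, 1.0]], [[0.1, 0.9], [0.9, 0.1]])
print(aligned < swapped)  # True
```

Minimizing this loss pulls matched image–text pairs together and pushes mismatched pairs apart, which is the property a domain-adaptive pre-training stage would exploit.
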
26 pages, 3882 KB  
Article
Remote Sensing Small Object Detection Network Based on Wavelet-Convolution and Fine-Grained Preservation
by Hangyu Li and Tiecheng Song
Information 2026, 17(6), 609; https://doi.org/10.3390/info17060609 (registering DOI) - 18 Jun 2026
Viewed by 171
Abstract
Small object detection in remote sensing imagery is a fundamental task for visual information extraction, yet it remains challenging due to extremely small target scales, complex backgrounds, and the loss of discriminative feature information caused by repeated downsampling. To address these issues, this paper proposes a Wavelet-Convolution and Fine-Grained Preservation Network (WCFPNet) based on YOLOv8n. Specifically, a Wavelet-Convolution Module (WCM) is introduced into the backbone to decompose feature maps into low- and high-frequency sub-bands, thereby enhancing structural feature modeling and preserving subtle target details. To compensate for the weakened fine-grained information after repeated downsampling, an Enhanced Spatial Pyramid Pooling-Fast (ESPPF) module is embedded at the end of the backbone to strengthen multi-scale contextual aggregation. In addition, an Enhanced Feature Pyramid Network (EFPN) is designed in the neck to facilitate the propagation of shallow and intermediate fine-grained features to high-level semantic features through cross-level fusion and the Convolutional Block Attention Module (CBAM). Experiments on the NWPU VHR-10 dataset show that WCFPNet achieves 0.879 mAP@0.5 and 0.515 mAP@0.5:0.95, outperforming YOLOv8n by 1.7 and 2.5 percentage points, respectively. Moreover, WCFPNet achieves competitive performance compared with several representative detectors while maintaining moderate model complexity. These results demonstrate the effectiveness of WCFPNet in challenging remote sensing scenes characterized by complex backgrounds, dense object distributions, and weak textures.
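The WCM is described only as decomposing feature maps into low- and high-frequency sub-bands. A minimal sketch of that idea, assuming a one-level orthonormal 2-D Haar wavelet (the abstract does not name the wavelet), splits a single-channel map into one low-pass band and three detail bands and inverts the split exactly. Band naming conventions (LH vs. HL) differ between libraries.

```python
def haar_dwt2(x):
    """One-level orthonormal 2-D Haar transform of an even-sized 2-D list."""
    h, w = len(x) // 2, len(x[0]) // 2
    ll = [[0.0] * w for _ in range(h)]
    lh = [[0.0] * w for _ in range(h)]
    hl = [[0.0] * w for _ in range(h)]
    hh = [[0.0] * w for _ in range(h)]
    for i in range(h):
        for j in range(w):
            a, b = x[2 * i][2 * j], x[2 * i][2 * j + 1]
            c, d = x[2 * i + 1][2 * j], x[2 * i + 1][2 * j + 1]
            ll[i][j] = (a + b + c + d) / 2  # low-pass in both directions
            lh[i][j] = (a - b + c - d) / 2  # differences across columns
            hl[i][j] = (a + b - c - d) / 2  # differences across rows
            hh[i][j] = (a - b - c + d) / 2  # diagonal differences
    return ll, lh, hl, hh

def haar_idwt2(ll, lh, hl, hh):
    """Exact inverse of haar_dwt2, so no information is lost by the split."""
    h, w = len(ll), len(ll[0])
    x = [[0.0] * (2 * w) for _ in range(2 * h)]
    for i in range(h):
        for j in range(w):
            s, p, q, r = ll[i][j], lh[i][j], hl[i][j], hh[i][j]
            x[2 * i][2 * j] = (s + p + q + r) / 2
            x[2 * i][2 * j + 1] = (s - p + q - r) / 2
            x[2 * i + 1][2 * j] = (s + p - q - r) / 2
            x[2 * i + 1][2 * j + 1] = (s - p - q + r) / 2
    return x

image = [[float(4 * r + c) for c in range(4)] for r in range(4)]
bands = haar_dwt2(image)
restored = haar_idwt2(*bands)
```

Unlike strided downsampling, the detail bands keep the high-frequency content, which is the motivation the abstract gives for preserving small-target details.
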

35 pages, 9814 KB  
Article
EO2SAR-Diff: Structure-Aware Latent Diffusion for Unpaired EO-to-SAR Translation
by Yeon-Wook Kim and Kiyoung Kim
Remote Sens. 2026, 18(12), 2037; https://doi.org/10.3390/rs18122037 - 18 Jun 2026
Viewed by 209
Abstract
Synthetic aperture radar (SAR) imagery provides all-weather, day-and-night observation capabilities that complement electro-optical (EO) imaging; however, the limited number of operational SAR satellites and the difficulty of acquiring expert-annotated SAR datasets constrain deep-learning-based SAR image analysis. In this paper, we propose EO2SAR-Diff, a conditional latent diffusion framework that translates EO aerial images into realistic synthetic SAR images. The framework comprises three core components: (1) domain-adaptive LoRA pre-training that anchors the Stable Diffusion backbone in the remote sensing domain, (2) a style extraction and injection network that captures SAR-specific visual characteristics via multi-scale feature encoding and parallel cross-attention, and (3) a multi-branch ControlNet with three parallel branches for complementary structural guidance. These components are coordinated by a dual-axis feature injection strategy that modulates conditioning strength along both spatial (per-block) and temporal (per-timestep) dimensions. Experiments on the DOTA 1.0 and SARDet-100K datasets demonstrate that EO2SAR-Diff ranks in the top tier of all compared methods in distributional alignment with real SAR imagery, as measured by FID and KID computed with two SAR-domain-adapted feature extractors. Augmenting the SAR training set with our synthetic images yields consistent improvements in downstream object detection performance, confirming the practical utility of the proposed framework.
(This article belongs to the Special Issue AI-Driven Remote Sensing Image Restoration and Generation)
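The first component relies on LoRA (low-rank adaptation). The sketch below shows the generic LoRA forward pass on plain Python lists, y = W x + (alpha / r) B A x, with the pre-trained W frozen and only the low-rank factors A and B trained. The matrices and the alpha and r values are toy assumptions; how EO2SAR-Diff places LoRA inside Stable Diffusion is not described in the abstract.

```python
def matvec(m, v):
    return [sum(a * b for a, b in zip(row, v)) for row in m]

def lora_forward(W, A, B, x, alpha, r):
    """y = W x + (alpha / r) * B (A x).

    W: frozen d_out x d_in weight; A: trainable r x d_in; B: trainable d_out x r.
    """
    base = matvec(W, x)
    update = matvec(B, matvec(A, x))  # rank-r correction, never materializes B @ A
    scale = alpha / r
    return [b + scale * u for b, u in zip(base, update)]

W = [[1.0, 0.0], [0.0, 1.0]]
A = [[1.0, 1.0]]                      # rank r = 1
B_init = [[0.0], [0.0]]               # B starts at zero, so training starts from W exactly
B_trained = [[1.0], [2.0]]            # hypothetical values after fine-tuning
print(lora_forward(W, A, B_init, [1.0, 2.0], alpha=2, r=1))     # [1.0, 2.0]
print(lora_forward(W, A, B_trained, [1.0, 2.0], alpha=2, r=1))  # [7.0, 14.0]
```

Only r x (d_in + d_out) parameters are trained per adapted layer, which is why LoRA is attractive for anchoring a large backbone in a new domain.
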

30 pages, 42422 KB  
Article
Bi-Level Meta-Learning for Reliable Remote Sensing Image Registration
by Lin Shi, Renzhen Wang, Xiaofeng Zhu, Cong An, Kai Zhao, Jun Shu, Dongfang Yang and Deyu Meng
Remote Sens. 2026, 18(12), 2007; https://doi.org/10.3390/rs18122007 - 16 Jun 2026
Viewed by 128
Abstract
Unmanned aerial vehicle (UAV) visual navigation relies critically on robust image matching between UAV-acquired aerial imagery and pre-existing satellite reference maps. However, extreme cross-domain heterogeneity, encompassing temporal, radiometric, viewpoint, and sensor variations, severely degrades the performance of existing deep learning-based matchers trained on conventional benchmarks. Furthermore, manual annotation of ground-truth correspondences is prohibitively expensive. This paper proposes a semi-supervised, saliency-aware image matching framework with bi-level meta-learning. The approach comprises two synergistic stages: (1) automated dense correspondence generation via parameterized geometric synthesis, which constructs a large-scale coarse dataset Dc (approximately 50,000 pairs) without dense manual point annotation and serves as the primary training corpus for the feature matching network; and (2) expert-validated meta-data curation, producing a high-quality meta-dataset Dm (500 pairs) that supervises the training of a Saliency Judgment Network through bi-level meta-optimization, enabling the network to identify and prioritize geometrically reliable correspondences. Experimental results on the proposed RS-Hetero-50K benchmark and the cross-domain FuJian-Mountain dataset demonstrate substantial improvements over representative sparse and detector-free matchers, including LoFTR, SuperGlue, and LightGlue. The complete CNN-attention and saliency-aware framework achieves 95.4% matching precision, consistent with the best result reported in the experimental section. Plug-and-play experiments further confirm that the proposed saliency module consistently improves representative sparse and detector-free matchers, indicating that the performance gain stems from both stronger feature representation and saliency-guided correspondence selection.
The largest terrain-specific gain is observed in Gobi scenes, where AUC@5 px improves by 16.8% relative to the LoFTR baseline, demonstrating improved robustness in weakly textured remote sensing environments.
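AUC@5 px is usually read as the area under the cumulative error curve up to a 5-pixel threshold, normalized to [0, 1], as in the evaluation scripts popularized by SuperGlue and LoFTR. The sketch below follows that convention; which per-pair error the paper feeds into it (for example, corner reprojection error) is an assumption, and the error values are made up.

```python
import bisect

def error_auc(errors, threshold):
    """Normalized area under the recall-vs-error curve, integrated from 0 to `threshold`."""
    n = len(errors)
    errs = [0.0] + sorted(errors)
    recall = [0.0] + [(i + 1) / n for i in range(n)]
    last = bisect.bisect_left(errs, threshold)  # samples with error below the threshold
    xs = errs[:last] + [threshold]
    ys = recall[:last] + [recall[last - 1]]     # curve is flat up to the threshold
    # Trapezoidal rule over the piecewise-linear cumulative curve.
    area = sum((xs[i + 1] - xs[i]) * (ys[i] + ys[i + 1]) / 2 for i in range(len(xs) - 1))
    return area / threshold

print(error_auc([0.0, 0.0], 5.0))  # 1.0: every pair is perfectly registered
print(error_auc([10.0], 5.0))      # 0.0: every pair fails the threshold
print(error_auc([2.5], 5.0))       # 0.75
```

Because the metric rewards small errors continuously rather than as a pass/fail count, a "16.8% relative" gain can come from many pairs moving from, say, 4 px to 1 px error.
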

36 pages, 32050 KB  
Article
Semantic Segmentation of Pegmatite Dikes in High-Resolution Remote Sensing Imagery Using GAD-UNet++ in the Yilanlike Area, South Tianshan
by Zirui Wu, Chuan Chen, Yuanjun Yu, Yong Tian, Jian Yu and Fang Xia
Remote Sens. 2026, 18(12), 1988; https://doi.org/10.3390/rs18121988 - 15 Jun 2026
Viewed by 209
Abstract
Pegmatite dikes are important prospecting indicators for rare-metal deposits, but traditional identification methods depend on human visual interpretation of remote sensing imagery, which captures limited information and results in low identification accuracy and efficiency. In recent years, research worldwide on semantic segmentation of surface features and on remote sensing-based mineral exploration using deep learning and high-resolution imagery has made significant progress; however, studies on surface-exposed geological bodies such as pegmatite dikes remain scarce. To identify pegmatite dikes efficiently in remote sensing imagery, this study proposes an improved UNet++ model, termed GAD-UNet++. In the context of remote sensing geology, a pegmatite dike semantic segmentation dataset was constructed from high-resolution RGB imagery, using 0.66 m RGB imagery for visual delineation and ZY1F hyperspectral data for spectral constraint and label refinement; on this basis, surface pegmatite dikes in the Yilanlike area of the South Tianshan Mountains, Xinjiang, were segmented using RGB remote sensing image patches as model input. Because pegmatite dikes are small targets with slender structures, indistinct boundaries, and sparse regional distribution, this study introduced a lightweight feature extraction structure (GhostNetV2) and a long-range dependency attention module (DFC) at the encoder stage, and further incorporated the Coordinate Attention (CA) module to enhance spatial localization and boundary representation of the targets. Finally, focal cross-entropy loss and a deep supervision strategy were adopted to improve the accuracy of semantic information extraction for pegmatite dikes, as well as training stability and segmentation accuracy under class imbalance.
The results show that the proposed model achieved an mIoU of 93.11% and an F1-score of 94.95% on the test set. Compared with existing semantic segmentation models, the proposed model achieved superior identification accuracy and computational efficiency for pegmatite dikes. In addition, this study delineated 18 potential pegmatite dike enrichment zones in the Yilanlike area, providing technical support for remote sensing-based rare-metal prospecting and geological interpretation in the study area.
(This article belongs to the Section Remote Sensing in Geology, Geomorphology and Hydrology)
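Focal cross-entropy is what lets a segmentation model keep learning from rare, thin targets such as dikes when background pixels dominate. The sketch below shows the standard per-pixel multi-class focal loss, -alpha_t (1 - p_t)^gamma log(p_t), averaged over pixels. The abstract gives neither gamma nor alpha nor how the loss is combined with deep supervision, so gamma = 2 and the optional per-class alpha are assumptions.

```python
import math

def softmax(logits):
    m = max(logits)
    exps = [math.exp(z - m) for z in logits]
    total = sum(exps)
    return [e / total for e in exps]

def focal_cross_entropy(logits, target, gamma=2.0, alpha=None):
    """Per-pixel focal loss: -alpha_t * (1 - p_t)**gamma * log(p_t)."""
    p_t = softmax(logits)[target]
    alpha_t = 1.0 if alpha is None else alpha[target]
    # (1 - p_t)**gamma shrinks the loss of pixels that are already well classified.
    return -alpha_t * (1.0 - p_t) ** gamma * math.log(p_t)

def mean_focal_loss(pixel_logits, pixel_targets, gamma=2.0, alpha=None):
    losses = [focal_cross_entropy(z, t, gamma, alpha) for z, t in zip(pixel_logits, pixel_targets)]
    return sum(losses) / len(losses)

# Hypothetical two-class pixels: class 0 = background, class 1 = dike.
easy_background = ([4.0, 0.0], 0)
hard_dike = ([1.0, 0.5], 1)
print(focal_cross_entropy(*easy_background), focal_cross_entropy(*hard_dike))
```

With gamma = 0 and no alpha the expression reduces to ordinary cross-entropy; increasing gamma shifts the training signal toward hard pixels such as dike boundaries.
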

25 pages, 5937 KB  
Article
CGSTA-Net: A Cross-Domain Generative Prior-Assisted Structure–Texture Adaptive Network for Remote Sensing Image Dehazing
by Xiaoyan Li, Yankun Zhao and Na Niu
Symmetry 2026, 18(6), 1027; https://doi.org/10.3390/sym18061027 - 14 Jun 2026
Viewed by 240
Abstract
Image dehazing is important for the proper interpretation of optical remote sensing images. However, current dehazing networks tend to suffer from limited receptive fields, loss of texture information caused by conventional downsampling, and failure to exploit complementary cross-domain information. To address these problems, we propose a Cross-domain Generative Prior-assisted Structure–Texture Adaptive Network (CGSTA-Net) for remote sensing image dehazing. It is a dual-stream encoder–decoder framework that enhances the domain-specific information of the RGB input and the generated prior and then integrates them adaptively for haze-free reconstruction. To minimize information loss during downsampling, wavelet pooling is introduced to preserve frequency-aware structural and textural features. Additionally, a Structure–Texture Calibration Block is designed to simultaneously refine local frequency textures and construct sparse long-range structural dependencies, achieving better restoration under spatially non-uniform haze. To fuse the representations from the RGB and generated prior images, a Prior-aware Gated Adaptive Fusion module dynamically balances the domain-specific features and preserves fine details during multi-level feature fusion. Finally, pixel-level contrastive learning guides the latent space away from hazy distributions, enhancing feature discriminability. Extensive experiments on three datasets, RSID, RICE-I, and HRSD, demonstrate that CGSTA-Net effectively restores images under varying haze conditions and significantly outperforms the latest dehazing methods in both visual quality and quantitative performance. Specifically, compared with the strongest competing method, CGSTA-Net increases PSNR by 22.9% on RSID, 13.2% on RICE-I, and 7.2% on HRSD.
(This article belongs to the Section Computer)
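The reported gains are relative changes in PSNR. For reference, the sketch below computes PSNR = 10 log10(MAX^2 / MSE) for a grayscale image held as nested lists, plus the relative-gain arithmetic. Reading "increases PSNR by 22.9%" as a relative change of the dB value is an assumption, and the 8-bit peak value of 255 is a default that should match the data range actually evaluated.

```python
import math

def psnr(reference, restored, max_value=255.0):
    """Peak signal-to-noise ratio in dB between two equally sized 2-D images."""
    ref = [v for row in reference for v in row]
    out = [v for row in restored for v in row]
    mse = sum((a - b) ** 2 for a, b in zip(ref, out)) / len(ref)
    if mse == 0:
        return math.inf  # identical images
    return 10.0 * math.log10(max_value ** 2 / mse)

def relative_gain(new, baseline):
    """Percentage change of `new` with respect to `baseline`."""
    return 100.0 * (new - baseline) / baseline

print(psnr([[0, 0], [0, 0]], [[255, 255], [255, 255]]))  # 0.0 dB: worst case for 8-bit data
print(relative_gain(30.0, 25.0))                          # 20.0 (%), hypothetical PSNR values
```

Because PSNR is already logarithmic, a fixed relative gain corresponds to a larger absolute dB gain on datasets where baselines score higher, so relative and absolute improvements are not interchangeable.
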

22 pages, 43415 KB  
Article
FSSM: Frequency-Enhanced State Space Modeling with FFT-Based Two-Sided Non-Causal Convolution for Image Dehazing
by Li Zeng and Yinqing Huang
J. Imaging 2026, 12(6), 260; https://doi.org/10.3390/jimaging12060260 - 13 Jun 2026
Viewed by 208
Abstract
Image dehazing is a fundamental visual restoration task for improving visual perception under low-visibility weather conditions, especially in UAV-based remote sensing, traffic monitoring, and surveillance scenarios. Existing convolutional neural networks are effective in local feature extraction but remain limited in long-range dependency modeling, while Transformer-based methods improve global modeling at the cost of high computational complexity. To address these issues, this paper proposes an efficient image-dehazing framework termed FSSM, which integrates frequency-enhanced State Space Modeling with a hierarchical encoder–decoder architecture. Specifically, an FFT-based State Space Block (FFTSSB) is designed to reformulate state propagation as frequency-domain two-sided non-causal convolution, enabling efficient bidirectional global dependency modeling without explicit recursive scanning. Furthermore, a Frequency-Aware Discriminative Enhancement Block (FDEB) is introduced to enhance local textures, edges, and structural details through spatial gating and lightweight block-wise frequency modulation. Based on these two components, a Frequency-Aware State Interaction (FASI) block is constructed to progressively couple global state propagation and local frequency-aware enhancement. Experimental results on the HazyDet dataset demonstrate that FSSM achieves favorable restoration accuracy, structural consistency, and perceptual quality compared with representative dehazing methods. Ablation studies further validate the effectiveness of the proposed two-sided FFT-based state modeling, frequency-aware enhancement, and hierarchical multi-scale design.
(This article belongs to the Topic Computer Vision and Image Processing, 3rd Edition)
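The FFTSSB reformulates state propagation as a two-sided (non-causal) convolution evaluated in the frequency domain. The abstract does not give the kernel or boundary handling, so the sketch below illustrates only the underlying identity on a 1-D signal: a kernel with taps at negative and positive offsets, applied by multiplying DFTs, matches direct circular convolution. Circular boundaries, the toy taps, and the naive O(n^2) DFT (used to stay dependency-free; a real FFT is O(n log n)) are all assumptions.

```python
import cmath

def dft(x, inverse=False):
    """Naive discrete Fourier transform; the inverse includes the 1/n factor."""
    n = len(x)
    sign = 1 if inverse else -1
    out = [sum(x[k] * cmath.exp(sign * 2j * cmath.pi * j * k / n) for k in range(n))
           for j in range(n)]
    return [v / n for v in out] if inverse else out

def spectral_conv(signal, taps):
    """Circular convolution with a two-sided kernel {offset: weight} via the convolution theorem."""
    n = len(signal)
    kernel = [0.0] * n
    for offset, w in taps.items():
        kernel[offset % n] = w  # negative offsets wrap to the end of the buffer
    product = [a * b for a, b in zip(dft(signal), dft(kernel))]
    return [v.real for v in dft(product, inverse=True)]

def direct_conv(signal, taps):
    """Reference: y[i] = sum_k w_k * x[i - k]; k < 0 reads 'future' samples (non-causal)."""
    n = len(signal)
    return [sum(w * signal[(i - k) % n] for k, w in taps.items()) for i in range(n)]

x = [1.0, 2.0, 3.0, 4.0, 5.0, 6.0, 7.0, 8.0]
taps = {-2: 0.1, -1: 0.2, 0: 0.4, 1: 0.2, 2: 0.1}  # hypothetical symmetric kernel
y_fft, y_direct = spectral_conv(x, taps), direct_conv(x, taps)
```

Every output sample mixes information from both directions in a single global operation, which is why a frequency-domain formulation can replace an explicit forward-and-backward recursive scan.
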

28 pages, 24246 KB  
Article
Multimodal Prompt Learning for Spatial Reasoning in Remote Sensing Image Scene
by Yan Ren, Haizhong Qian, Bingchuan Jiang, Tingting Li, Xiao Wang, Long Sun and Li Yang
Remote Sens. 2026, 18(12), 1959; https://doi.org/10.3390/rs18121959 - 12 Jun 2026
Viewed by 232
Abstract
A remote sensing scene graph (RSSG) enables machines to interpret interactions among ground objects in remote sensing images and supports semantic reasoning and description, thus making it a fundamental technique in the field. However, most existing scene reasoning approaches cannot fully utilize multimodal information, resulting in limited performance when inferring spatial relationships among ground objects. To this end, we propose a Unified Visual-Semantic Triple Prompt Learning (UVSTPL) framework, which integrates visual features with matched geospatial object labels, leverages a prompt learning module for multimodal feature extraction, and employs a refined UVTransE model to predict spatial relationships. The core principle of UVSTPL is to enhance semantic feature extraction and improve relationship prediction performance via the collaborative fusion of visual and linguistic modalities. To strengthen the model’s ability to reason about the spatial relationships among ground objects in images, a novel Geo-RSSG dataset is constructed, which includes precise annotations of geographic entities, spatial relationships, and attributes. Extensive experiments demonstrate that the proposed UVSTPL method outperforms benchmark models on the spatial relationship prediction task. In comparison with the best baseline method, our approach improves prediction precision by 1.85%, mean precision by 8.49%, mean recall by 17.46%, and mean F1-score by 12.97%. This study offers insights for advancing the understanding and cognition of remote sensing scenes.
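UVTransE builds on the translation-based scoring idea of TransE, where a relation r is plausible for a (subject, object) pair when subject + r lies close to object. The refined UVTransE model is not specified in the abstract, so the sketch below shows only plain TransE scoring and arg-min relation prediction; the 2-D embeddings and relation names are hypothetical.

```python
import math

def transe_score(head, relation, tail):
    """Translation distance ||h + r - t||_2; lower means more plausible."""
    return math.sqrt(sum((h + r - t) ** 2 for h, r, t in zip(head, relation, tail)))

def predict_relation(subject, obj, relations):
    """Return the relation name whose translation best maps subject onto object."""
    return min(relations, key=lambda name: transe_score(subject, relations[name], obj))

# Hypothetical embeddings (x grows to the right, y grows upward).
relations = {
    "left_of": [1.0, 0.0],   # object = subject shifted right
    "above": [0.0, -1.0],    # object = subject shifted down
}
building = [1.0, 0.0]
road = [2.0, 0.0]
print(predict_relation(building, road, relations))  # left_of
```

In a multimodal variant the subject and object vectors would come from fused visual and label features rather than fixed coordinates, but the relation is still chosen by the smallest translation error.
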

29 pages, 8856 KB  
Article
High-Accuracy Indoor Multiple-Extended-Target Tracking Algorithm Based on 60 GHz Millimeter-Wave Radar
by Bo Gao, Jianzhong Chen, Bo Huang and Geng Yang
Sensors 2026, 26(12), 3758; https://doi.org/10.3390/s26123758 - 12 Jun 2026
Viewed by 147
Abstract
The rapid development of Internet of Things technologies has accelerated the deployment of smart home systems. However, perception solutions based on visual sensors remain constrained by illumination sensitivity, occlusion, and privacy concerns. Frequency-modulated continuous-wave (FMCW) millimeter-wave radar provides a promising alternative because it operates independently of lighting conditions, is robust to environmental changes, and preserves user privacy. To address multiple-extended-target tracking in cluttered indoor environments, this paper proposes a high-accuracy tracking algorithm that combines an improved Density-Based Spatial Clustering of Applications with Noise (DBSCAN) algorithm, an optimized Nearest-Neighbor Data Association (NNDA) scheme, and an Extended Kalman Filter (EKF). The improved DBSCAN algorithm introduces spatial-extent constraints, velocity-consistency checks, and candidate-cluster validation to cluster raw radar point clouds and convert extended targets into representative point targets with little additional computational cost. The optimized NNDA scheme then integrates clustering information into the association process, improving the matching accuracy between existing tracks and current measurements. Finally, the EKF estimates the state of each target from the associated measurements. Real-world experiments show that the proposed algorithm achieves tracking errors below 0.4 m in typical motion scenarios, maintains continuous tracking in two-person crossing scenarios, and reaches 93.3% counting accuracy in five-person scenarios. These results surpass those of the tracking system based on the commercial Texas Instruments (TI) IWR6843ISK millimeter-wave radar evaluation board. The proposed method offers a reliable, privacy-preserving sensing solution for smart homes, elderly care, and intelligent building applications.
(This article belongs to the Special Issue Advances in GNSS/INS Integration for Navigation and Positioning)
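The first stage of the pipeline clusters radar points and reduces each extended target to a representative point. The sketch below is plain DBSCAN followed by centroid extraction; it does not reproduce the paper's spatial-extent constraints, velocity-consistency checks, or candidate-cluster validation, and the eps and min_pts values and the 2-D points are hypothetical.

```python
import math

def dbscan(points, eps, min_pts):
    """Standard DBSCAN; returns one label per point, -1 for noise."""
    labels = [None] * len(points)

    def neighbours(i):
        return [j for j, q in enumerate(points) if math.dist(points[i], q) <= eps]

    cluster = -1
    for i in range(len(points)):
        if labels[i] is not None:
            continue
        seeds = neighbours(i)
        if len(seeds) < min_pts:
            labels[i] = -1  # provisional noise; may later become a border point
            continue
        cluster += 1
        labels[i] = cluster
        queue = [j for j in seeds if j != i]
        while queue:
            j = queue.pop()
            if labels[j] == -1:
                labels[j] = cluster  # noise reached from a core point becomes a border point
            if labels[j] is not None:
                continue
            labels[j] = cluster
            nbrs = neighbours(j)
            if len(nbrs) >= min_pts:  # only core points expand the cluster
                queue.extend(nbrs)
    return labels

def cluster_centroids(points, labels):
    """Collapse each cluster (one extended target) to its mean position."""
    groups = {}
    for p, lab in zip(points, labels):
        if lab >= 0:
            groups.setdefault(lab, []).append(p)
    return {lab: tuple(sum(c) / len(pts) for c in zip(*pts)) for lab, pts in groups.items()}

# Hypothetical radar returns (x, y) in metres: two people and one clutter point.
pts = [(0.0, 0.0), (0.1, 0.0), (0.0, 0.1), (5.0, 5.0), (5.1, 5.0), (5.0, 5.1), (10.0, 0.0)]
labels = dbscan(pts, eps=0.3, min_pts=3)
targets = cluster_centroids(pts, labels)
```

The resulting centroids are the point measurements that a nearest-neighbour association step and a Kalman-type filter would then consume.
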

29 pages, 7128 KB  
Article
EdgeElderCare: A Resource-Aware, Scene-Adaptive Edge-Cloud Collaborative System for Long-Term Elderly Safety and Health Monitoring
by Lihao Luo, Yuting Li, Lin Wei, Di Han, Ruifeng Cao, Bo Chen, Yuechen Pan and Yunfan Chen
Electronics 2026, 15(12), 2601; https://doi.org/10.3390/electronics15122601 - 12 Jun 2026
Viewed by 169
Abstract
Driven by global population aging, long-term in-home and institutional elderly care faces challenges in delivering continuous, privacy-aware, and resource-efficient safety and health monitoring. Existing edge-based solutions struggle to jointly balance detection accuracy, privacy, and resource overhead during continuous operation, and often offer limited situational awareness and inflexible management. We propose EdgeElderCare, a resource-aware, scene-adaptive edge-cloud collaborative system for continuous elderly safety and health monitoring. Its contributions are threefold: (1) a scene-adaptive multi-sensor task-sharing architecture that deploys vision-based fall detection in public areas and privacy-aware millimeter-wave radar in private spaces; combined with edge-side task scheduling, it provides spatially complementary coverage of public and private areas, mitigates the accuracy–privacy conflict, and reduces computing and bandwidth consumption relative to data-level fusion; (2) a lightweight myocardial infarction detection module deployed on an edge platform, enabling local ECG analysis with low resource overhead; and (3) a 3D digital-twin edge-cloud management platform that maps multi-source sensing data to a virtual scene in real time and supports hierarchical visual alerting. Experiments in a real nursing home environment show that the system operated stably on resource-constrained edge hardware: UWB positioning achieved centimeter-level RMSE, visual fall detection reached a recall of 0.90, millimeter-wave radar fall detection achieved accuracy and F1 above 0.90, and myocardial infarction detection exceeded 0.99 accuracy on the public PTB/PTB-XL benchmark. These results indicate an engineering-feasible approach to intelligent elderly care. Larger-scale and longer-term validation remains the focus of future work.
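The fall-detection results are reported as recall, accuracy, and F1. The sketch below shows how these follow from confusion-matrix counts; the counts are invented for illustration and are not taken from the paper.

```python
def precision_recall_f1(tp, fp, fn):
    """Precision, recall, and their harmonic mean (F1) for the positive class (a fall)."""
    precision = tp / (tp + fp) if tp + fp else 0.0
    recall = tp / (tp + fn) if tp + fn else 0.0
    f1 = 2 * precision * recall / (precision + recall) if precision + recall else 0.0
    return precision, recall, f1

def accuracy(tp, tn, fp, fn):
    return (tp + tn) / (tp + tn + fp + fn)

# Hypothetical trial: 50 real falls (45 detected), 50 normal activities (5 false alarms).
p, r, f1 = precision_recall_f1(tp=45, fp=5, fn=5)
acc = accuracy(tp=45, tn=45, fp=5, fn=5)
```

For fall detection, recall (missed falls) is usually the safety-critical figure, which is why it may be reported on its own; accuracy alone can look high simply because non-fall activity dominates.
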

30 pages, 6714 KB  
Article
Study on a Method for Identifying Particles Causing High-Speed Fluid Wear Based on Multi-Source Information Fusion
by Long Feng, Zhiyu Xiang, Junming Liu, Feng Zhu, Zhenzhen Zhang and Hongxin Xu
Processes 2026, 14(12), 1918; https://doi.org/10.3390/pr14121918 (registering DOI) - 12 Jun 2026
Viewed by 192
Abstract
Mechanical wear particle recognition is an important approach to equipment health monitoring and early fault warning. However, flow-field disturbances and high-speed particle motion in high-speed fluid environments can cause image degradation and non-stationary electrostatic signals, making single-source recognition methods insufficiently reliable. Therefore, this study proposes a wear particle recognition method based on multi-source information fusion for high-speed fluid environments. The method establishes a multi-scale electrostatic sensing model to characterize the coupling among particle material properties, motion states, and electrostatic response characteristics. Empirical mode decomposition and independent component analysis are combined for adaptive electrostatic signal denoising, and a Transformer network is used to extract multi-domain features. Meanwhile, an ECA-CNN model with an efficient channel attention mechanism is introduced to enhance the feature representation of degraded particle images. On this basis, a meta-learning-based sample-adaptive decision fusion framework is developed to achieve dynamic, complementary fusion of electrostatic and visual information. The experimental results demonstrate that the proposed method exhibits excellent recognition accuracy and robustness in the tested high-speed fluid environment (10 m/s), achieving a fusion recognition accuracy of 96.0%, significantly higher than that of single-source recognition methods. Ablation experiments further show that removing the global scaling factor, guidance loss, interpolation loss, and category-specific weight generator decreases the average recognition accuracy by 0.7%, 1.2%, 0.4%, and 1.8%, respectively, confirming the contribution of each key module to fusion recognition performance.
These findings provide a new technical approach for the online intelligent recognition of wear particles under high-speed fluid conditions and offer theoretical support and methodological guidance for condition monitoring, health assessment, and intelligent operation and maintenance of large-scale equipment.
(This article belongs to the Section Process Control, Modeling and Optimization)
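The image branch uses an efficient channel attention (ECA) mechanism. The sketch below follows the generic ECA-Net recipe: global average pooling per channel, a 1-D convolution across neighbouring channels with an adaptively chosen odd kernel size, and sigmoid gating. The ECA-CNN layout is not described in the abstract; the fixed uniform convolution weights here are a placeholder for weights that would be learned.

```python
import math

def eca_kernel_size(channels, gamma=2, b=1):
    # ECA-Net rule: t = |(log2 C + b) / gamma| truncated, bumped up to the next odd number.
    t = int(abs((math.log2(channels) + b) / gamma))
    return t if t % 2 else t + 1

def eca(feature_map, weights=None):
    """Efficient channel attention on a C x H x W nested list."""
    c = len(feature_map)
    k = eca_kernel_size(c)
    if weights is None:
        weights = [1.0 / k] * k  # placeholder; these k weights are learned in practice
    # 1) Global average pooling: one descriptor per channel.
    pooled = [sum(sum(row) for row in ch) / (len(ch) * len(ch[0])) for ch in feature_map]
    # 2) 1-D convolution across neighbouring channels; zero padding keeps C outputs.
    pad = k // 2
    padded = [0.0] * pad + pooled + [0.0] * pad
    mixed = [sum(weights[m] * padded[i + m] for m in range(k)) for i in range(c)]
    # 3) Sigmoid gates rescale each channel without any dimensionality reduction.
    gates = [1.0 / (1.0 + math.exp(-z)) for z in mixed]
    return [[[v * g for v in row] for row in ch] for ch, g in zip(feature_map, gates)]

fmap = [[[1.0, 2.0], [3.0, 4.0]] for _ in range(4)]  # hypothetical 4 x 2 x 2 feature map
out = eca(fmap)
```

The appeal for degraded particle images is cost: channel interactions are modelled with only k parameters per layer instead of a fully connected squeeze-and-excitation bottleneck.
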

18 pages, 2729 KB  
Article
A Hybrid Swin–Mamba UNet for Post-Disaster Building Damage Assessment
by Tian Zhou, Liwei Deng and Fei Chen
Appl. Sci. 2026, 16(12), 5918; https://doi.org/10.3390/app16125918 - 11 Jun 2026
Viewed by 121
Abstract
Natural disasters frequently cause significant building damage, necessitating timely and accurate damage assessment for effective rescue operations and post-disaster reconstruction. Traditional building damage assessment methods commonly rely on paired pre- and post-disaster remote sensing images, which often pose practical challenges in data acquisition and image pairing during emergencies. To overcome these limitations, a hybrid Swin–Mamba U-shaped network (UNet) is developed for building damage assessment using only post-disaster remote sensing imagery. The proposed framework employs a Swin Transformer as the encoder to extract multi-scale features and capture long-range contextual information, while a Parallelized Patch-Aware Attention (PPA) convolution module is introduced in the decoder to restore spatial details and improve feature reconstruction. In addition, a Visual State Space (VSS) module is incorporated in the bottleneck layer to model both global contextual dependencies and local structural information, thereby improving the representation of building damage characteristics from single-temporal imagery. Experiments conducted on the xBD dataset show that the proposed method outperforms Swin–Unet by 1.7% in overall F1-score, achieving an overall F1-score of 55.2%. In addition, qualitative visualization results suggest that the proposed method generalizes favorably across different disaster scenarios. These results highlight the practical potential of the proposed framework for rapid post-disaster building damage assessment, particularly in emergency response scenarios where only post-disaster imagery is available.
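On xBD, an "overall F1" is often computed with the xView2 challenge convention: 0.3 x localization F1 + 0.7 x damage F1, where the damage F1 is the harmonic mean of the per-class F1 scores for the four damage levels. Whether this paper uses exactly that definition is an assumption, since the abstract does not say; the per-class values below are invented.

```python
def harmonic_mean(values):
    """Harmonic mean; a single zero-F1 class pulls the result to zero."""
    if any(v == 0 for v in values):
        return 0.0
    return len(values) / sum(1.0 / v for v in values)

def xview2_overall_f1(f1_localization, f1_per_damage_class):
    """Assumed xView2-style score: 30% localization F1, 70% damage F1."""
    f1_damage = harmonic_mean(f1_per_damage_class)
    return 0.3 * f1_localization + 0.7 * f1_damage

# Hypothetical per-class F1: no damage, minor, major, destroyed.
score = xview2_overall_f1(0.80, [0.85, 0.35, 0.45, 0.70])
balanced = xview2_overall_f1(0.80, [0.5, 0.5, 0.5, 0.5])
```

The harmonic mean heavily penalizes weak minority classes (minor and major damage are typically the hardest), which helps explain why overall scores on xBD sit well below per-class scores for the easy classes.
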

33 pages, 6102 KB  
Article
From Detection Toward Decision Support: A Hierarchical Visual–Sensor Framework for Zamioculcas Monitoring in Indoor Environments
by Raikhan Amanova, Baurzhan Belgibayev, Yersaiyn Mailybayev, Gulnur Kazbekova, Zhadyra Akanova, Galiya Mamankyzy, Marzhana Amanova, Artem Bykov, Periuza Pirniyazova and Nurzhigit Smailov
Computers 2026, 15(6), 382; https://doi.org/10.3390/computers15060382 - 11 Jun 2026
Viewed by 173
Abstract
This paper proposes a prototype-level hierarchical visual–sensor framework for monitoring the Zamioculcas houseplant in complex indoor environments and supporting adaptive care-mode selection. The proposed framework combines a two-level visual pipeline, consisting of YOLO-based target plant detection and MobileViT-S-based leaf-condition classification, with a Plant Health Index (PHI) and a rule-based decision-support module for integrating visual and IoT-derived indicators. For the detection task, YOLOv8, YOLO12, and YOLO26 were compared, with YOLO26 showing the most balanced performance among the evaluated implementations. To improve robustness in real indoor scenes, negative training samples were added; this reduced the image-level false alarm rate on an independent negative-scene test set from 50.7% to 10.0% and increased specificity from 49.3% to 90.0%. For the second visual level, MobileViT-S achieved an accuracy of 0.9857 and an F1-score of 0.9857 on the independent cropped leaf test subset. To reduce the dependence of this result on a single data split, an additional 5-fold cross-validation experiment was conducted on the full cropped leaf dataset of 847 images, resulting in an accuracy of 0.9858 ± 0.0068 and an F1-score of 0.9853 ± 0.0070. To further address plant-level generalization, an additional unseen-plant validation subset of 60 newly collected cropped leaf images was evaluated, and MobileViT-S achieved an accuracy of 0.9500 and an F1-score of 0.9499. These results support the stability of the leaf-condition classifier within the available data, although larger external validation with strict plant-level and session-level separation remains necessary. In addition, an Arduino-based module-level validation was conducted using a capacitive soil-moisture sensor to verify the proposed sensor-based and Vision–IoT decision rules. 
The experiment demonstrated that the rule-based layer can distinguish dry, normal, and wet soil states and select conservative care actions depending on both soil moisture and visual-condition input. A brief real-time camera–sensor communication test further confirmed that live camera input, Arduino-based soil-moisture sensing, PHI computation, and care-mode selection can be connected within one decision-support pipeline. The proposed PHI and care-mode selection module are therefore presented as a formalized decision-support layer rather than as a fully validated autonomous irrigation system. Further calibration, actuator integration, and closed-loop validation remain necessary before practical autonomous deployment.
(This article belongs to the Section Internet of Things (IoT) and Industrial IoT)
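Two points in the Zamioculcas abstract can be made concrete. On a test set that contains only negative scenes, every image is either a false alarm or a true negative, so the image-level false alarm rate and specificity sum to one (50.7% + 49.3% and 10.0% + 90.0%). The sketch also shows one possible shape for a conservative rule layer that combines a soil-moisture reading with a visual leaf label. The thresholds (`DRY_BELOW`, `WET_ABOVE`), the label strings, and the action names are hypothetical placeholders, not the rules or calibration used in the paper.

```python
from typing import Tuple


def negative_set_rates(false_positives: int, total_negatives: int) -> Tuple[float, float]:
    """Return (false_alarm_rate, specificity) on a test set of negative scenes only.

    Each image is either a false positive (plant wrongly detected) or a true
    negative, so the two rates are complementary.
    """
    if total_negatives <= 0:
        raise ValueError("total_negatives must be positive")
    if not 0 <= false_positives <= total_negatives:
        raise ValueError("false_positives must lie in [0, total_negatives]")
    far = false_positives / total_negatives
    return far, 1.0 - far


# Hypothetical soil-moisture thresholds in percent; NOT taken from the paper.
DRY_BELOW = 30.0
WET_ABOVE = 70.0


def soil_state(moisture_pct: float) -> str:
    """Map a calibrated moisture reading to one of three coarse states."""
    if moisture_pct < DRY_BELOW:
        return "dry"
    if moisture_pct > WET_ABOVE:
        return "wet"
    return "normal"


def care_mode(moisture_pct: float, leaf_condition: str) -> str:
    """Conservative rule table over soil state and a visual label (hypothetical names)."""
    state = soil_state(moisture_pct)
    if state == "wet":
        return "hold_watering"  # never add water to already wet soil
    if state == "dry":
        # Water a dry pot; additionally flag it for inspection if leaves look unhealthy.
        return "water" if leaf_condition == "healthy" else "water_and_inspect"
    # Normal moisture: no watering, only escalate when the visual branch reports a problem.
    return "monitor" if leaf_condition == "healthy" else "inspect"


# Usage with made-up counts: 20 false alarms among 200 negative images.
far, spec = negative_set_rates(20, 200)
mode = care_mode(25.0, "damaged")
```

Keeping the wet-soil rule independent of the visual label mirrors the abstract's emphasis on conservative actions: a visual symptom alone never triggers extra watering.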

23 pages, 12407 KB  
Article
ADS-MIR: A Machine Perception-Oriented Visible-Infrared Sensor Fusion Framework for Intelligent Transportation Perception Under Complex Illumination Conditions
by Jun Yang, Jianguo Wu, Xiaolan Zhang, Zenglong Yang, Hongfei Shen, Botao Shen and Chang Zeng
Sensors 2026, 26(12), 3675; https://doi.org/10.3390/s26123675 - 9 Jun 2026
Viewed by 279
Abstract
Multimodal sensor fusion in intelligent transportation systems faces severe challenges in maintaining reliable visual information acquisition under complex illumination conditions. Extreme low light and intense glare significantly degrade the imaging quality of visible-light sensors, making it difficult for single-modality vision systems to maintain reliable target perception. Meanwhile, although infrared sensors provide a relatively stable saliency complement for target regions, modal discrepancies and spatial misalignment between heterogeneous visible and infrared sensors often degrade fusion performance, limiting the practical benefits of multimodal sensing for machine perception. To address these issues, this study proposes Aligned, Dual-Gated, and Saliency-Guided MIRNet (ADS-MIR), a machine perception-oriented visible-infrared sensor fusion framework that enhances the discriminability and structural representation of target regions for roadside perception sensors operating under complex conditions. Specifically, the framework employs a domain alignment layer to mitigate feature distribution discrepancies and spatial misalignment between heterogeneous sensor modalities. An illumination-guided adaptive gating mechanism dynamically modulates the feature contributions of the two sensor modalities, while a saliency-guided frequency decoupling strategy reinforces target-related high-frequency edge details. Experimental results on the LLVIP and M3FD datasets demonstrate that ADS-MIR improves the edge information transfer factor (Q^AB/F) by 49.6% to 111.6% compared with existing methods, highlighting its advantage in preserving target contours and restoring edge information. Furthermore, the fused results provide more discriminative input features for downstream object detection, yielding more stable perception under complex illumination and challenging sensing scenarios.
(This article belongs to the Section Optical Sensors)
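The ADS-MIR abstract does not specify the form of its illumination-guided gate. One minimal way to express the idea, assuming a single scalar gate computed from the mean luminance of the visible frame, is a band-pass weight that trusts the visible branch in moderate light and shifts weight toward the infrared branch in both darkness and glare. The function names, the Rec. 601 luma weights, the `low`/`high` cut-offs, and the `sharpness` value are illustrative assumptions; the actual mechanism operates on learned feature maps and may differ substantially.

```python
import math
from typing import List, Sequence, Tuple


def mean_luminance(rgb_pixels: Sequence[Tuple[float, float, float]]) -> float:
    """Mean Rec. 601 luma of RGB pixels whose channels are scaled to [0, 1]."""
    if not rgb_pixels:
        raise ValueError("need at least one pixel")
    return sum(0.299 * r + 0.587 * g + 0.114 * b for r, g, b in rgb_pixels) / len(rgb_pixels)


def _sigmoid(x: float) -> float:
    return 1.0 / (1.0 + math.exp(-x))


def illumination_gate(luminance: float, low: float = 0.2, high: float = 0.85,
                      sharpness: float = 15.0) -> float:
    """Weight in (0, 1) for the visible branch (hypothetical band-pass form).

    The first sigmoid suppresses the visible branch in low light; the second
    suppresses it under glare or over-exposure.
    """
    return _sigmoid(sharpness * (luminance - low)) * _sigmoid(sharpness * (high - luminance))


def gated_fusion(vis_feat: Sequence[float], ir_feat: Sequence[float], gate: float) -> List[float]:
    """Element-wise convex combination of visible and infrared features."""
    if len(vis_feat) != len(ir_feat):
        raise ValueError("feature vectors must have the same length")
    return [gate * v + (1.0 - gate) * i for v, i in zip(vis_feat, ir_feat)]


# Usage: a very dark visible frame should hand most of the weight to infrared.
frame = [(0.02, 0.03, 0.02)] * 4
g = illumination_gate(mean_luminance(frame))
fused = gated_fusion([0.8, 0.1], [0.3, 0.9], g)
```

A single sigmoid on luminance would favor the visible sensor under glare, which is why this sketch multiplies two opposing sigmoids; a learned gate would typically replace these hand-set constants.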

23 pages, 3094 KB  
Article
A Camera-Based Visual Sensor Pipeline for Fine-Grained Human Activity Recognition in Classroom Scenes
by Cheng Sun, Danning Wu, Zihao Wu, Weibing Zhou and Jin Zhang
Sensors 2026, 26(12), 3666; https://doi.org/10.3390/s26123666 - 8 Jun 2026
Viewed by 332
Abstract
Student behavior recognition in classroom environments is important for teaching quality assessment and intelligent education, yet it remains challenging due to dense student distributions, frequent occlusion, substantial scale variation, and the subtle nature of common classroom activities. To address these issues, this paper proposes RepYOLOv5-SF3D, a cascaded visual perception framework for fine-grained student behavior recognition in complex classroom scenes. The framework integrates a lightweight RepYOLOv5m detector with a dual-stream SlowFast-3D recognition branch, enabling automated inference from raw video input to behavior labels. To improve robustness in dense and occluded scenes, the front-end detector serves as a spatial-prior module, while a decoupled training strategy reduces the impact of localization instability on back-end spatiotemporal learning. In addition, two task-oriented modules are introduced in the recognition branch: the Spatiotemporal Depthwise-Separable 3D module (SDS3D) and the Normalization-Based Temporal Attention Mechanism (NTAM). Experimental results on a real classroom dataset show that RepYOLOv5-SF3D achieves a mean average precision (mAP) of 88.83%, outperforming the baseline SlowFast model by 3.36% and surpassing the existing LSTC method by 2.05%, while maintaining a front-end inference latency of 12.5 ms per frame and a total model size of 151.46 MB. These results demonstrate a favorable balance between fine-grained recognition accuracy and edge-deployment efficiency in practical classroom visual sensing.
(This article belongs to the Special Issue Sensors for Human Activity Recognition: 3rd Edition)
