MDPI - Publisher of Open Access Journals

21 pages, 1456 KB

Open Access Article

A Camera-Based Multimodal Defect Sensing Framework for Substation Equipment Monitoring via Cross-Modal Feature Mapping

by Ziquan Liu, Hai Xue, Chengbo Hu, Chao Wei and Can Zhang

Sensors 2026, 26(12), 3935; https://doi.org/10.3390/s26123935 (registering DOI) - 21 Jun 2026

Viewed by 126

To address the limitations of vision-only defect detection, image–semantic misalignment, and spatial-logic conflicts in complex substation inspection scenarios, this paper proposes a camera-sensor-based multimodal defect sensing framework with cross-modal feature mapping for substation equipment monitoring. The proposed framework integrates field inspection images acquired by camera sensors, defect textual descriptions, and equipment topology knowledge, and establishes a unified workflow of domain-adaptive pre-training, bidirectional cross-modal mapping, and hierarchical reasoning. First, a Contrastive Language–Image Pre-training (CLIP)-based domain-adaptive pre-training strategy is developed to enhance the representation of equipment categories, defect attributes, and inspection-scene semantics. Second, a bidirectional cross-modal feature mapping network is constructed to model fine-grained interactions between candidate visual regions and textual semantics, where uncertainty-aware fusion and prototype constraints are introduced to improve semantic alignment and defect discrimination. Third, a hierarchical neuro-symbolic reasoning module incorporates equipment topology and spatial rules for posterior verification, logical consistency checking, and false-positive suppression. Experiments on a substation inspection image dataset demonstrate that the proposed method achieves 90.8% mAP@0.5, 68.7% mAP@0.5:0.95, and 89.4% F1-score, outperforming mainstream and recent detection models.
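
As background for the metrics quoted above: mAP@0.5 counts a predicted box as correct when its intersection over union (IoU) with a ground-truth box is at least 0.5, and mAP@0.5:0.95 averages over IoU thresholds from 0.5 to 0.95 in steps of 0.05 (the COCO convention). The sketch below is a generic illustration of that IoU test, not code from the article; the function names and the (x1, y1, x2, y2) box format are assumptions made for the example.

```python
def box_iou(a, b):
    """IoU of two axis-aligned boxes given as (x1, y1, x2, y2)."""
    # Corners of the overlap rectangle (may be empty).
    x1, y1 = max(a[0], b[0]), max(a[1], b[1])
    x2, y2 = min(a[2], b[2]), min(a[3], b[3])
    inter = max(0.0, x2 - x1) * max(0.0, y2 - y1)
    area_a = (a[2] - a[0]) * (a[3] - a[1])
    area_b = (b[2] - b[0]) * (b[3] - b[1])
    union = area_a + area_b - inter
    return inter / union if union > 0 else 0.0


def is_match(pred_box, gt_box, threshold=0.5):
    """True if the prediction would count as a hit at this IoU threshold."""
    return box_iou(pred_box, gt_box) >= threshold


# Two 2x2 boxes shifted by half their width overlap with IoU = 1/3,
# so this pair would not count as a hit at the 0.5 threshold.
print(is_match((0, 0, 2, 2), (1, 0, 3, 2)))
```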

(This article belongs to the Special Issue Advanced Sensing Technologies for Grid Monitoring, Protection, and Control)

26 pages, 3882 KB

Open Access Article

Remote Sensing Small Object Detection Network Based on Wavelet-Convolution and Fine-Grained Preservation

by Hangyu Li and Tiecheng Song

Information 2026, 17(6), 609; https://doi.org/10.3390/info17060609 (registering DOI) - 18 Jun 2026

Viewed by 171

Abstract

Small object detection in remote sensing imagery is a fundamental task for visual information extraction, yet it remains challenging due to extremely small target scales, complex backgrounds, and the loss of discriminative feature information caused by repeated downsampling. To address these issues, this paper proposes a Wavelet-Convolution and Fine-Grained Preservation Network (WCFPNet) based on YOLOv8n. Specifically, a Wavelet-Convolution Module (WCM) is introduced into the backbone to decompose feature maps into low- and high-frequency sub-bands, thereby enhancing structural feature modeling and preserving subtle target details. To compensate for the weakened fine-grained information after repeated downsampling, an Enhanced Spatial Pyramid Pooling-Fast (ESPPF) module is embedded at the end of the backbone to strengthen multi-scale contextual aggregation. In addition, an Enhanced Feature Pyramid Network (EFPN) is designed in the neck to facilitate the propagation of shallow and intermediate fine-grained features to high-level semantic features through cross-level fusion and the Convolutional Block Attention Module (CBAM). Experiments on the NWPU VHR-10 dataset show that WCFPNet achieves 0.879 mAP@0.5 and 0.515 mAP@0.5:0.95, outperforming YOLOv8n by 1.7 and 2.5 percentage points, respectively. Moreover, WCFPNet achieves competitive performance compared with several representative detectors while maintaining moderate model complexity. These results demonstrate the effectiveness of WCFPNet in challenging remote sensing scenes characterized by complex backgrounds, dense object distributions, and weak textures.
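
To make the idea of splitting a feature map into low- and high-frequency sub-bands concrete, here is a minimal NumPy sketch of a one-level 2D wavelet decomposition. It assumes the Haar wavelet, which the abstract does not specify, and it is not the WCM module itself; the function names are hypothetical.

```python
import numpy as np


def haar_dwt2(x):
    """One-level 2D Haar transform of an (H, W) array with even H and W.

    Returns four half-resolution sub-bands: one low-frequency average (ll)
    and three high-frequency detail bands (lh, hl, hh).
    """
    a = x[0::2, 0::2]  # top-left pixel of each 2x2 block
    b = x[0::2, 1::2]  # top-right
    c = x[1::2, 0::2]  # bottom-left
    d = x[1::2, 1::2]  # bottom-right
    ll = (a + b + c + d) / 2.0  # local average (low frequency)
    lh = (a - b + c - d) / 2.0  # difference between adjacent columns
    hl = (a + b - c - d) / 2.0  # difference between adjacent rows
    hh = (a - b - c + d) / 2.0  # diagonal difference
    return ll, lh, hl, hh


def haar_idwt2(ll, lh, hl, hh):
    """Inverse of haar_dwt2; reconstructs the original array exactly."""
    h, w = ll.shape
    x = np.empty((2 * h, 2 * w), dtype=float)
    x[0::2, 0::2] = (ll + lh + hl + hh) / 2.0
    x[0::2, 1::2] = (ll - lh + hl - hh) / 2.0
    x[1::2, 0::2] = (ll + lh - hl - hh) / 2.0
    x[1::2, 1::2] = (ll - lh - hl + hh) / 2.0
    return x


feature_map = np.arange(16.0).reshape(4, 4)
ll, lh, hl, hh = haar_dwt2(feature_map)
# The transform is invertible, so downsampling this way discards nothing.
assert np.allclose(haar_idwt2(ll, lh, hl, hh), feature_map)
```

Unlike strided convolution or pooling, the four sub-bands together keep all of the input information at half the resolution, which is the property that wavelet-based downsampling relies on.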

(This article belongs to the Special Issue Emerging Research in Target Detection and Recognition in Remote Sensing Images, 2nd Edition)

35 pages, 9814 KB

Open Access Article

EO2SAR-Diff: Structure-Aware Latent Diffusion for Unpaired EO-to-SAR Translation

by Yeon-Wook Kim and Kiyoung Kim

Remote Sens. 2026, 18(12), 2037; https://doi.org/10.3390/rs18122037 - 18 Jun 2026

Viewed by 209

Abstract

Synthetic aperture radar (SAR) imagery provides all-weather, day-and-night observation capabilities that complement electro-optical (EO) imaging; however, the limited number of operational SAR satellites and the difficulty of acquiring expert-annotated SAR datasets constrain deep-learning-based SAR image analysis. In this paper, we propose EO2SAR-Diff, a conditional latent diffusion framework that translates EO aerial images into realistic synthetic SAR images. The framework comprises three core components: (1) domain-adaptive LoRA pre-training that anchors the Stable Diffusion backbone in the remote sensing domain, (2) a style extraction and injection network that captures SAR-specific visual characteristics via multi-scale feature encoding and parallel cross-attention, and (3) a multi-branch ControlNet with three parallel branches for complementary structural guidance. These components are coordinated by a dual-axis feature injection strategy that modulates conditioning strength along both spatial (per-block) and temporal (per-timestep) dimensions. Experiments on the DOTA 1.0 and SARDet-100K datasets demonstrate that EO2SAR-Diff ranks in the top tier among all compared methods in distributional alignment with real SAR imagery, in terms of FID and KID computed with two SAR-domain-adapted feature extractors. Augmenting the SAR training set with our synthetic images yields consistent improvements in downstream object detection performance, confirming the practical utility of the proposed framework.

(This article belongs to the Special Issue AI-Driven Remote Sensing Image Restoration and Generation)

30 pages, 42422 KB

Open Access Article

Bi-Level Meta-Learning for Reliable Remote Sensing Image Registration

by Lin Shi, Renzhen Wang, Xiaofeng Zhu, Cong An, Kai Zhao, Jun Shu, Dongfang Yang and Deyu Meng

Remote Sens. 2026, 18(12), 2007; https://doi.org/10.3390/rs18122007 - 16 Jun 2026

Viewed by 128

Abstract

Unmanned aerial vehicle (UAV) visual navigation relies critically on robust image matching between UAV-acquired aerial imagery and pre-existing satellite reference maps. However, extreme cross-domain heterogeneity, encompassing temporal, radiometric, viewpoint, and sensor variations, causes severe performance degradation in existing deep learning-based matchers trained on conventional benchmarks. Furthermore, manual annotation of ground-truth correspondences is prohibitively expensive. This paper proposes a semi-supervised saliency-aware image matching framework with bi-level meta-learning. Our approach comprises two synergistic stages: (1) automated dense correspondence generation via parameterized geometric synthesis, which constructs a large-scale coarse dataset $D_{c}$ (approximately 50,000 pairs) without dense manual point annotation, serving as the primary training corpus for the feature matching network; (2) expert-validated meta-data curation producing a high-quality meta-dataset $D_{m}$ (500 pairs) that supervises the training of a Saliency Judgment Network through bi-level meta-optimization, enabling the network to identify and prioritize geometrically reliable correspondences. Experimental results on the proposed RS-Hetero-50K benchmark and cross-domain FuJian-Mountain dataset demonstrate substantial improvements over representative sparse and detector-free matchers, including LoFTR, SuperGlue, and LightGlue. The complete CNN-attention and saliency-aware framework achieves 95.4% matching precision, which is consistent with the best result reported in the experimental section. The plug-and-play experiments further confirm that the proposed saliency module consistently improves representative sparse and detector-free matchers, indicating that the performance gain stems from both stronger feature representation and saliency-guided correspondence selection. The largest terrain-specific gain is observed in gobi scenes, where the AUC@5 px improves by 16.8% relative to the LoFTR baseline, demonstrating improved robustness in weakly textured remote sensing environments.

36 pages, 32050 KB

Open Access Article

Semantic Segmentation of Pegmatite Dikes in High-Resolution Remote Sensing Imagery Using GAD-UNet++ in the Yilanlike Area, South Tianshan

by Zirui Wu, Chuan Chen, Yuanjun Yu, Yong Tian, Jian Yu and Fang Xia

Remote Sens. 2026, 18(12), 1988; https://doi.org/10.3390/rs18121988 - 15 Jun 2026

Viewed by 209

Abstract

Pegmatite dikes are important prospecting indicators for rare-metal deposits, whereas traditional methods for pegmatite dike identification are constrained by the limited capability of human visual interpretation to capture information from remote sensing imagery, resulting in low identification accuracy and efficiency. In recent years, global research on semantic segmentation of different surface features and remote sensing-based mineral exploration using deep learning methods and high-resolution remote sensing imagery has made significant progress; however, studies on surface-exposed geological bodies such as pegmatite dikes remain highly insufficient. To address the key problem of efficiently identifying pegmatite dikes in remote sensing imagery, this study proposes an improved model based on UNet++, termed GAD-UNet++. In the field of remote sensing geology, this study constructed a pegmatite dike semantic segmentation dataset based on high-resolution RGB imagery by using 0.66 m RGB imagery for visual delineation and ZY1F hyperspectral data for spectral constraint and label refinement; on this basis, semantic segmentation of surface pegmatite dikes in the Yilanlike area of the South Tianshan Mountains, Xinjiang, was conducted using RGB remote sensing image patches as model input. Specifically, because pegmatite dikes are small targets characterized by slender structures, indistinct boundaries, and sparse regional distribution, this study introduced a lightweight feature extraction structure (GhostNetV2) and a long-range dependency attention module (DFC) at the encoder stage, and further incorporated the Coordinate Attention module (CA) to enhance spatial localization and boundary representation of the targets. Finally, focal cross-entropy loss and a deep supervision strategy were adopted to improve the accuracy of semantic information extraction for pegmatite dikes, as well as the training stability and segmentation accuracy under class-imbalance conditions. 
The results show that the proposed model achieved an mIoU of 93.11% and an F1-score of 94.95% on the test set. Compared with existing semantic segmentation models, the proposed model achieved superior performance in both identification accuracy and computational efficiency for pegmatite dikes. In addition, this study delineated 18 potential pegmatite dike enrichment zones in the Yilanlike area, providing technical support for remote sensing-based rare-metal prospecting and geological interpretation in the study area.
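
The abstract names a focal cross-entropy loss for the class-imbalanced dike/background problem but does not give its form. The sketch below assumes the standard binary focal loss, FL = -alpha_t (1 - p_t)^gamma log(p_t); the default values of `gamma` and `alpha` are common illustrative choices, not values reported by the authors.

```python
import numpy as np


def binary_focal_loss(p, y, gamma=2.0, alpha=0.25, eps=1e-7):
    """Mean binary focal loss.

    p: predicted foreground probabilities in [0, 1]
    y: ground-truth labels (1 = foreground, 0 = background)
    """
    p = np.clip(np.asarray(p, dtype=float), eps, 1.0 - eps)
    y = np.asarray(y, dtype=float)
    # p_t is the probability the model assigns to the true class.
    p_t = np.where(y == 1, p, 1.0 - p)
    # alpha_t weights foreground pixels by alpha and background by 1 - alpha.
    alpha_t = np.where(y == 1, alpha, 1.0 - alpha)
    # (1 - p_t)^gamma shrinks the loss of already well-classified pixels.
    return float(np.mean(-alpha_t * (1.0 - p_t) ** gamma * np.log(p_t)))


easy = binary_focal_loss([0.9], [1])  # confident and correct
hard = binary_focal_loss([0.6], [1])  # uncertain
assert easy < hard
```

With `gamma=0` the modulating factor disappears and the expression reduces to an alpha-weighted cross-entropy, which is a quick way to sanity-check an implementation.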

(This article belongs to the Section Remote Sensing in Geology, Geomorphology and Hydrology)

25 pages, 5937 KB

Open Access Article

CGSTA-Net: A Cross-Domain Generative Prior-Assisted Structure–Texture Adaptive Network for Remote Sensing Image Dehazing

by Xiaoyan Li, Yankun Zhao and Na Niu

Symmetry 2026, 18(6), 1027; https://doi.org/10.3390/sym18061027 - 14 Jun 2026

Viewed by 240

Abstract

Image dehazing is important for the correct interpretation of optical remote sensing images. However, current dehazing networks tend to suffer from limited receptive fields, texture information loss caused by conventional downsampling, and underuse of complementary cross-domain information. To address these problems, we propose a Cross-domain Generative Prior-assisted Structure–Texture Adaptive Network (CGSTA-Net) for remote sensing image dehazing. It is a dual-stream encoder–decoder framework that enhances the domain-specific information of the RGB image and the generated prior, and then integrates them adaptively for haze-free reconstruction. To minimize information loss during downsampling, wavelet pooling is introduced to capture frequency-aware structural and textural features. Additionally, a Structure–Texture Calibration Block is designed to simultaneously improve local frequency textures and construct sparse long-range structural dependencies, so as to achieve better restoration performance under spatially non-uniform haze. To appropriately fuse the representations from the RGB and generated prior images, a Prior-aware Gated Adaptive Fusion module is developed to balance the domain-specific features dynamically and preserve fine details during multi-level feature fusion. Finally, we utilize pixel-level contrastive learning to guide the latent space away from hazy distributions, thus enhancing the discriminability of the features. Extensive experiments on three datasets, namely RSID, RICE-I, and HRSD, demonstrate that CGSTA-Net can effectively restore images under varying haze conditions and significantly outperforms the latest dehazing methods in terms of visual quality and quantitative performance. Specifically, compared with the strongest competing method, CGSTA-Net increased the PSNR by 22.9% on RSID, by 13.2% on RICE-I, and by 7.2% on HRSD.
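
The PSNR gains above are quoted as relative percentages. As a reminder of what is being compared, the sketch below computes PSNR between a reference and a restored image and the relative gain between two PSNR values. It is a generic illustration with hypothetical function names and an assumed 8-bit peak value of 255; the article's own evaluation code and absolute PSNR values are not shown in the abstract.

```python
import numpy as np


def psnr(reference, restored, max_val=255.0):
    """Peak signal-to-noise ratio in dB between two same-shaped images."""
    ref = np.asarray(reference, dtype=float)
    out = np.asarray(restored, dtype=float)
    mse = np.mean((ref - out) ** 2)
    if mse == 0:
        return float("inf")  # identical images
    return float(10.0 * np.log10(max_val ** 2 / mse))


def relative_gain_percent(new_value, baseline_value):
    """Relative improvement of new_value over baseline_value, in percent."""
    return 100.0 * (new_value - baseline_value) / baseline_value


clean = np.full((4, 4), 100.0)
restored = clean + 1.0  # every pixel off by one gray level, so MSE = 1
print(round(psnr(clean, restored), 2))  # 20 * log10(255) is about 48.13 dB
```

Because PSNR is logarithmic, a relative gain in dB is not the same as a relative reduction in error, so percentage improvements are best read alongside the absolute values.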

(This article belongs to the Section Computer)

22 pages, 43415 KB

Open Access Article

FSSM: Frequency-Enhanced State Space Modeling with FFT-Based Two-Sided Non-Causal Convolution for Image Dehazing

by Li Zeng and Yinqing Huang

J. Imaging 2026, 12(6), 260; https://doi.org/10.3390/jimaging12060260 - 13 Jun 2026

Viewed by 208

Abstract

Image dehazing is a fundamental visual restoration task for improving visual perception under low-visibility weather conditions, especially in UAV-based remote sensing, traffic monitoring, and surveillance scenarios. Existing convolutional neural networks are effective in local feature extraction but remain limited in long-range dependency modeling, while Transformer-based methods improve global modeling at the cost of high computational complexity. To address these issues, this paper proposes an efficient image-dehazing framework termed FSSM, which integrates frequency-enhanced State Space Modeling with a hierarchical encoder–decoder architecture. Specifically, an FFT-based State Space Block (FFTSSB) is designed to reformulate state propagation as frequency-domain two-sided non-causal convolution, enabling efficient bidirectional global dependency modeling without explicit recursive scanning. Furthermore, a Frequency-Aware Discriminative Enhancement Block (FDEB) is introduced to enhance local textures, edges, and structural details through spatial gating and lightweight block-wise frequency modulation. Based on these two components, a Frequency-Aware State Interaction (FASI) block is constructed to progressively couple global state propagation and local frequency-aware enhancement. Experimental results on the HazyDet dataset demonstrate that FSSM achieves favorable restoration accuracy, structural consistency, and perceptual quality compared with representative dehazing methods. Ablation studies further validate the effectiveness of the proposed two-sided FFT-based state modeling, frequency-aware enhancement, and hierarchical multi-scale design.
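
As a rough illustration of the general idea behind a frequency-domain two-sided non-causal convolution (not the FFTSSB block itself), the sketch below convolves a 1D sequence with a kernel that extends to both sides of each position, using the FFT instead of a recursive scan. The function name and the odd-length, centered kernel layout are assumptions made for the example.

```python
import numpy as np


def two_sided_fft_conv(x, kernel):
    """Convolve x with an odd-length kernel centered at len(kernel) // 2.

    The product of the two spectra equals linear convolution once both
    inputs are zero-padded to len(x) + len(kernel) - 1 samples.
    """
    x = np.asarray(x, dtype=float)
    kernel = np.asarray(kernel, dtype=float)
    n, m = len(x), len(kernel)
    size = n + m - 1
    full = np.fft.irfft(np.fft.rfft(x, size) * np.fft.rfft(kernel, size), size)
    # Crop so that output[i] is aligned with input[i] (a 'same'-size result).
    start = m // 2
    return full[start:start + n]


signal = np.zeros(11)
signal[5] = 1.0  # a single impulse at position 5
out = two_sided_fft_conv(signal, [1.0, 2.0, 3.0])
# Positions 4 and 6 both respond to the impulse at 5: each output sample
# sees inputs on both sides of it, which a causal scan cannot do.
assert np.allclose(out[4:7], [1.0, 2.0, 3.0])
```

The FFT route costs O(n log n) regardless of kernel length, which is why long two-sided kernels are attractive for global dependency modeling.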

(This article belongs to the Topic Computer Vision and Image Processing, 3rd Edition)

28 pages, 24246 KB

Open Access Article

Multimodal Prompt Learning for Spatial Reasoning in Remote Sensing Image Scene

by Yan Ren, Haizhong Qian, Bingchuan Jiang, Tingting Li, Xiao Wang, Long Sun and Li Yang

Remote Sens. 2026, 18(12), 1959; https://doi.org/10.3390/rs18121959 - 12 Jun 2026

Viewed by 232

Abstract

A remote sensing scene graph (RSSG) enables machines to interpret interactions among ground objects in remote sensing images and supports semantic reasoning and description, thus making it a fundamental technique in the field. However, most existing scene reasoning approaches cannot fully utilize multimodal information, resulting in limited performance when inferring spatial relationships among ground objects. To this end, we propose a Unified Visual-Semantic Triple Prompt Learning (UVSTPL) framework, which integrates visual features with matched geospatial object labels, leverages a prompt learning module for multimodal feature extraction, and employs a refined UVTransE model to predict spatial relationships. The core principle of UVSTPL is to enhance semantic feature extraction and improve relationship prediction performance via the collaborative fusion of visual and linguistic modalities. To strengthen the model’s ability to reason about the spatial relationships among ground objects in images, a novel Geo-RSSG dataset is constructed, which includes precise annotations of geographic entities, spatial relationships, and attributes. Extensive experiments demonstrate that the proposed UVSTPL method outperforms benchmark models on the spatial relationship prediction task. In comparison with the best baseline method, our approach improves prediction precision by 1.85%, mean precision by 8.49%, mean recall by 17.46%, and mean F1-score by 12.97%. This study offers valuable insights for advancing the understanding and cognitive capabilities of remote sensing scenes.

(This article belongs to the Special Issue Vision–Language Multimodal Learning for Remote Sensing and Geospatial Artificial Intelligence)

29 pages, 8856 KB

Open Access Article

High-Accuracy Indoor Multiple-Extended-Target Tracking Algorithm Based on 60 GHz Millimeter-Wave Radar

by Bo Gao, Jianzhong Chen, Bo Huang and Geng Yang

Sensors 2026, 26(12), 3758; https://doi.org/10.3390/s26123758 - 12 Jun 2026

Viewed by 147

Abstract

The rapid development of Internet of Things technologies has accelerated the deployment of smart home systems. However, perception solutions based on visual sensors remain constrained by illumination sensitivity, occlusion, and privacy concerns. Frequency-modulated continuous-wave (FMCW) millimeter-wave radar provides a promising alternative because it operates independently of lighting conditions, is robust to environmental changes, and preserves user privacy. To address multiple-extended-target tracking in cluttered indoor environments, this paper proposes a high-accuracy tracking algorithm that combines an improved Density-Based Spatial Clustering of Applications with Noise (DBSCAN) algorithm, an optimized Nearest-Neighbor Data Association (NNDA) scheme, and an Extended Kalman Filter (EKF). The improved DBSCAN algorithm introduces spatial-extent constraints, velocity-consistency checks, and candidate-cluster validation to cluster raw radar point clouds and convert extended targets into representative point targets with little additional computational cost. The optimized NNDA scheme then integrates clustering information into the association process, improving the matching accuracy between existing tracks and current measurements. Finally, the EKF estimates the state of each target from the associated measurements. Real-world experiments show that the proposed algorithm achieves tracking errors below 0.4 m in typical motion scenarios, maintains continuous tracking in two-person crossing scenarios, and reaches 93.3% counting accuracy in five-person scenarios. These results outperform the tracking system based on the commercial Texas Instruments (TI) IWR6843ISK millimeter-wave radar evaluation board. The proposed method offers a reliable and privacy-preserving sensing solution for smart homes, elderly care, and intelligent building applications.
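
The association step in the pipeline above (cluster centroids matched to existing tracks) can be sketched as a plain greedy nearest-neighbor assignment with a distance gate. This is a baseline illustration only: the paper's optimized NNDA additionally uses clustering information, which is not modeled here, and the function name, 2D positions, and gate value are assumptions.

```python
import numpy as np


def associate_nearest(track_positions, detections, gate):
    """Greedy nearest-neighbor association with a distance gate.

    track_positions: predicted (x, y) per track, shape (T, 2)
    detections:      measured (x, y) per cluster centroid, shape (D, 2)
    gate:            maximum allowed track-to-detection distance
    Returns a dict {track_index: detection_index}; unmatched items are omitted.
    """
    tracks = np.asarray(track_positions, dtype=float).reshape(-1, 2)
    dets = np.asarray(detections, dtype=float).reshape(-1, 2)
    pairs = {}
    if len(tracks) == 0 or len(dets) == 0:
        return pairs
    # Pairwise Euclidean distances, shape (T, D).
    dist = np.linalg.norm(tracks[:, None, :] - dets[None, :, :], axis=2)
    used_dets = set()
    # Visit candidate pairs from closest to farthest.
    for flat_index in np.argsort(dist, axis=None):
        t, d = divmod(int(flat_index), dist.shape[1])
        if dist[t, d] > gate:
            break  # every remaining pair is even farther apart
        if t in pairs or d in used_dets:
            continue  # each track and each detection is used at most once
        pairs[t] = d
        used_dets.add(d)
    return pairs


tracks = [[0.0, 0.0], [5.0, 5.0]]
centroids = [[5.2, 5.1], [0.1, -0.1], [20.0, 20.0]]
print(associate_nearest(tracks, centroids, gate=1.0))  # {0: 1, 1: 0}
```

In a full tracker, each matched detection would feed the corresponding track's filter update, while unmatched detections would be candidates for new tracks.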

(This article belongs to the Special Issue Advances in GNSS/INS Integration for Navigation and Positioning)

29 pages, 7128 KB

Open Access Article

EdgeElderCare: A Resource-Aware, Scene-Adaptive Edge-Cloud Collaborative System for Long-Term Elderly Safety and Health Monitoring

by Lihao Luo, Yuting Li, Lin Wei, Di Han, Ruifeng Cao, Bo Chen, Yuechen Pan and Yunfan Chen

Electronics 2026, 15(12), 2601; https://doi.org/10.3390/electronics15122601 - 12 Jun 2026

Viewed by 169

Abstract

Driven by global population aging, long-term in-home and institutional elderly care faces challenges in delivering continuous, privacy-aware, and resource-efficient safety and health monitoring. Existing edge-based solutions struggle to jointly balance detection accuracy, privacy, and resource overhead during continuous operation, and often have limited situational awareness and inflexible management. We propose EdgeElderCare, a resource-aware, scene-adaptive edge-cloud collaborative system for continuous elderly safety and health monitoring. Its contributions are threefold: (1) a scene-adaptive multi-sensor task-sharing architecture that deploys vision-based fall detection in public areas and privacy-aware millimeter-wave radar in private spaces; combined with edge-side task scheduling, this architecture provides spatially complementary coverage of public and private areas, mitigates the accuracy–privacy conflict, and reduces computing and bandwidth consumption relative to data-level fusion; (2) a lightweight myocardial infarction detection module deployed on an edge platform, enabling local ECG analysis with low resource overhead; (3) a 3D digital-twin edge-cloud management platform that maps multi-source sensing data to a virtual scene in real time and supports hierarchical visual alerting. Experiments in a real nursing home environment show that the system operated stably on resource-constrained edge hardware: UWB positioning achieved centimeter-level RMSE, visual fall detection reached a recall of 0.90, millimeter-wave radar fall detection achieved accuracy and F1 above 0.90, and myocardial infarction detection exceeded 0.99 accuracy on the public PTB/PTB-XL benchmark. These results indicate an engineering-feasible approach to intelligent elderly care. Larger-scale and longer-term validation remains the focus of future work.

(This article belongs to the Special Issue Resource-Aware Edge/on-Device Intelligence for Long-Term Autonomous Mobile Systems)

30 pages, 6714 KB

Open Access Article

Study on a Method for Identifying Particles Causing High-Speed Fluid Wear Based on Multi-Source Information Fusion

by Long Feng, Zhiyu Xiang, Junming Liu, Feng Zhu, Zhenzhen Zhang and Hongxin Xu

Processes 2026, 14(12), 1918; https://doi.org/10.3390/pr14121918 (registering DOI) - 12 Jun 2026

Viewed by 192

Abstract

Mechanical wear particle recognition is an important approach for equipment health monitoring and fault early warning. However, flow-field disturbances and high-speed particle motion in high-speed fluid environments can lead to image degradation, non-stationary electrostatic signals, and insufficient reliability of single-source recognition methods. Therefore, this study proposes a wear particle recognition method based on multi-source information fusion for high-speed fluid environments. The method establishes a multi-scale electrostatic sensing model to characterize the coupling relationship among particle material properties, motion states, and electrostatic response characteristics. Empirical mode decomposition and independent component analysis are combined for adaptive electrostatic signal denoising, and a Transformer network is used to extract multi-domain features. Meanwhile, an ECA-CNN model with an efficient channel attention mechanism is introduced to enhance the feature representation of degraded particle images. On this basis, a meta-learning-based sample-adaptive decision fusion framework is developed to achieve dynamic and complementary fusion of electrostatic and visual information. The experimental results demonstrate that the proposed method exhibits excellent recognition accuracy and robustness in the tested high-speed fluid environment of 10 m/s, achieving a fusion recognition accuracy of 96.0%, which is significantly superior to single-source recognition methods. Ablation experiments further show that removing the global scaling factor, guidance loss, interpolation loss, and category-specific weight generator decreases the average recognition accuracy by 0.7%, 1.2%, 0.4%, and 1.8%, respectively, confirming the contribution of each key module to fusion recognition performance. These findings provide a new technical approach for the online intelligent recognition of wear particles under high-speed fluid conditions and offer theoretical support and methodological guidance for condition monitoring, health assessment, and intelligent operation and maintenance of large-scale equipment.

(This article belongs to the Section Process Control, Modeling and Optimization)

18 pages, 2729 KB

Open Access Article

A Hybrid Swin–Mamba UNet for Post-Disaster Building Damage Assessment

by Tian Zhou, Liwei Deng and Fei Chen

Appl. Sci. 2026, 16(12), 5918; https://doi.org/10.3390/app16125918 - 11 Jun 2026

Viewed by 121

Abstract

Natural disasters frequently cause significant building damage, necessitating timely and accurate damage assessment for effective rescue operations and post-disaster reconstruction. Traditional building damage assessment methods commonly rely on paired pre- and post-disaster remote sensing images, which often face practical challenges in data acquisition and image pairing during emergency situations. To overcome these limitations, a hybrid Swin–Mamba U-shaped network (UNet) is developed for building damage assessment using only post-disaster remote sensing imagery. The proposed framework employs a Swin Transformer as the encoder to extract multi-scale features and capture long-range contextual information, while a Parallelized Patch-Aware Attention (PPA) convolution module is introduced in the decoder to restore spatial details and improve feature reconstruction. In addition, a Visual State Space (VSS) module is incorporated in the bottleneck layer to effectively model both global contextual dependencies and local structural information, thereby improving the representation of building damage characteristics from single-temporal imagery. Experiments conducted on the xBD dataset show that the proposed method outperforms the Swin–Unet by 1.7% in overall F1-score, achieving an overall F1-score of 55.2%. In addition, qualitative visualization results suggest that the proposed method has favorable generalization capability across different disaster scenarios. These results highlight the practical potential of the proposed framework for rapid post-disaster building damage assessment, particularly in emergency response scenarios where only post-disaster imagery is available.

(This article belongs to the Special Issue Remote Sensing Applications in Agricultural, Earth and Environmental Science, 2nd Edition)

33 pages, 6102 KB

Open Access Article

From Detection Toward Decision Support: A Hierarchical Visual–Sensor Framework for Zamioculcas Monitoring in Indoor Environments

by Raikhan Amanova, Baurzhan Belgibayev, Yersaiyn Mailybayev, Gulnur Kazbekova, Zhadyra Akanova, Galiya Mamankyzy, Marzhana Amanova, Artem Bykov, Periuza Pirniyazova and Nurzhigit Smailov

Computers 2026, 15(6), 382; https://doi.org/10.3390/computers15060382 - 11 Jun 2026

Viewed by 173

Abstract

This paper proposes a prototype-level hierarchical visual–sensor framework for monitoring the Zamioculcas houseplant in complex indoor environments and supporting adaptive care-mode selection. The proposed framework combines a two-level visual pipeline, consisting of YOLO-based target plant detection and MobileViT-S-based leaf-condition classification, with a Plant Health Index (PHI) and a rule-based decision-support module for integrating visual and IoT-derived indicators. For the detection task, YOLOv8, YOLO12, and YOLO26 were compared, with YOLO26 showing the most balanced performance among the evaluated implementations. To improve robustness in real indoor scenes, negative training samples were added; this reduced the image-level false alarm rate on an independent negative-scene test set from 50.7% to 10.0% and increased specificity from 49.3% to 90.0%. For the second visual level, MobileViT-S achieved an accuracy of 0.9857 and an F1-score of 0.9857 on the independent cropped leaf test subset. To reduce the dependence of this result on a single data split, an additional 5-fold cross-validation experiment was conducted on the full cropped leaf dataset of 847 images, resulting in an accuracy of 0.9858 ± 0.0068 and an F1-score of 0.9853 ± 0.0070. To further address plant-level generalization, an additional unseen-plant validation subset of 60 newly collected cropped leaf images was evaluated, and MobileViT-S achieved an accuracy of 0.9500 and an F1-score of 0.9499. These results support the stability of the leaf-condition classifier within the available data, although larger external validation with strict plant-level and session-level separation remains necessary. In addition, an Arduino-based module-level validation was conducted using a capacitive soil-moisture sensor to verify the proposed sensor-based and Vision–IoT decision rules. 
The experiment demonstrated that the rule-based layer can distinguish dry, normal, and wet soil states and select conservative care actions depending on both soil moisture and visual-condition input. A brief real-time camera–sensor communication test further confirmed that live camera input, Arduino-based soil-moisture sensing, PHI computation, and care-mode selection can be connected within one decision-support pipeline. The proposed PHI and care-mode selection module are therefore presented as a formalized decision-support layer rather than as a fully validated autonomous irrigation system. Further calibration, actuator integration, and closed-loop validation remain necessary before practical autonomous deployment. Full article
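The rule-based layer described in this abstract can be illustrated with a minimal sketch. The abstract gives no thresholds, class names, or action names, so everything below (the `DRY_BELOW` and `WET_ABOVE` cut-offs, the `healthy`/`unhealthy` labels, and the action strings) is a hypothetical stand-in, not the authors' implementation. It only shows how a soil-moisture state and a visual-condition label might jointly select a conservative care action.

```python
# Hypothetical cut-offs on a normalised 0-1 soil-moisture reading.
# The abstract does not state any thresholds; these are illustrative only.
DRY_BELOW = 0.30
WET_ABOVE = 0.70


def soil_state(moisture: float) -> str:
    """Map a normalised moisture reading to 'dry', 'normal', or 'wet'."""
    if not 0.0 <= moisture <= 1.0:
        raise ValueError("moisture must be in [0, 1]")
    if moisture < DRY_BELOW:
        return "dry"
    if moisture > WET_ABOVE:
        return "wet"
    return "normal"


def care_mode(moisture: float, leaf_condition: str) -> str:
    """Pick a care action from soil state and a visual-condition label.

    leaf_condition is assumed to be 'healthy' or 'unhealthy' (hypothetical
    labels). The rules are conservative: wet soil never triggers watering.
    """
    if leaf_condition not in ("healthy", "unhealthy"):
        raise ValueError("unknown leaf condition")
    state = soil_state(moisture)
    if state == "wet":
        return "withhold_water"
    if state == "dry":
        if leaf_condition == "healthy":
            return "water_lightly"
        return "water_lightly_and_inspect"
    # Normal soil: no watering; flag the plant if the leaves look unhealthy.
    return "maintain" if leaf_condition == "healthy" else "inspect"


if __name__ == "__main__":
    for reading, leaves in [(0.1, "healthy"), (0.5, "unhealthy"), (0.9, "unhealthy")]:
        print(reading, leaves, "->", care_mode(reading, leaves))
```

Keeping the soil-state mapping separate from the action table means the thresholds can be recalibrated against a real sensor without touching the decision rules.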

(This article belongs to the Section Internet of Things (IoT) and Industrial IoT)

23 pages, 12407 KB

Open AccessArticle

ADS-MIR: A Machine Perception-Oriented Visible-Infrared Sensor Fusion Framework for Intelligent Transportation Perception Under Complex Illumination Conditions

by Jun Yang, Jianguo Wu, Xiaolan Zhang, Zenglong Yang, Hongfei Shen, Botao Shen and Chang Zeng

Sensors 2026, 26(12), 3675; https://doi.org/10.3390/s26123675 - 9 Jun 2026

Viewed by 279

Abstract

Multimodal sensor fusion in intelligent transportation systems faces severe challenges in maintaining reliable visual information acquisition under complex illumination conditions. Extreme low-light and intense glare significantly degrade visible-light sensor imaging quality, making it difficult for single-modal vision systems to maintain reliable target perception. Meanwhile, although infrared sensors provide a relatively stable saliency complement for target regions, modal discrepancies and spatial misalignment between heterogeneous visible and infrared sensors often degrade fusion performance, limiting the practical benefits of multimodal sensing for machine perception. To address these issues, this study proposes Aligned, Dual-Gated, and Saliency-Guided MIRNet (ADS-MIR), a machine perception-oriented visible-infrared sensor fusion framework that enhances the discriminability and structural representation of target regions for roadside perception sensors operating under complex conditions. Specifically, the framework employs a domain alignment layer to mitigate feature distribution discrepancies and spatial misalignment between heterogeneous sensor modalities. An illumination-guided adaptive gating mechanism dynamically modulates bimodal sensor feature contributions, while a saliency-guided frequency decoupling reinforcement strategy reinforces target-related high-frequency edge details. Experimental results on the LLVIP and M3FD datasets demonstrate that ADS-MIR improves the edge information transfer factor ($Q^{AB/F}$) by 49.6% to 111.6% compared with existing methods, highlighting its distinct advantage in preserving target contours and restoring edge information. Furthermore, the enhanced results provide more discriminative input features for downstream object detection, exhibiting more stable perception capabilities under complex illumination and challenging sensing scenarios. Full article
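The idea of modulating visible and infrared contributions by illumination can be sketched in a few lines. This is not the ADS-MIR gating mechanism, which the abstract does not specify. It is a hand-written stand-in that assumes a scalar mean luminance in [0, 1] and a Gaussian-shaped weight (`visible_weight`, with a hypothetical `sigma`), so that the visible branch is trusted in mid-range lighting and down-weighted under both low light and glare.

```python
import math


def visible_weight(mean_luma: float, sigma: float = 0.2) -> float:
    """Weight for the visible branch, peaking at mid-range luminance.

    mean_luma is an assumed normalised scene brightness in [0, 1].
    The Gaussian shape and sigma are illustrative choices, not taken
    from the paper.
    """
    if not 0.0 <= mean_luma <= 1.0:
        raise ValueError("mean_luma must be in [0, 1]")
    return math.exp(-((mean_luma - 0.5) ** 2) / (2.0 * sigma ** 2))


def fuse(visible: list[float], infrared: list[float], mean_luma: float) -> list[float]:
    """Blend two equally sized feature vectors with one illumination gate."""
    if len(visible) != len(infrared):
        raise ValueError("feature vectors must have the same length")
    g = visible_weight(mean_luma)
    # Convex combination: g favours visible, (1 - g) favours infrared.
    return [g * v + (1.0 - g) * r for v, r in zip(visible, infrared)]


if __name__ == "__main__":
    vis, ir = [0.8, 0.2], [0.1, 0.9]
    for luma in (0.0, 0.5, 1.0):
        print(luma, fuse(vis, ir, luma))
```

A learned gate would typically predict the weight per channel or per pixel from the features themselves; the single scalar gate here is only meant to make the blending rule concrete.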

(This article belongs to the Section Optical Sensors)

23 pages, 3094 KB

Open AccessArticle

A Camera-Based Visual Sensor Pipeline for Fine-Grained Human Activity Recognition in Classroom Scenes

by Cheng Sun, Danning Wu, Zihao Wu, Weibing Zhou and Jin Zhang

Sensors 2026, 26(12), 3666; https://doi.org/10.3390/s26123666 - 8 Jun 2026

Viewed by 332

Abstract

Student behavior recognition in classroom environments is important for teaching quality assessment and intelligent education, yet it remains challenging due to dense student distributions, frequent occlusion, substantial scale variation, and the subtle nature of common classroom activities. To address these issues, this paper proposes RepYOLOv5-SF3D, a cascaded visual perception framework for fine-grained student behavior recognition in complex classroom scenes. The framework integrates a lightweight RepYOLOv5m detector with a dual-stream SlowFast-3D recognition branch, enabling automated inference from raw video input to behavior labels. To improve robustness in dense and occluded scenes, the front-end detector serves as a spatial-prior module, while a decoupled training strategy reduces the impact of localization instability on back-end spatiotemporal learning. In addition, two task-oriented modules are introduced in the recognition branch: the Spatiotemporal Depthwise-Separable 3D module (SDS3D) and the Normalization-Based Temporal Attention Mechanism (NTAM). Experimental results on a real classroom dataset show that RepYOLOv5-SF3D achieves a mean average precision (mAP) of 88.83%, outperforming the baseline SlowFast model by 3.36% and surpassing the existing LSTC method by 2.05%, while maintaining a front-end inference latency of 12.5 ms per frame and a total model size of 151.46 MB. These results demonstrate a favorable balance between fine-grained recognition accuracy and edge-deployment efficiency in practical classroom visual sensing. Full article

(This article belongs to the Special Issue Sensors for Human Activity Recognition: 3rd Edition)
