Search Results (2,607)

Search Parameters:
Keywords = scene complexity

37 pages, 8011 KB  
Article
TopoFarm: A Topology-Annotated Panoptic Dataset for Unauthorized Farmland Excavation Scene Representation
by Shunxi Yin, Wanzeng Liu, Jun Chen, Jiaxin Ren and Jiadong Zhang
ISPRS Int. J. Geo-Inf. 2026, 15(3), 93; https://doi.org/10.3390/ijgi15030093 - 25 Feb 2026
Abstract
Unauthorized farmland excavation is a prominent manifestation of farmland non-agriculturalization, and its effective monitoring depends on structured representations of objects and their spatial interactions in complex scenes. However, the existing computer vision research mainly focuses on object-level recognition or scene-level classification, while lacking datasets that explicitly model topological relationships in farmland excavation scenarios. To address this limitation, this paper presents TopoFarm, a topology-annotated panoptic dataset for unauthorized farmland excavation scenes. TopoFarm provides fine-grained panoptic segmentation annotations together with pairwise object contact relationship labels, enabling joint object–relation modeling and topology-aware scene representation. To improve annotation reliability under complex conditions, a human-in-the-loop hybrid intelligence framework, termed HITPA, is introduced to integrate automatic panoptic segmentation, depth-aware topological reasoning, and expert-guided refinement, achieving high annotation quality with controlled manual effort. Based on TopoFarm, systematic benchmark experiments are conducted for panoptic segmentation and topological relationship reasoning, along with a hierarchical evaluation protocol to analyze the impact of object-level representation quality on relational inference. The results demonstrate that TopoFarm poses substantial challenges for both tasks and highlight the strong dependence of topological reasoning on object accuracy and global scene context. Overall, TopoFarm provides a new data foundation and evaluation benchmark for topology-aware perception in farmland monitoring applications. Full article
(This article belongs to the Topic Geospatial AI: Systems, Model, Methods, and Applications)

18 pages, 3135 KB  
Article
PF-ConvNeXt: An Adverse Weather Recognition Network for Autonomous Driving Scenes
by Quanxiang Wang, Zhaofa Zhou and Zhili Zhang
Electronics 2026, 15(5), 920; https://doi.org/10.3390/electronics15050920 - 24 Feb 2026
Abstract
Rain, snow, fog, and dust can degrade road-scene images, blur fine details, and consequently reduce the reliability of perception systems for autonomous driving. To address this problem, this paper proposes PF-ConvNeXt, an adverse weather recognition model built upon the ConvNeXt architecture. First, a lightweight pyramid split attention (PSA) module is introduced to enable multi-scale feature fusion, so that both global degradation patterns and local texture details can be captured simultaneously. Second, a feature enhancement channel and spatial attention module (FECS) is designed. It adaptively recalibrates features along the channel and spatial paths, thereby suppressing interference from complex backgrounds and noise. Third, during training, Focal Loss is adopted to strengthen learning for hard samples and minority weather categories, alleviating recognition bias caused by class imbalance. Experiments are conducted on a dataset of 5000 images constructed by integrating RTTS, DAWN, and a self-collected rainy-weather dataset. The results show that PF-ConvNeXt achieves 90.16% accuracy, 95.24% mean average precision, and a 92.18% F1-score. It outperforms the ConvNeXt baseline by 4.74%, 5.46%, and 5.95%, respectively, and surpasses multiple mainstream classification models. This study provides an effective recognition framework for robust environmental perception under challenging weather conditions and demonstrates promising potential for practical deployment. Full article
(This article belongs to the Section Computer Science & Engineering)
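The PF-ConvNeXt abstract above adopts Focal Loss to counter class imbalance among weather categories. As a reference point, here is a minimal NumPy sketch of the standard focal loss formulation; the gamma and alpha values are the common defaults, not necessarily the paper's settings:

```python
import numpy as np

def focal_loss(probs, labels, gamma=2.0, alpha=0.25):
    """Mean focal loss: -alpha * (1 - p_t)**gamma * log(p_t) over a batch.

    probs  -- (N, C) softmax probabilities
    labels -- (N,) integer class ids
    """
    p_t = probs[np.arange(len(labels)), labels]   # probability of the true class
    return float(np.mean(-alpha * (1.0 - p_t) ** gamma * np.log(p_t)))

# A confident correct prediction is down-weighted by (1 - p_t)**gamma, so hard
# samples and minority classes dominate the gradient.
easy = focal_loss(np.array([[0.9, 0.1]]), np.array([0]))
hard = focal_loss(np.array([[0.3, 0.7]]), np.array([0]))
```

The hard sample (true-class probability 0.3) contributes a much larger loss than the easy one, which is the imbalance-correcting behaviour the abstract relies on.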
36 pages, 692 KB  
Article
MDGroup: Multi-Grained Dual-Aware Grouping for 3D Point Cloud Instance Segmentation
by Wenyun Sun and Ruifeng Han
Electronics 2026, 15(5), 915; https://doi.org/10.3390/electronics15050915 - 24 Feb 2026
Abstract
Instance segmentation of 3D point clouds is a fundamental task for scene understanding in applications such as autonomous driving, robotics, and augmented reality. The inherent irregularity and sparsity of point clouds, compounded by scale variations and instance adhesion, pose significant challenges to accurate segmentation. Existing grouping-based methods are often limited by the loss of geometric details in single-path backbones and by error propagation near complex boundaries. To address these issues, a Multi-grained Dual-aware Grouping algorithm (MDGroup) is proposed, which explicitly integrates multi-grained feature representation with dual awareness of class and boundary. The algorithm features a Dual-Resolution 3D U-Net (DRNet) that preserves local geometric details while aggregating global semantics through adaptive alignment. A four-branch prediction scheme enhances semantic and offset estimation with boundary and directional cues, enabling fine-grained boundary modeling. Furthermore, a Hierarchical Adaptive Multi-grained Feature fusion framework (HAMF) achieves efficient cross-scale alignment by combining Class-Aware Dynamic Voxelization and Class-Aware Pyramid Scaling. Finally, a Boundary-Aware Weighted Aggregation mechanism (BAWA) refines instance grouping by dynamically weighting semantic confidence, geometric distance, boundary probability, and directional consistency. To extend the model to dynamic scenes, a Temporal Adaptive Gating (TAG) module is introduced to leverage historical frame correlations. Extensive experiments on the ScanNet v2, S3DIS, STPLS3D, SemanticKITTI, LiDAR-Net, and OCID benchmarks demonstrate that MDGroup achieves state-of-the-art performance among grouping-based methods, particularly on small objects, complex boundaries, and dynamic environments. Full article
(This article belongs to the Section Artificial Intelligence)
26 pages, 9548 KB  
Article
DCM-DETR: A Lightweight Framework for Robust Infrared Small UAV Detection
by Linlin Li, Jingyao Sun and Haochen Hu
Symmetry 2026, 18(3), 397; https://doi.org/10.3390/sym18030397 - 24 Feb 2026
Abstract
Small unmanned aerial vehicle (UAV) detection in low-altitude infrared imagery remains challenging due to extremely small targets, weak contrast, scarce appearance cues, and heavy background clutter, which often leads to missed detections, clutter-induced false alarms, and localisation drift. To address these issues, we propose Directional Context Modelling DETR (DCM-DETR), an end-to-end detector that strengthens weak target evidence via directional context modelling and scale-consistent feature aggregation. Specifically, we build a Directional Receptive-Field Enhancement (DRFE) backbone with C2f-APC units, introducing asymmetric padding to enlarge receptive fields while preserving faint target cues. We further design an Infrared-Enhanced Encoder (IEE), where a CSA-Block jointly captures directional context and local details to steer global interactions towards target-relevant regions. To suppress noise propagation and alleviate cross-scale misalignment, we employ Hierarchical Gated Fusion (HGF) and Residual Alignment (RA), enabling selective semantic modulation and consistent multi-scale alignment. Moreover, we incorporate a Magnitude-Aware Linear Attention AIFI (MALA-AIFI) module to enhance low-SNR responses with linear complexity. Experiments on SIRST-UAVB show that DCM-DETR improves mAP50 by 36.58% over YOLOv8n and by 1.09% over RT-DETR, while reducing parameters by 25.1M. On IRSTD, it yields a 2.01% gain in mAP50 and boosts speed from 47.43 FPS to 93.45 FPS. These results demonstrate that DCM-DETR achieves a strong accuracy–efficiency trade-off for infrared small UAV detection in cluttered low-altitude scenes. Full article
(This article belongs to the Section Computer)

23 pages, 5350 KB  
Article
WCDB-YOLO: Wavelet-Enhanced Contextual Dual-Backbone Network for Small Object Detection in UAV Aerial Imagery
by Di Luan, Yuna Dong, Jian Zhou, Ang Li, Ling Xie, Hongying Liu and Jun Zhu
Drones 2026, 10(3), 155; https://doi.org/10.3390/drones10030155 - 24 Feb 2026
Abstract
Object detection in UAV aerial imagery plays a pivotal role across a wide spectrum of applications. However, existing detection models continue to face significant challenges stemming from small object scales, dense spatial distributions, and highly complex backgrounds. To address these challenges, this paper proposes a novel dual-backbone network model named WCDB-YOLO. The core innovation of this work lies in introducing a “target-context decoupled perception” paradigm, which utilizes two structurally complementary backbone networks to separately process local object features and global background information: one backbone focuses on extracting fine-grained local features of objects, while the other innovatively incorporates a wavelet convolution module to efficiently model the global contextual semantics of complex scenes with minimal computational cost by constructing a large receptive field. To further enhance the scale adaptability for small objects, a Dilation-wise Residual (DWR) module is designed, which employs parallel convolutional branches with different dilation rates to achieve dynamic adaptation to multi-scale small object features. Additionally, the model optimizes the feature pyramid structure by integrating high-resolution P2/4 features into the detection head, significantly improving the localization accuracy of tiny objects. Experimental results on the VisDrone dataset show that the proposed method achieves an 8.4% improvement in mAP50 over the baseline YOLOv11s model and outperforms current state-of-the-art (SOTA) approaches. This work presents a highly accurate and robust solution for small object detection from UAV platforms in complex environments. Full article
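The DWR module described above runs parallel convolution branches with different dilation rates to adapt to object scale. The effective kernel size of a dilated convolution follows the standard relation k_eff = k + (k - 1)(d - 1); a small sketch with illustrative dilation rates (not necessarily those used in the paper):

```python
def effective_kernel(k, d):
    """Effective kernel size of a k x k convolution with dilation rate d."""
    return k + (k - 1) * (d - 1)

# Parallel 3x3 branches with dilation rates 1, 3 and 5 cover 3-, 7- and
# 11-pixel receptive fields at identical parameter cost, which is how one
# module can match several small-object scales.
branches = [effective_kernel(3, d) for d in (1, 3, 5)]
```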

15 pages, 4263 KB  
Article
Driver Attention Prediction Based on Adaptive Fusion of Cross-Modal Features
by Mingfang Zhang, Tong Zhang, Congling Yan and Yiran Zhang
Appl. Sci. 2026, 16(4), 2150; https://doi.org/10.3390/app16042150 - 23 Feb 2026
Abstract
To investigate the dynamic changes in driver attention in complex road traffic scenarios, this paper proposes a driver attention prediction method based on cross-modal adaptive feature fusion (DAFNet). First, semantic segmentation is applied to the input image sequences, and a dual-branch encoder using a 3D residual network is designed to extract spatio-temporal features from both RGB images and semantic information in parallel. Next, a 3D deformable attention mechanism is introduced to enhance the traditional Transformer algorithm, which focuses on the key salient regions through spatio-temporal offset prediction and adaptive fusion of cross-modal features. Subsequently, a predictive recurrent neural network is employed to forecast the fused spatio-temporal features and improve the stability of long-term sequence prediction. Finally, the driver attention results are predicted by a lightweight decoder. Experimental results demonstrate that the proposed method outperforms the comparative methods in overall performance. The predictions not only capture salient regions in driving scenes in a bottom-up manner but also track the driver’s intent in a top-down manner. Thus, our method exhibits strong adaptability to various complex traffic scenarios. Additionally, the method achieves an inference speed of 53.73 frames per second, satisfying the real-time performance requirement of on-vehicle systems. Full article

37 pages, 34025 KB  
Article
Individual Tree Segmentation from LiDAR Point Clouds: A Mamba-Enhanced Sparse CNN Approach for Accurate Forest Inventory
by Xiangji Peng, Jizheng Yi, Rong Liu, Xiangyu Shen and Xiaoyao Li
Remote Sens. 2026, 18(4), 664; https://doi.org/10.3390/rs18040664 - 22 Feb 2026
Abstract
Individual tree segmentation is critical for automated forest inventory systems, enabling detailed individual tree records that support precision forest management. While current airborne LiDAR systems can acquire high-density, high-accuracy point clouds of dense forests, significant challenges remain in analyzing the diversity of forest samples across different regions. An improved method of instance segmentation using a Mamba-Enhanced Sparse Convolutional Neural Network is proposed to address the problem of misallocation caused by ambiguous boundaries and overlapping canopies of individual trees. An innovative offset prediction method further reduces the high error rate in low-canopy datasets. On the basis of a variety of features, the designed network customizes the HDBSCAN clustering algorithm and the W-KNN neighborhood search algorithm for fine-grained instance segmentation to achieve optimal performance. To address the lack of block coherence in the FOR-instance dataset and to reduce redundant noisy trees in some regions, this work develops a novel pipeline to simulate real woodland scenes and evaluates the robustness of the network in composite forests. Extensive validation on real and benchmark data demonstrates the method’s superior generalization capability, yielding robust segmentation results across varied forest structures. The most marked gains are achieved in low-canopy settings, confirming the method’s enhanced ability to handle complex structural overlaps. Our method provides a more comprehensive solution for the inventory management of structurally heterogeneous or regionally diverse woodlands, thereby enhancing both the automation and precision of forest resource assessment. Full article
(This article belongs to the Section Forest Remote Sensing)

16 pages, 4066 KB  
Article
A Novel ResUNet Architecture for Thin Cloud and Boundary Detection in Landsat 8 Remote Sensing Imagery
by Hao Huang, Xiaofang Liu, Chi Yang and Aimin Liu
Appl. Sci. 2026, 16(4), 2122; https://doi.org/10.3390/app16042122 - 22 Feb 2026
Abstract
To address the challenges of thin cloud detection and imprecise cloud boundary segmentation in Landsat 8 remote sensing imagery, this paper proposes a systematic approach that comprehensively enhances cloud detection accuracy from data preprocessing to network architecture optimisation. First, through empirical analysis, an optimised band input combination was determined (removing the panchromatic Band 8 and thermal infrared Band 11), effectively suppressing urban background noise. Subsequently, an enhanced ResUNet model was designed, innovatively integrating an Atrous Spatial Pyramid Pooling (ASPP) module with an attention gate (AG) mechanism. The ASPP module enhances detection capabilities for thin clouds and diffuse cloud masses by aggregating multi-scale global contextual information. The attention-gated mechanism finely tunes feature fusion during the decoding phase, suppressing interference from highly reflective surface features to achieve precise cloud boundary segmentation. Experiments conducted on the Landsat 8 dataset featuring typical urban scenes demonstrate that the proposed method significantly outperforms mainstream models across both conventional and boundary-specific metrics, achieving an overall accuracy (OA) of 0.9717, a mean intersection over union (mIoU) of 0.8102, and, notably, a mean bounding box intersection over union (mB-IoU) of 0.4154 and a mean bounding box F1 score of 0.5356, representing improvements of 16.3% and 12.5%, respectively, over existing methods. This research provides an efficient and robust technical framework for cloud detection tasks in complex urban environments, laying the foundation for high-precision processing of remote sensing imagery and subsequent quantitative analysis. Full article
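The cloud-detection abstract above reports overall accuracy and mean intersection over union (mIoU). For reference, the standard per-class mIoU over segmentation masks can be sketched as follows (the two-class masks are illustrative, not from the paper):

```python
import numpy as np

def mean_iou(pred, gt, n_classes):
    """Mean intersection-over-union across classes present in either mask."""
    ious = []
    for c in range(n_classes):
        inter = np.logical_and(pred == c, gt == c).sum()
        union = np.logical_or(pred == c, gt == c).sum()
        if union:                      # skip classes absent from both masks
            ious.append(inter / union)
    return float(np.mean(ious))

# Toy 2x2 example: class 0 = clear, class 1 = cloud
pred = np.array([[0, 0], [1, 1]])
gt   = np.array([[0, 1], [1, 1]])
miou = mean_iou(pred, gt, 2)           # (1/2 + 2/3) / 2
```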

31 pages, 23527 KB  
Article
SLC-Domain SAR RFI Suppression via Sliding-Window Local Tensorization and Energy-Guided CUR Projection
by Qiang Guo, Yuhang Tian, Shuai Huang, Liangang Qi and Sergiy Shulga
Remote Sens. 2026, 18(4), 652; https://doi.org/10.3390/rs18040652 - 20 Feb 2026
Abstract
Synthetic aperture radar (SAR) imaging is highly vulnerable to radio-frequency interference (RFI) in complex electromagnetic environments, which can introduce structured artifacts and obscure targets in single-look complex (SLC) products. Most existing suppression methods rely on separability along a single dimension or require interference-specific parameter tuning, limiting robustness under multidimensional coupling and strong scatterers. We propose a range-domain sliding-window local tensorization that rearranges SLC data into localized range–azimuth–block-index tensors to better expose multi-mode correlations. On this representation, an energy-guided tensor CUR low-rank projector is embedded into an alternating-projection scheme that alternates complex-valued soft-thresholding for the sparse scene-plus-noise term and CUR-based projection for the structured RFI term. The cleaned SLC image is obtained by de-tensorizing the estimated RFI component and subtracting it from the input SLC. Experiments on semi-synthetic data, where controlled RFI is superimposed on real SLC scenes, and on real Sentinel-1 SLC data containing RFI demonstrate improved Pearson correlation coefficient (PCC) and perceptual image quality while preserving target signatures and scene textures, particularly under strong interference and strong coupling. The proposed approach provides a practical SLC-domain RFI mitigation tool for post-focusing SAR products without requiring explicit interference parameterization. Full article
(This article belongs to the Section Remote Sensing Image Processing)
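The alternating-projection scheme above applies complex-valued soft-thresholding to the sparse scene-plus-noise term. The standard operator shrinks each complex magnitude toward zero while preserving phase; a generic sketch (the threshold value is illustrative, not the paper's):

```python
import numpy as np

def complex_soft_threshold(x, lam):
    """Shrink |x| by lam, keep the phase: x * max(1 - lam / |x|, 0)."""
    mag = np.abs(x)
    scale = np.maximum(1.0 - lam / np.maximum(mag, 1e-12), 0.0)
    return x * scale

z = np.array([3 + 4j, 0.5 + 0.0j])    # magnitudes 5 and 0.5
out = complex_soft_threshold(z, 1.0)  # strong entry shrinks, weak one zeroes
```

Entries below the threshold are set exactly to zero, which is what makes the operator a proximal step for the L1 penalty on the sparse component.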

23 pages, 3940 KB  
Article
Research on Enhancing Fire Detection Performance in Ancient Architecture Under Occlusion Scenarios Based on YOLO-AR
by Chen Li, Minghan Wang, Lei Lei, Honghui Liu, Kaiyin Gao and Zuoyi Wang
Sensors 2026, 26(4), 1357; https://doi.org/10.3390/s26041357 - 20 Feb 2026
Abstract
Fire detection in ancient architecture presents significant challenges due to complex scenes and unique structural characteristics. Traditional detection methods often demonstrate limitations when addressing the specific structural idiosyncrasies of individual ancient buildings and the overlapping occlusion prevalent in architectural complexes. This paper proposes YOLO-AR, a novel fire detection algorithm based on an improved YOLOv8 framework. By embedding the Convolutional Block Attention Module (CBAM) at the end of the backbone network, the algorithm enhances its capability to capture key features of flames and smoke. Furthermore, the Repulsion Loss function is introduced to explicitly optimize bounding box localization accuracy in occluded and dense scenarios. Experiments conducted on a self-constructed ancient architecture dataset comprising 15,847 images demonstrate that YOLO-AR outperforms mainstream comparative algorithms in terms of Precision, Recall, and mean Average Precision (mAP). Specifically, the detection precision reached 90.7%, and the recall rate improved to 89.7%. This study provides an efficient and reliable visual detection solution for early warning systems in ancient architecture, offering significant value for cultural heritage preservation. Full article
(This article belongs to the Special Issue Object Detection and Recognition Based on Deep Learning)

26 pages, 5823 KB  
Article
A Topographic Shadow Effect Correction (TSEC) Method for Correcting Surface Reflectance of Optical Remote Sensing Images in Rugged Terrain
by Xu Yang, Wenbin Xie, Xiaoqing Zuo, Shipeng Guo, Daming Zhu, Yongfa Li, Jiangqi Li and Yan Luo
Remote Sens. 2026, 18(4), 642; https://doi.org/10.3390/rs18040642 - 19 Feb 2026
Abstract
The topographic shadow effect can cause surface reflectance distortions in the shadow areas of remote sensing images, particularly in complex mountainous areas. In this study, based on the difference in solar radiation received at the surface of sunlit and shadow areas, we introduced the shadow intensity, vegetation index, and band adjustment factors, and proposed a topographic shadow effect correction (TSEC) method. The method was then tested using eight Landsat 8 OLI scenes under different illumination conditions from two different regions. The results indicate that TSEC effectively corrected the topographic shadow effect. The corrected images exhibited good visual quality without obvious shadow pixels. Importantly, TSEC retained spectral information in sunlit areas while correcting spectral distortion in shadow areas, resulting in strong agreement between spectral curves of shady and sunny slopes. The method demonstrated high stability in normalized difference vegetation index (NDVI) correction, as the difference in NDVI before and after correction was less than 0.07 for the four scenes within the Changjiang study area. Moreover, the TSEC corrected the enhanced vegetation index (EVI) effectively, reducing an initial EVI difference of over 0.35 between the shady and sunny slopes to a maximum of 0.074 for the four scenes within the Wuyi Mountain study area. Relative to four established topographic correction models, the proposed method suppresses the over-correction phenomena typical of self-shadows and minimizes under-correction in cast shadows, resulting in stable overall correction results with few outliers. The TSEC provides a simple and effective method to correct the distorted reflectance in shadow areas using only image and DEM data, which can be adapted to complex mountainous areas and for images with different illumination conditions. Full article
(This article belongs to the Section Remote Sensing in Geology, Geomorphology and Hydrology)
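The TSEC abstract evaluates correction stability through NDVI and EVI. These are the standard vegetation indices for Landsat 8 OLI reflectances; the coefficients below are the usual published values, not parameters specific to the paper:

```python
def ndvi(nir, red):
    """Normalized difference vegetation index."""
    return (nir - red) / (nir + red)

def evi(nir, red, blue, G=2.5, C1=6.0, C2=7.5, L=1.0):
    """Enhanced vegetation index with the standard coefficient set."""
    return G * (nir - red) / (nir + C1 * red - C2 * blue + L)

# Surface reflectances for a vegetated pixel (illustrative values)
n = ndvi(0.5, 0.1)          # (0.5 - 0.1) / (0.5 + 0.1)
e = evi(0.5, 0.1, 0.05)
```

Because EVI includes the blue band and a soil-adjustment term, shadow-induced reflectance distortion perturbs it differently from NDVI, which is why the abstract reports both before/after differences.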

24 pages, 5692 KB  
Article
Multi-Scenario Recognition and Detection Model in National Parks Based on Improved YOLOv8
by Xiongwei Lou, Zixuan Qin, Hanbao Lou, Xinyu Zheng, Linhao Sun, Faneng Wang, Dasheng Wu, Sheng Chen and Guangyu Jiang
Forests 2026, 17(2), 277; https://doi.org/10.3390/f17020277 - 19 Feb 2026
Abstract
With the advancement of unmanned aerial vehicle (UAV) technology, its use in ecological monitoring and safety management of national parks has expanded significantly. However, object detection in complex scenes remains challenging due to environmental complexity, background interference, and occlusion. To address these issues, this paper proposes two improved YOLOv8-based models, YOLOv8-StarNet-CGA and SCS-YOLOv8, for detecting pine wilt disease-infected trees, under-construction farmhouses, and forest fires. In YOLOv8-StarNet-CGA, the StarNet module and Content-Guided Attention (CGA) are integrated into the backbone to enhance global feature extraction and focus on critical regions through dynamic weight adjustment. In SCS-YOLOv8, the original CIoU loss is also replaced with SIoU loss to optimize shape and orientation consistency, improving robustness. Experiments on UAV datasets covering diverse national park scenes demonstrate the effectiveness of the models. Results show that the improved models substantially outperform the original YOLOv8 in Precision, Recall, and mAP50. For pine wilt disease caused by the pine wood nematode Bursaphelenchus xylophilus, YOLOv8-StarNet-CGA achieves 8.6% higher Precision and 11.7% higher mAP50, facilitating early diagnosis and intervention of the disease. In under-construction farmhouse scenarios, Precision rises by 11% and mAP50 by 10.1%, lowering annual inspection labor by nearly 30% and improving oversight. For forest fires, SCS-YOLOv8 is more effective, with Precision improved by 7.2% and mAP50 by 6.3%. The improved detection model enables earlier identification of fire spots, thereby providing additional response time for emergency intervention, helping to mitigate fire spread and reduce the loss of forest resources. Both models also reduce GFLOPs and computational complexity, striking a balance between efficiency and accuracy, and showing strong potential for UAV deployment. Full article
(This article belongs to the Section Natural Hazards and Risk Management)

22 pages, 4286 KB  
Article
Symmetry-Enhanced Indoor Occupant Locating and Motionless Alarm System: Fusion of BP Neural Network and DS-TWR Technology
by Li Wang, Zhe Wang, Xinhe Meng, Wentao Chen and Aijun Sun
Symmetry 2026, 18(2), 376; https://doi.org/10.3390/sym18020376 - 18 Feb 2026
Abstract
To address the critical demand for real-time dynamic tracking of personnel in complex buildings during emergency rescue, a novel system was proposed integrating Back Propagation (BP) neural networks with Double-Sided Two-Way Ranging (DS-TWR) technology to achieve precise indoor localization and motionless detection. Comprising hardware (positioning base stations, tags, POE switches, routers, and a computer) and software (developed on LabVIEW), the system leverages the symmetric signal transmission of DS-TWR and the adaptive learning capability of BP neural networks to effectively mitigate multipath interference, enhancing positioning consistency and accuracy. Thresholds on elapsed time and movement distance were set to determine whether the occupant was trapped. When tested in several common building structures, it demonstrated good stability and high accuracy: the average RMSE of the positioning system was within 0.012–0.018 m (static state) and 0.048–0.065 m (dynamic state). Furthermore, the system could monitor and display the movement trajectory of each person in real time and automatically raise an alarm when anyone was trapped in a fire scene. Hence, rescue measures can be taken promptly according to the alarm information provided by the system, effectively ensuring the safety of personnel and improving the efficiency of fire rescue work. The proposed approach provides a symmetry-driven framework for intelligent building safety. Full article
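The DS-TWR scheme referenced above estimates range from two symmetric round-trip exchanges, which cancels first-order clock-offset error. The standard time-of-flight estimate is ToF = (Ra·Rb − Da·Db) / (Ra + Rb + Da + Db), where Ra, Rb are the measured round-trip times and Da, Db the reply delays. A sketch with made-up timestamps:

```python
def ds_twr_tof(round_a, reply_a, round_b, reply_b):
    """Double-sided two-way ranging ToF: (Ra*Rb - Da*Db) / (Ra + Rb + Da + Db)."""
    return (round_a * round_b - reply_a * reply_b) / (
        round_a + round_b + reply_a + reply_b)

C = 299_792_458.0                  # speed of light, m/s
tof_true = 10e-9                   # 10 ns of flight time, roughly 3 m
reply_a, reply_b = 1.0e-3, 1.2e-3  # deliberately asymmetric reply delays
round_a = 2 * tof_true + reply_b   # what device A measures for its exchange
round_b = 2 * tof_true + reply_a
distance = ds_twr_tof(round_a, reply_a, round_b, reply_b) * C
```

With ideal clocks the estimator recovers the true flight time exactly, even with asymmetric reply delays; the paper's BP network then corrects the residual multipath-induced error on top of this ranging primitive.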

36 pages, 4079 KB  
Article
FEGW-YOLO: A Feature-Complexity-Guided Lightweight Framework for Real-Time Multi-Crop Detection with Advanced Sensing Integration on Edge Devices
by Yaojiang Liu, Hongjun Tian, Yijie Yin, Yuhan Zhou, Wei Li, Yang Xiong, Yichen Wang, Zinan Nie, Yang Yang, Dongxiao Xie and Shijie Huang
Sensors 2026, 26(4), 1313; https://doi.org/10.3390/s26041313 - 18 Feb 2026
Viewed by 147
Abstract
Real-time object detection on resource-constrained edge devices remains a critical challenge in precision agriculture and autonomous systems, particularly when integrating advanced multi-modal sensors (RGB-D, thermal, hyperspectral). This paper introduces FEGW-YOLO, a lightweight detection framework explicitly designed to bridge the efficiency-accuracy gap for fine-grained visual perception on edge hardware while maintaining compatibility with multiple sensor modalities. The core innovation is a Feature Complexity Descriptor (FCD) metric that enables adaptive, layer-wise compression based on the information-bearing capacity of network features. This compression-guided approach is coupled with (1) Feature Engineering-driven Ghost Convolution (FEG-Conv) for parameter reduction, (2) Efficient Multi-Scale Attention (EMA) for compensating compression-induced information loss, and (3) Wise-IoU loss for improved localization in dense, occluded scenes. The framework follows a principled “Compress, Compensate, and Refine” philosophy that treats compression and compensation as co-designed objectives rather than isolated knobs. Extensive experiments on a custom strawberry dataset (11,752 annotated instances) and cross-crop validation on apples, tomatoes, and grapes demonstrate that FEGW-YOLO achieves 95.1% mAP@0.5 while reducing model parameters by 54.7% and computational cost (GFLOPs) by 53.5% compared to a strong YOLO-Agri baseline. Real-time inference on NVIDIA Jetson Xavier achieves 38 FPS at 12.3 W, enabling 40+ hours of continuous operation on typical agricultural robotic platforms. Multi-modal fusion experiments with RGB-D sensors demonstrate that the lightweight architecture leaves sufficient computational headroom for parallel processing of depth and visual data, a capability essential for practical advanced sensing systems. Field deployment in commercial strawberry greenhouses validates an 87.3% harvesting success rate with a 2.1% fruit damage rate, demonstrating feasibility for autonomous systems. The proposed framework advances the state-of-the-art in efficient agricultural sensing by introducing a principled metric-guided compression strategy, comprehensive multi-modal sensor integration, and empirical validation across diverse crop types and real-world deployment scenarios. This work bridges the gap between laboratory research and practical edge deployment of advanced sensing systems, with direct relevance to autonomous harvesting, precision monitoring, and other resource-constrained agricultural applications. Full article
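The abstract does not give the exact form of the Feature Complexity Descriptor, but the underlying idea, scoring each layer's information content and mapping the score to a per-layer compression ratio, can be sketched as below. The entropy-based score, the bin count, and the keep-ratio bounds are illustrative assumptions rather than the paper's actual metric.

```python
import numpy as np

def feature_complexity(fmap, bins=32):
    """Illustrative complexity score in [0, 1]: normalized Shannon entropy
    of the activation-magnitude histogram of one layer's feature map
    (shape (C, H, W))."""
    vals = np.abs(fmap).ravel()
    hist, _ = np.histogram(vals, bins=bins)
    p = hist / hist.sum()
    p = p[p > 0]  # drop empty bins (0 * log 0 := 0)
    entropy = -(p * np.log2(p)).sum()
    return float(entropy / np.log2(bins))  # normalize by max entropy

def compression_ratio(score, r_min=0.25, r_max=0.75):
    """Map a complexity score to a channel keep-ratio: information-rich
    layers keep more channels, simple layers are compressed harder."""
    return r_min + score * (r_max - r_min)
```

Under this sketch, a layer whose activations are nearly constant scores near 0 and is compressed aggressively, while a layer with richly varying activations scores near 1 and retains most of its channels.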

25 pages, 1558 KB  
Article
Towards Scalable Monitoring: An Interpretable Multimodal Framework for Migration Content Detection on TikTok Under Data Scarcity
by Dimitrios Taranis, Gerasimos Razis and Ioannis Anagnostopoulos
Electronics 2026, 15(4), 850; https://doi.org/10.3390/electronics15040850 - 17 Feb 2026
Viewed by 236
Abstract
Short-form video platforms such as TikTok (TikTok Pte. Ltd., Singapore) host large volumes of user-generated, often ephemeral, content related to irregular migration, where relevant cues are distributed across visual scenes, on-screen text, and multilingual captions. Automatically identifying migration-related videos is challenging due to this multimodal complexity and the scarcity of labeled data in sensitive domains. This paper presents an interpretable multimodal classification framework designed for deployment under data-scarce conditions. We extract features from platform metadata, automated video analysis (Google Cloud Video Intelligence), and Optical Character Recognition (OCR) text, and compare text-only, OCR-only, and vision-only baselines against a multimodal fusion approach using Logistic Regression, Random Forest, and XGBoost. In this pilot study, multimodal fusion consistently improves class separation over single-modality models, achieving an F1-score of 0.92 for the migration-related class under stratified cross-validation. Given the limited sample size, these results are interpreted as evidence of feature separability rather than definitive generalization. Feature importance and SHAP analyses identify OCR-derived keywords, maritime cues, and regional indicators as the most influential predictors. To assess robustness under data scarcity, we apply SMOTE to synthetically expand the training set to 500 samples and evaluate performance on a small held-out set of real videos, observing stable results that further support feature-level robustness. Finally, we demonstrate scalability by constructing a weakly labeled corpus of 600 videos using the identified multimodal cues, highlighting the suitability of the proposed feature set for weakly supervised monitoring at scale. Overall, this work serves as a methodological blueprint for building interpretable multimodal monitoring pipelines in sensitive, low-resource settings. Full article
(This article belongs to the Special Issue Multimodal Learning for Multimedia Content Analysis and Understanding)
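As a minimal illustration of the fusion step described above, per-modality feature vectors (platform metadata, video-analysis output, OCR-derived text features) can be concatenated into one fixed-length input, with absent modalities zero-imputed so every video is usable by Logistic Regression, Random Forest, or XGBoost. The modality names and dimensions here are illustrative assumptions, not the paper's actual feature set.

```python
import numpy as np

# Per-modality feature dimensions (illustrative; real dimensions depend on
# the metadata, video-analysis, and OCR pipelines).
DIMS = {"metadata": 8, "vision": 16, "ocr": 12}

def fuse(features):
    """Early fusion by concatenation. `features` maps a modality name to a
    1-D vector; missing modalities are zero-imputed so the fused vector
    always has the same length, keeping downstream classifiers simple."""
    parts = []
    for name, dim in DIMS.items():
        v = np.asarray(features.get(name, np.zeros(dim)), dtype=float)
        if v.shape != (dim,):
            raise ValueError(f"expected shape ({dim},) for {name}")
        parts.append(v)
    return np.concatenate(parts)
```

Zero-imputation is a deliberately simple choice here; it preserves a fixed input dimensionality for tree-based and linear models, at the cost of conflating "modality absent" with "all-zero features" unless an explicit presence indicator is added.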
