Search Results (25)

Search Parameters:
Keywords = hierarchical visual–sensor framework

34 pages, 1396 KB  
Article
From Detection Toward Decision Support: A Hierarchical Visual–Sensor Framework for Zamioculcas Monitoring in Indoor Environments
by Raikhan Amanova, Baurzhan Belgibayev, Yersaiyn Mailybayev, Gulnur Kazbekova, Zhadyra Akanova, Galiya Mamankyzy, Marzhana Amanova, Artem Bykov, Periuza Pirniyazova and Nurzhigit Smailov
Computers 2026, 15(6), 382; https://doi.org/10.3390/computers15060382 - 11 Jun 2026
Viewed by 85
Abstract
This paper proposes a prototype-level hierarchical visual–sensor framework for monitoring the Zamioculcas houseplant in complex indoor environments and supporting adaptive care-mode selection. The proposed framework combines a two-level visual pipeline, consisting of YOLO-based target plant detection and MobileViT-S-based leaf-condition classification, with a Plant Health Index (PHI) and a rule-based decision-support module for integrating visual and IoT-derived indicators. For the detection task, YOLOv8, YOLO12, and YOLO26 were compared, with YOLO26 showing the most balanced performance among the evaluated implementations. To improve robustness in real indoor scenes, negative training samples were added; this reduced the image-level false alarm rate on an independent negative-scene test set from 50.7% to 10.0% and increased specificity from 49.3% to 90.0%. For the second visual level, MobileViT-S achieved an accuracy of 0.9857 and an F1-score of 0.9857 on the independent cropped leaf test subset. To reduce the dependence of this result on a single data split, an additional 5-fold cross-validation experiment was conducted on the full cropped leaf dataset of 847 images, resulting in an accuracy of 0.9858 ± 0.0068 and an F1-score of 0.9853 ± 0.0070. To further address plant-level generalization, an additional unseen-plant validation subset of 60 newly collected cropped leaf images was evaluated, and MobileViT-S achieved an accuracy of 0.9500 and an F1-score of 0.9499. These results support the stability of the leaf-condition classifier within the available data, although larger external validation with strict plant-level and session-level separation remains necessary. In addition, an Arduino-based module-level validation was conducted using a capacitive soil-moisture sensor to verify the proposed sensor-based and Vision–IoT decision rules. 
The experiment demonstrated that the rule-based layer can distinguish dry, normal, and wet soil states and select conservative care actions depending on both soil moisture and visual-condition input. A brief real-time camera–sensor communication test further confirmed that live camera input, Arduino-based soil-moisture sensing, PHI computation, and care-mode selection can be connected within one decision-support pipeline. The proposed PHI and care-mode selection module are therefore presented as a formalized decision-support layer rather than as a fully validated autonomous irrigation system. Further calibration, actuator integration, and closed-loop validation remain necessary before practical autonomous deployment. Full article
(This article belongs to the Section Internet of Things (IoT) and Industrial IoT)
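The false-alarm and specificity figures quoted in the abstract above are two views of the same quantity: on a test set that contains only negative scenes, specificity equals one minus the image-level false alarm rate. A minimal Python sketch of that arithmetic follows; the function name is an illustrative assumption and does not come from the paper.

```python
def specificity_from_far(false_alarm_rate: float) -> float:
    """Specificity (true-negative rate) on a negatives-only test set.

    With no positive images present, every image is either a true negative
    or a false positive, so specificity = 1 - false-alarm rate.
    """
    if not 0.0 <= false_alarm_rate <= 1.0:
        raise ValueError("false_alarm_rate must be in [0, 1]")
    return 1.0 - false_alarm_rate


# Rates reported in the abstract, before and after adding negative samples.
before = specificity_from_far(0.507)
after = specificity_from_far(0.100)
print(f"specificity before: {before:.1%}, after: {after:.1%}")
```

The two reported pairs (50.7% with 49.3%, and 10.0% with 90.0%) are consistent with this relationship.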
21 pages, 906 KB  
Article
Hierarchical Semantic Transmission and Lyapunov-Optimized Online Scheduling for the Internet of Vehicles
by Le Jiang, Yani Guo, Wenzhao Zhang, Penghao Wang and Shujun Han
Sensors 2026, 26(9), 2606; https://doi.org/10.3390/s26092606 - 23 Apr 2026
Viewed by 318
Abstract
The inherent redundancy in vehicle sensor data, coupled with constrained onboard resources and stringent latency requirements, renders traditional bit-oriented transmission paradigms inefficient for autonomous-driving perception tasks. Semantic communication offers a promising direction by shifting the focus from bit-level fidelity to task-level information delivery. In this paper, we propose a unified framework that integrates hierarchical transmission and online scheduling for Internet of Vehicles (IoV)-oriented collaborative perception. The proposed hierarchy separates information into two complementary layers: a coarse metadata layer (object bounding boxes) for latency-critical awareness, and fine-grained visual semantics (multi-scale region-of-interest (ROI) patches) for perception-intensive tasks. We formulate an online scheduling problem that jointly exploits Age of Information (AoI) and Channel State Information (CSI) to dynamically decide what to transmit and at what fidelity under per-frame budget constraints. To address cross-scheme fairness, we report resource utilization under a fixed kbps/fps physical budget and evaluate robustness using a combination of a lightweight task-proxy metric and COCO-style Average Recall (AR100) under ROI-only evaluation. The hierarchical transmission architecture, combined with AoI awareness, reduces global semantic staleness by approximately 78%. The Lyapunov-based online scheduler enables intelligent, signal-to-noise ratio (SNR)-adaptive switching between coarse and fine semantic levels, ensuring robust perception under varying channel quality. Under strict physical-budget constraints and unreliable channel conditions, joint source-channel coding (JSCC) exhibits significantly stronger task robustness than conventional schemes: at 0 dB SNR, the task-proxy detection rate improves by nearly 47 percentage points over the uncoded baseline. Full article
(This article belongs to the Section Sensor Networks)
29 pages, 7372 KB  
Article
Multi-Scale Frequency-Aware Representation Learning for Infrared and Visible Image Fusion
by Chuanwen Hu, Zheyi Hu, Chuan Xu, Zhina Song and Liye Mei
Remote Sens. 2026, 18(8), 1178; https://doi.org/10.3390/rs18081178 - 15 Apr 2026
Viewed by 513
Abstract
Infrared and visible image fusion aims to integrate complementary information from heterogeneous sensors for remote sensing and Earth-observation applications. To achieve a better balance between global contextual modeling and local structural preservation, we propose MSF-Net, a multi-scale frequency-aware fusion network with a hierarchical design. The proposed framework consists of two main stages: multi-scale feature extraction with frequency-domain interaction and hierarchical cross-modal fusion. Specifically, a hybrid spatial-frequency encoding block (HSFEB) is designed as the basic building unit, which combines a spatial-frequency interaction module (SFIM) for global context aggregation in the frequency domain and a structure-guided feature refinement module (SGFRM) for preserving local structural details. In addition, a hierarchical feature fusion module (HFFM) is introduced to progressively integrate cross-modal and cross-scale features in a coarse-to-fine manner. A joint loss function, composed of intensity and structural constraints, is adopted to supervise the fusion process. Extensive experiments on three public benchmarks, MSRS, M3FD, and TNO, demonstrate that MSF-Net achieves superior performance over nine SOTA methods in both qualitative and quantitative evaluations. The results show that the proposed method effectively enhances thermal targets, preserves structural details, and maintains good visual naturalness under diverse remote-sensing scenarios. Full article
17 pages, 1639 KB  
Article
Cascade Registration and Fusion for Unaligned Infrared and Visible Images in Autonomous Driving
by Long Xiao, Yidong Xie and Chengda Yao
Electronics 2026, 15(7), 1427; https://doi.org/10.3390/electronics15071427 - 30 Mar 2026
Viewed by 440
Abstract
Infrared and visible image fusion is a critical technology for enhancing the all-weather perception capabilities of autonomous driving systems. However, the inherent physical parallax of vehicle-mounted sensors combined with motion-induced vibrations makes it difficult to achieve strict alignment between the source images. Direct fusion of such misaligned pairs leads to ghosting artifacts, which significantly compromises driving safety. To address this challenge, this paper proposes a cascaded deep fusion framework tailored for autonomous driving scenarios. A dual-modal perception dataset is first constructed, incorporating realistic physical parallax and non-rigid deformations. Subsequently, a decoupled strategy is established, characterized by geometric correction followed by semantic fusion: the Static-Feature Recursive Registration (SFRR) network is utilized to explicitly correct the spatial misalignments caused by parallax, thereby establishing geometric consistency; then, the Hierarchical Invertible Block Fusion (HIBF) network achieves lossless integration of cross-modal features by combining spatial frequency separation with invertible interaction techniques. Experimental results demonstrate that the proposed method outperforms representative algorithms across several metrics, including Mutual Information (MI), Visual Information Fidelity (VIF), Structural Similarity (SSIM), and Correlation Coefficient (CC), producing high-quality fused images with clear structural definitions. Full article
30 pages, 135773 KB  
Article
Robust 3D Multi-Object Tracking via 4D mmWave Radar-Camera Fusion and Disparity-Domain Depth Recovery
by Yunfei Xie, Xiaohui Li, Dingheng Wang, Zhuo Wang, Shiliang Li, Jia Wang and Zhenping Sun
Sensors 2026, 26(7), 2096; https://doi.org/10.3390/s26072096 - 27 Mar 2026
Viewed by 917
Abstract
4D millimeter-wave radar provides high-precision ranging capability and exhibits strong robustness under adverse weather and low-visibility conditions, but its point clouds are relatively sparse and suffer from severe elevation-angle measurement noise. Monocular cameras, by contrast, provide rich semantic information and high recall, yet are fundamentally limited by scale ambiguity. To exploit the complementary characteristics of these two sensors, this paper proposes a radar-camera fusion 3D multi-object tracking framework that does not rely on complex 3D annotated data. First, on the radar signal-processing side, a Gaussian distribution-based adaptive angle compression method and IMU-based velocity compensation are introduced to effectively suppress measurement noise, and an improved DBSCAN clustering scheme with recursive cluster splitting and historical static-box guidance is employed to generate high-quality radar detections. Second, a disparity-domain metric depth recovery method is proposed. This method uses filtered radar points as sparse metric anchors, performs robust fitting with RANSAC, and applies Kalman filtering for temporal smoothing, thereby converting the relative depth output of the visual foundation model Depth Anything V2 into metric depth. Finally, a hierarchical fusion strategy is designed at both the detection and tracking levels to achieve stable cross-modal state association. Experimental results on a self-collected dataset show that the proposed method achieves an overall MOTA of 77.93%, outperforming single-modality baselines and other comparison methods by 11 to 31 percentage points. This study provides an effective solution for low-cost and robust environment perception in complex dynamic scenarios. Full article
(This article belongs to the Section Vehicular Sensing)
27 pages, 5957 KB  
Article
A Study of the Three-Dimensional Localization of an Underwater Glider Hull Using a Hierarchical Convolutional Neural Network Vision Encoder and a Variable Mixture-of-Experts Transformer
by Jungwoo Lee, Ji-Hyun Park, Jeong-Hwan Hwang, Kyoungseok Noh and Jinho Suh
Remote Sens. 2026, 18(5), 793; https://doi.org/10.3390/rs18050793 - 5 Mar 2026
Viewed by 424
Abstract
Although underwater gliders are highly energy-efficient platforms capable of long-duration and large-scale ocean observation, their lack of self-propulsion requires external assistance for recovery upon mission completion. In harsh and dynamic marine environments, reliably detecting the glider and accurately estimating its three-dimensional position are critical to ensuring the recovery operations are safe and efficient. This paper proposes a perception framework based on deep learning to detect underwater glider hulls and estimate their three-dimensional relative positions using camera–sonar multi-sensor fusion. This approach integrates a hierarchical convolutional neural network (CNN) vision encoder and a transformer-based architecture to estimate the glider’s spatial location and heading direction simultaneously. The hierarchical CNN encoder extracts multi-level, semantically rich visual features, thereby improving robustness to visual degradation and environmental disturbances common in underwater settings. Additionally, the transformer incorporates a variable mixture-of-experts (vMoE) mechanism that adaptively allocates expert networks across layers, enhancing representational capacity while maintaining computational efficiency. The resulting pose estimates enable precise, collision-free ROV navigation for automated recovery and onboard sensor inspection tasks. Experimental results, including ablation studies, validate the effectiveness of the proposed components and demonstrate their contributions to accurate glider hull detection and three-dimensional localization. Overall, the proposed framework provides a scalable, reliable perception solution that allows for the safe, autonomous recovery of underwater gliders with an ROV in realistic ocean environments. Full article
22 pages, 1546 KB  
Article
Multimodal Fusion Attention Network for Real-Time Obstacle Detection and Avoidance for Low-Altitude Aircraft
by Xiaoqi Xu and Yiyang Zhao
Symmetry 2026, 18(2), 384; https://doi.org/10.3390/sym18020384 - 22 Feb 2026
Viewed by 649
Abstract
The rapid expansion of low-altitude unmanned aerial vehicles demands robust obstacle detection and avoidance systems capable of operating under diverse environmental conditions. This paper proposes a multimodal fusion attention network that integrates visual imagery and Light Detection and Ranging (LiDAR) point cloud data for real-time obstacle perception. The architecture incorporates a bidirectional cross-modal attention mechanism that learns dynamic correspondences between heterogeneous sensor modalities, enabling adaptive feature integration based on contextual reliability. An adaptive weighting component automatically modulates modal contributions according to estimated sensor confidence under varying environmental conditions. The network further employs gated fusion units and multi-scale feature pyramids to ensure comprehensive obstacle representation across different distances. A hierarchical avoidance decision framework translates detection outputs into executable control commands through threat assessment and graduated response strategies. Experimental evaluation on both public benchmarks and a purpose-collected low-altitude obstacle dataset demonstrates that the proposed method achieves 84.9% mean Average Precision (mAP) while maintaining 47.3 frames per second (FPS) on Graphics Processing Unit (GPU) hardware and 23.6 FPS on embedded platforms. Ablation studies confirm the contribution of each architectural component, with cross-modal attention providing the most substantial performance improvement. Full article
(This article belongs to the Section Computer)
25 pages, 15438 KB  
Article
Day–Night All-Sky Scene Classification with an Attention-Enhanced EfficientNet
by Wuttichai Boonpook, Peerapong Torteeka, Kritanai Torsri, Daroonwan Kamthonkiat, Yumin Tan, Asamaporn Sitthi, Patcharin Kamsing, Chomchanok Arunplod, Utane Sawangwit, Thanachot Ngamcharoensuktavorn and Kijnaphat Suksod
ISPRS Int. J. Geo-Inf. 2026, 15(2), 66; https://doi.org/10.3390/ijgi15020066 - 3 Feb 2026
Viewed by 1597
Abstract
All-sky cameras provide continuous hemispherical observations essential for atmospheric monitoring and observatory operations; however, automated classification of sky conditions in tropical environments remains challenging due to strong illumination variability, atmospheric scattering, and overlapping thin-cloud structures. This study proposes EfficientNet-Attention-SPP Multi-scale Network (EASMNet), a physics-aware deep learning framework for robust all-sky scene classification using hemispherical imagery acquired at the Thai National Observatory. The proposed architecture integrates Squeeze-and-Excitation (SE) blocks for radiometric channel stabilization, the Convolutional Block Attention Module (CBAM) for spatial–semantic refinement, and Spatial Pyramid Pooling (SPP) for hemispherical multi-scale context aggregation within a fully fine-tuned EfficientNetB7 backbone, forming a domain-aware atmospheric representation framework. A large-scale dataset comprising 122,660 RGB images across 13 day–night sky-scene categories was curated, capturing diverse tropical atmospheric conditions including humidity, haze, illumination transitions, and sensor noise. Extensive experimental evaluations demonstrate that the EASMNet achieves 93% overall accuracy, outperforming representative convolutional (VGG16, ResNet50, DenseNet121) and transformer-based architectures (Swin Transformer, Vision Transformer). Ablation analyses confirm the complementary contributions of hierarchical attention and multi-scale aggregation, while class-wise evaluation yields F1-scores exceeding 0.95 for visually distinctive categories such as Day Humid, Night Clear Sky, and Night Noise. Residual errors are primarily confined to physically transitional and low-contrast atmospheric regimes. 
These results validate the EASMNet as a reliable, interpretable, and computationally feasible framework for real-time observatory dome automation, astronomical scheduling, and continuous atmospheric monitoring, and provide a scalable foundation for autonomous sky-observation systems deployable across diverse climatic regions. Full article
24 pages, 3924 KB  
Article
Global-Local-Structure Collaborative Approach for Cross-Domain Reference-Based Image Super-Resolution
by Xiuxia Cai, Chenyang Diwu, Ting Fan, Wenjing Wang and Jinglu He
Remote Sens. 2026, 18(3), 487; https://doi.org/10.3390/rs18030487 - 3 Feb 2026
Viewed by 607
Abstract
Remote sensing image super-resolution (RSISR) aims to reconstruct high-resolution images from low-resolution observations of remote sensing data to enhance the visual quality and usability of remote sensors. Real-world RSISR is challenging owing to diverse degradations such as blur, noise, compression, and atmospheric distortions. We propose a hierarchical multi-task super-resolution framework comprising degradation-aware modeling, dual-decoder reconstruction, and static regularization-guided generation. Specifically, the degradation-aware module adaptively characterizes multiple types of degradation and provides effective conditional priors for reconstruction. The dual-decoder module incorporates both convolutional and Transformer branches to balance local detail preservation with global structural consistency. Moreover, the static regularization-guided generation introduces prior constraints such as total variation and gradient consistency to improve robustness to varying degradation levels. Extensive experiments on two public remote sensing datasets show that our method remains robust under varying degradation conditions. Full article
(This article belongs to the Special Issue Multimodal AI-Empowered Remote Sensing: Image Fusion and Analysis)
23 pages, 3420 KB  
Article
Design of a Wireless Monitoring System for Cooling Efficiency of Grid-Forming SVG
by Liqian Liao, Jiayi Ding, Guangyu Tang, Yuanwei Zhou, Jie Zhang, Hongxin Zhong, Ping Wang, Bo Yin and Liangbo Xie
Electronics 2026, 15(3), 520; https://doi.org/10.3390/electronics15030520 - 26 Jan 2026
Viewed by 509
Abstract
The grid-forming static var generator (SVG) is a key device that supports the stable operation of power grids with a high penetration of renewable energy. The cooling efficiency of its forced water-cooling system directly determines the reliability of the entire unit. However, existing wired monitoring methods suffer from complex cabling and limited capacity to provide a full perception of the water-cooling condition. To address these limitations, this study develops a wireless monitoring system based on multi-source information fusion for real-time evaluation of cooling efficiency and early fault warning. A heterogeneous wireless sensor network was designed and implemented by deploying liquid-level, vibration, sound, and infrared sensors at critical locations of the SVG water-cooling system. These nodes work collaboratively to collect multi-physical field data—thermal, acoustic, vibrational, and visual information—in an integrated manner. The system adopts a hybrid Wireless Fidelity/Bluetooth (Wi-Fi/Bluetooth) networking scheme with electromagnetic interference-resistant design to ensure reliable data transmission in the complex environment of converter valve halls. To achieve precise and robust diagnosis, a three-layer hierarchical weighted fusion framework was established, consisting of individual sensor feature extraction and preliminary analysis, feature-level weighted fusion, and final fault classification. Experimental validation indicates that the proposed system achieves highly reliable data transmission with a packet loss rate below 1.5%. Compared with single-sensor monitoring, the multi-source fusion approach improves the diagnostic accuracy for pump bearing wear, pipeline micro-leakage, and radiator blockage to 98.2% and effectively distinguishes fault causes and degradation tendencies of cooling efficiency. 
Overall, the developed wireless monitoring system overcomes the limitations of traditional wired approaches and, by leveraging multi-source fusion technology, enables a comprehensive assessment of cooling efficiency and intelligent fault diagnosis. This advancement significantly enhances the precision and reliability of SVG operation and maintenance, providing an effective solution to ensure the safe and stable operation of both grid-forming SVG units and the broader power grid. Full article
(This article belongs to the Section Industrial Electronics)
20 pages, 1597 KB  
Article
Three-Level MIFT: A Novel Multi-Source Information Fusion Waterway Tracking Framework
by Wanqing Liang, Chen Qiu, Mei Wang and Ruixiang Kan
Electronics 2025, 14(21), 4344; https://doi.org/10.3390/electronics14214344 - 5 Nov 2025
Viewed by 817
Abstract
To address the limitations of single-sensor perception in inland vessel monitoring and the lack of robustness of traditional tracking methods in occlusion and maneuvering scenarios, this paper proposes a hierarchical multi-target tracking framework that fuses Light Detection and Ranging (LiDAR) data with Automatic Identification System (AIS) information. First, an improved adaptive LiDAR tracking algorithm is introduced: stable trajectory tracking and state estimation are achieved through hybrid cost association and an Adaptive Kalman Filter (AKF). Experimental results demonstrate that the LiDAR module achieves a Multi-Object Tracking Accuracy (MOTA) of 89.03%, an Identity F1 Score (IDF1) of 89.80%, and an Identity Switch count (IDSW) as low as 5.1, demonstrating competitive performance compared with representative non-deep-learning-based approaches. Furthermore, by incorporating a fusion mechanism based on improved Dempster–Shafer (D-S) evidence theory and Covariance Intersection (CI), the system achieves further improvements in MOTA (90.33%) and IDF1 (90.82%), while the root mean square error (RMSE) of vessel size estimation decreases from 3.41 m to 1.97 m. Finally, the system outputs structured three-level tracks: AIS early-warning tracks, LiDAR-confirmed tracks, and LiDAR-AIS fused tracks. This hierarchical design not only enables beyond-visual-range (BVR) early warning but also enhances perception coverage and estimation accuracy. Full article
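For readers unfamiliar with the tracking metric quoted above, MOTA is conventionally defined as one minus the ratio of total tracking errors (missed targets, false positives, and identity switches) to the number of ground-truth objects accumulated over all frames. The Python sketch below illustrates that standard definition; the function name and the counts are hypothetical and chosen only for illustration, since the abstract does not give the raw counts.

```python
def mota(false_negatives: int, false_positives: int,
         id_switches: int, num_ground_truth: int) -> float:
    """Multi-Object Tracking Accuracy under the standard CLEAR MOT definition.

    All counts are summed over every frame of the evaluated sequence.
    """
    if num_ground_truth <= 0:
        raise ValueError("num_ground_truth must be positive")
    errors = false_negatives + false_positives + id_switches
    return 1.0 - errors / num_ground_truth


# Hypothetical counts for illustration only (not taken from the paper).
score = mota(false_negatives=60, false_positives=40,
             id_switches=5, num_ground_truth=1000)
print(f"MOTA = {score:.2%}")
```

Because identity switches enter the numerator alongside misses and false positives, a lower IDSW raises MOTA directly, while IDF1 measures identity consistency separately.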
24 pages, 3366 KB  
Article
Study of the Optimal YOLO Visual Detector Model for Enhancing UAV Detection and Classification in Optoelectronic Channels of Sensor Fusion Systems
by Ildar Kurmashev, Vladislav Semenyuk, Alberto Lupidi, Dmitriy Alyoshin, Liliya Kurmasheva and Alessandro Cantelli-Forti
Drones 2025, 9(11), 732; https://doi.org/10.3390/drones9110732 - 23 Oct 2025
Cited by 3 | Viewed by 3054
Abstract
The rapid spread of unmanned aerial vehicles (UAVs) has created new challenges for airspace security, as drones are increasingly used for surveillance, smuggling, and potentially for attacks near critical infrastructure. A key difficulty lies in reliably distinguishing UAVs from visually similar birds in electro-optical surveillance channels, where complex backgrounds and visual noise often increase false alarms. To address this, we investigated recent YOLO architectures and developed an enhanced model named YOLOv12-ADBC, incorporating an adaptive hierarchical feature integration mechanism to strengthen multi-scale spatial fusion. This architectural refinement improves sensitivity to subtle inter-class differences between drones and birds. A dedicated dataset of 7291 images was used to train and evaluate five YOLO versions (v8–v12), together with the proposed YOLOv12-ADBC. Comparative experiments demonstrated that YOLOv12-ADBC achieved the best overall performance, with precision = 0.892, recall = 0.864, mAP50 = 0.881, mAP50–95 = 0.633, and per-class accuracy reaching 96.4% for drones and 80% for birds. In inference tests on three video sequences simulating realistic monitoring conditions, YOLOv12-ADBC consistently outperformed baselines, achieving a detection accuracy of 92.1–95.5% and confidence levels up to 88.6%, while maintaining real-time processing at 118–135 frames per second (FPS). These results demonstrate that YOLOv12-ADBC not only surpasses previous YOLO models but also offers strong potential as the optical module in multi-sensor fusion frameworks. Its integration with radar, RF, and acoustic channels is expected to further enhance system-level robustness, providing a practical pathway toward reliable UAV detection in modern airspace protection systems. Full article
24 pages, 3017 KB  
Article
Tree-Guided Transformer for Sensor-Based Ecological Image Feature Extraction and Multitarget Recognition in Agricultural Systems
by Yiqiang Sun, Zigang Huang, Linfeng Yang, Zihuan Wang, Mingzhuo Ruan, Jingchao Suo and Shuo Yan
Sensors 2025, 25(19), 6206; https://doi.org/10.3390/s25196206 - 7 Oct 2025
Cited by 1 | Viewed by 1123
Abstract
Farmland ecosystems present complex pest–predator co-occurrence patterns, posing significant challenges for image-based multitarget recognition and ecological modeling in sensor-driven computer vision tasks. To address these issues, this study introduces a tree-guided Transformer framework enhanced with a knowledge-augmented co-attention mechanism, enabling effective feature extraction from sensor-acquired images. A hierarchical ecological taxonomy (Phylum–Family–Species) guides prompt-driven semantic reasoning, while an ecological knowledge graph enriches visual representations by embedding co-occurrence priors. A multimodal dataset containing 60 pest and predator categories with annotated images and semantic descriptions was constructed for evaluation. Experimental results demonstrate that the proposed method achieves 90.4% precision, 86.7% recall, and 88.5% F1-score in image classification, along with 82.3% hierarchical accuracy. In detection tasks, it attains 91.6% precision and 86.3% mAP@50, with 80.5% co-occurrence accuracy. For hierarchical reasoning and knowledge-enhanced tasks, F1-scores reach 88.5% and 89.7%, respectively. These results highlight the framework’s strong capability in extracting structured, semantically aligned image features under real-world sensor conditions, offering an interpretable and generalizable approach for intelligent agricultural monitoring. Full article

28 pages, 14783 KB  
Article
HSSTN: A Hybrid Spectral–Structural Transformer Network for High-Fidelity Pansharpening
by Weijie Kang, Yuan Feng, Yao Ding, Hongbo Xiang, Xiaobo Liu and Yaoming Cai
Remote Sens. 2025, 17(19), 3271; https://doi.org/10.3390/rs17193271 - 23 Sep 2025
Viewed by 1555
Abstract
Pansharpening fuses multispectral (MS) and panchromatic (PAN) remote sensing images to generate outputs with high spatial resolution and spectral fidelity. Nevertheless, conventional methods relying primarily on convolutional neural networks or unimodal fusion strategies frequently fail to bridge the sensor modality gap between MS and PAN data. Consequently, spectral distortion and spatial degradation often occur, limiting high-precision downstream applications. To address these issues, this work proposes a Hybrid Spectral–Structural Transformer Network (HSSTN) that enhances multi-level collaboration through comprehensive modelling of spectral–structural feature complementarity. Specifically, the HSSTN implements a three-tier fusion framework. First, an asymmetric dual-stream feature extractor employs a residual block with channel attention (RBCA) in the MS branch to strengthen spectral representation, while a Transformer architecture in the PAN branch extracts high-frequency spatial details, thereby reducing modality discrepancy at the input stage. Subsequently, a target-driven hierarchical fusion network utilises progressive crossmodal attention across scales, ranging from local textures to multi-scale structures, to enable efficient spectral–structural aggregation. Finally, a novel collaborative optimisation loss function preserves spectral integrity while enhancing structural details. Comprehensive experiments conducted on QuickBird, GaoFen-2, and WorldView-3 datasets demonstrate that HSSTN outperforms existing methods in both quantitative metrics and visual quality. Consequently, the resulting images exhibit sharper details and fewer spectral artefacts, showcasing significant advantages in high-fidelity remote sensing image fusion.
(This article belongs to the Special Issue Artificial Intelligence in Hyperspectral Remote Sensing Data Analysis)

34 pages, 11523 KB  
Article
Hand Kinematic Model Construction Based on Tracking Landmarks
by Yiyang Dong and Shahram Payandeh
Appl. Sci. 2025, 15(16), 8921; https://doi.org/10.3390/app15168921 - 13 Aug 2025
Cited by 3 | Viewed by 2823
Abstract
Visual body-tracking techniques have seen widespread adoption in applications such as motion analysis, human–machine interaction, tele-robotics and extended reality (XR). These systems typically provide 2D landmark coordinates corresponding to key limb positions. However, to construct a meaningful 3D kinematic model for body joint reconstruction, a mapping must be established between these visual landmarks and the underlying joint parameters of individual body parts. This paper presents a method for constructing a 3D kinematic model of the human hand using calibrated 2D landmark-tracking data augmented with depth information. The proposed approach builds a hierarchical model in which the palm serves as the root coordinate frame, and finger landmarks are used to compute both forward and inverse kinematic solutions. Through step-by-step examples, we demonstrate how measured hand landmark coordinates are used to define the palm reference frame and solve for joint angles for each finger. These solutions are then used in a visualization framework to qualitatively assess the accuracy of the reconstructed hand motion. As future work, the proposed model offers a foundation for model-based hand kinematic estimation and has utility in scenarios involving occlusion or missing data. In such cases, the hierarchical structure and kinematic solutions can be used as generative priors in an optimization framework to estimate unobserved landmark positions and joint configurations. The novelty of this work lies in its model-based approach using real sensor data, without relying on wearable devices or synthetic assumptions. Although current validation is qualitative, the framework provides a foundation for future robust estimation under occlusion or sensor noise. It may also serve as a generative prior for optimization-based methods and be quantitatively compared with joint measurements from wearable motion-capture systems.
(This article belongs to the Special Issue Human Activity Recognition (HAR) in Healthcare, 3rd Edition)
