Search Results (1,474)

Search Parameters:
Keywords = guided fusion

33 pages, 4831 KB  
Article
TCSNet: A Thin-Cloud-Sensitive Network for Hyperspectral Remote Sensing Images via Spectral-Spatial Feature Fusion
by Yuanyuan Jia, Siwei Zhao, Xuanbin Liu and Yinnian Liu
Remote Sens. 2026, 18(9), 1326; https://doi.org/10.3390/rs18091326 (registering DOI) - 26 Apr 2026
Abstract
Cloud detection is essential for quantitative land-surface remote sensing and cloud-climate research. However, existing methods often prioritize spatial features over spectral features, which limits thin-cloud detection. To address this issue, this paper proposes a Thin-Cloud-Sensitive Network (TCSNet) for hyperspectral imagery. TCSNet employs an encoder–decoder architecture with a dual-branch design: a convolutional neural network (CNN) extracts multi-scale local features, while a PVTv2-B2 Transformer captures long-range spectral dependencies. To effectively integrate the complementary representations from both branches, a Cross-Modal Fusion (CMF) module with a lightweight single-channel gate is introduced at each stage, followed by a channel attention mechanism (SE) for feature recalibration. Subsequently, a Multi-Scale Fusion (MSF) module is used to integrate multi-level features through a top-down pathway, enabling deep semantic information to guide shallow feature expression. Furthermore, to enhance the decoder’s feature representation capability, a Combined Attention Mechanism (CAM) is incorporated at each decoder stage. This design enables the network to simultaneously focus on important channels, salient regions, and cloud boundaries, effectively alleviating spectral confusion between thin clouds and the underlying surface. Experimental results on Gaofen-5 01 hyperspectral data demonstrate that TCSNet achieves the highest recall (92.98%), Recall_thin (85.59%), and Recall_thick (99.75%), thereby validating its superiority for thin-cloud detection. Full article
(This article belongs to the Special Issue Artificial Intelligence in Hyperspectral Remote Sensing Data Analysis)
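The gated cross-modal fusion that the TCSNet abstract describes (a lightweight single-channel gate blending CNN and Transformer features, followed by SE channel recalibration) can be sketched roughly as below. This is a minimal PyTorch illustration under assumed layer shapes, not the authors' implementation; the class name CrossModalGateSE and all sizes are hypothetical.

```python
# Minimal sketch: single-channel cross-modal gate + SE-style channel recalibration.
import torch
import torch.nn as nn

class CrossModalGateSE(nn.Module):
    def __init__(self, channels: int, reduction: int = 16):
        super().__init__()
        # Single-channel spatial gate computed from both branches (assumed form).
        self.gate = nn.Sequential(nn.Conv2d(2 * channels, 1, kernel_size=1), nn.Sigmoid())
        # Squeeze-and-Excitation channel recalibration.
        self.se = nn.Sequential(
            nn.AdaptiveAvgPool2d(1),
            nn.Conv2d(channels, channels // reduction, 1), nn.ReLU(inplace=True),
            nn.Conv2d(channels // reduction, channels, 1), nn.Sigmoid(),
        )

    def forward(self, f_cnn: torch.Tensor, f_trans: torch.Tensor) -> torch.Tensor:
        g = self.gate(torch.cat([f_cnn, f_trans], dim=1))   # (B, 1, H, W)
        fused = g * f_cnn + (1.0 - g) * f_trans             # gated blend of branches
        return fused * self.se(fused)                       # channel reweighting

# Example: fuse 64-channel feature maps from the two branches at one stage.
fuse = CrossModalGateSE(64)
out = fuse(torch.randn(2, 64, 32, 32), torch.randn(2, 64, 32, 32))
```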
42 pages, 16476 KB  
Article
PIMSEL: A Physically Guided Multi-Modal Semi-Supervised Learning Framework for Earthquake-Induced Landslide Reactivation Risk Assessment
by Bingxin Shi, Hongmei Guo, Zongheng He, Shi Chen, Jia Guo, Yunxi Dong, Bingyang Shi, Jingren Zhou, Yusen He and Huajin Li
Remote Sens. 2026, 18(9), 1320; https://doi.org/10.3390/rs18091320 (registering DOI) - 25 Apr 2026
Abstract
Earthquake-induced landslide reactivation poses a sustained hazard for years following major seismic events, yet operational prediction remains constrained by heterogeneous multi-modal data, sparse supervision, and the absence of uncertainty-aware frameworks. This paper presents PIMSEL, a physically guided multi-modal semi-supervised framework for post-seismic landslide reactivation risk assessment. PIMSEL integrates satellite-derived morphological features, precipitation time series, and seismic hazard attributes through four components: entropy-regularized optimal transport for cross-modal semantic alignment without paired supervision; causally constrained hierarchical fusion enforcing domain-consistent modal weighting; scenario-based prototype mutation for semi-supervised learning from sparse expert annotations; and prototype-anchored variational graph clustering that simultaneously stratifies landslides into HIGH, MEDIUM, and LOW risk tiers and produces decomposed aleatoric and epistemic uncertainty estimates for operational triage. The HIGH risk tier operationally corresponds to predicted reactivation, validated against 598 documented reactivation events across 7482 co-seismic landslides from three Sichuan Province earthquake sequences: the 2013 Lushan (Mw 7.0), 2017 Jiuzhaigou (Mw 7.0), and 2022 Luding (Mw 6.8) events. PIMSEL achieves 82.5% reactivation recall and 66.4% precision, outperforming twelve baselines across clustering quality, classification, and uncertainty calibration metrics. Ablation studies confirm that optimal transport alignment contributes the largest individual performance gain. Current limitations include quarterly assessment frequency and dependence on optical imagery under cloud cover, which future integration of real-time meteorological triggers and SAR data should address. Full article
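To make the "entropy-regularized optimal transport for cross-modal semantic alignment" step concrete, a bare-bones Sinkhorn iteration is sketched below. The squared-Euclidean cost, the regularization strength, and the uniform marginals are illustrative assumptions; PIMSEL's actual alignment objective is not reproduced here.

```python
# Minimal Sinkhorn sketch for entropy-regularized optimal transport between
# two sets of modality embeddings.
import numpy as np

def sinkhorn(cost: np.ndarray, eps: float = 0.05, n_iter: int = 200) -> np.ndarray:
    """Return a transport plan whose rows/columns softly match uniform marginals."""
    n, m = cost.shape
    a, b = np.full(n, 1.0 / n), np.full(m, 1.0 / m)   # uniform marginals
    K = np.exp(-cost / eps)                           # Gibbs kernel
    u, v = np.ones(n), np.ones(m)
    for _ in range(n_iter):
        u = a / (K @ v)
        v = b / (K.T @ u)
    return np.diag(u) @ K @ np.diag(v)                # transport plan

# Example: align 100 morphological embeddings with 80 precipitation embeddings.
x, y = np.random.randn(100, 32), np.random.randn(80, 32)
cost = ((x[:, None, :] - y[None, :, :]) ** 2).sum(-1)
cost /= cost.max()                # scale cost to [0, 1] so the kernel stays well-conditioned
plan = sinkhorn(cost)             # plan[i, j] ~ soft correspondence between samples
```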
24 pages, 8285 KB  
Article
Regional Short-Term PV Power Forecasting Based on Graph Convolution and Transformer Networks
by Qinggui Chen, Ziqi Liu and Zhao Zhen
Electronics 2026, 15(9), 1817; https://doi.org/10.3390/electronics15091817 - 24 Apr 2026
Abstract
Accurate short-term photovoltaic (PV) power forecasting is essential for power system scheduling and market operations. Existing studies have shown the value of numerical weather prediction (NWP), graph-based spatial modeling, and temporal sequence learning, but the boundary of their contributions remains fragmented across many practical forecasting frameworks. In particular, adjacent multi-point NWP information is often not explicitly organized according to its spatial relationships, while historical similar-day power is rarely integrated with graph-structured meteorological features in a unified model. To address this gap, this study develops a short-term PV power forecasting framework that combines multi-point NWP graph construction with similar-day-guided Transformer fusion. First, predicted irradiance from the target site and neighboring NWP points is organized as a graph, and a Graph Convolutional Network (GCN) is used to extract local spatial meteorological features. Second, similar days are identified through a two-stage selection strategy based on Euclidean distance and Pearson correlation, and the corresponding historical power sequences are aggregated as temporal guidance. Finally, the graph-extracted NWP features, similar-day power, and predicted humidity are fused by a Transformer-based temporal modeling module to generate day-ahead PV power forecasts. Experimental results show that the proposed framework outperforms TCN-Transformer, Transformer, GCN, LSTM, and BP on the studied dataset, and maintains favorable performance on additional PV stations. These results indicate that the joint integration of graph-structured multi-point NWP information and historical similar-day power is effective for short-term PV power forecasting. Full article
(This article belongs to the Special Issue AI Applications for Smart Grid: 2nd Edition)
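The two-stage similar-day selection the abstract mentions (a Euclidean-distance shortlist, then Pearson-correlation re-ranking) could look roughly like the sketch below; the shortlist size, final count, and array layout are assumptions rather than the paper's settings.

```python
# Minimal sketch of two-stage similar-day selection for day-ahead PV forecasting.
import numpy as np

def select_similar_days(target: np.ndarray, history: np.ndarray,
                        k_coarse: int = 30, k_final: int = 5) -> np.ndarray:
    """target: (T,) forecast-day irradiance profile; history: (N, T) past days."""
    # Stage 1: coarse shortlist by Euclidean distance.
    dist = np.linalg.norm(history - target, axis=1)
    shortlist = np.argsort(dist)[:k_coarse]
    # Stage 2: re-rank the shortlist by Pearson correlation with the target profile.
    corr = np.array([np.corrcoef(target, history[i])[0, 1] for i in shortlist])
    return shortlist[np.argsort(-corr)[:k_final]]          # indices of similar days

# Example: pick 5 similar days from 365 historical 24-step irradiance profiles.
hist = np.random.rand(365, 24)
idx = select_similar_days(np.random.rand(24), hist)
guidance = hist[idx].mean(axis=0)   # aggregated similar-day power/irradiance guidance
```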
19 pages, 1479 KB  
Article
Reward-Guided Dynamic Fusion and Modality Decoupling for Enhanced Multimodal Sentiment Analysis
by He Zhang, Zichen Gao, Qi Yan, Yu Gu, Shuang Wang, Linsong Liu and Dequan An
Electronics 2026, 15(9), 1813; https://doi.org/10.3390/electronics15091813 - 24 Apr 2026
Abstract
Multimodal Sentiment Analysis (MSA) integrates multiple modalities to better understand human emotions. However, existing methods often neglect heterogeneity among modal features, causing redundancy and inconsistencies. Additionally, the dynamic interplay between modalities is frequently ignored during fusion, limiting performance. To address these issues, we propose Reward-Guided Dynamic Fusion and Modality Decoupling (RDFD). RDFD includes two key components: (1) a feature decoupling module that separates modality-specific and modality-shared features, reducing redundancy and conflicts; (2) a Reward-Guided Dynamic Fusion module that adaptively selects guiding modalities to enhance modality-specific representations and enable flexible fusion. Experiments on the CMU-MOSI and CMU-MOSEI datasets show that RDFD achieves state-of-the-art performance, demonstrating its effectiveness in advancing Multimodal Sentiment Analysis. Full article
(This article belongs to the Section Computer Science & Engineering)
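A minimal sketch in the spirit of RDFD's first component follows: each modality is projected into a modality-specific and a modality-shared subspace, with a soft orthogonality penalty discouraging overlap. The linear projections, cosine-based penalty, and dimensions are assumptions, not the published architecture.

```python
# Minimal sketch of modality decoupling with a soft orthogonality penalty.
import torch
import torch.nn as nn
import torch.nn.functional as F

class Decoupler(nn.Module):
    def __init__(self, dim: int = 128):
        super().__init__()
        self.specific = nn.Linear(dim, dim)   # modality-specific projection
        self.shared = nn.Linear(dim, dim)     # modality-shared projection

    def forward(self, x: torch.Tensor):
        s, c = self.specific(x), self.shared(x)
        # Penalize cosine similarity between the two subspaces (soft orthogonality).
        ortho = (F.normalize(s, dim=-1) * F.normalize(c, dim=-1)).sum(-1).pow(2).mean()
        return s, c, ortho

# Example: decouple 128-d utterance features for a batch of 16 samples.
s, c, ortho_loss = Decoupler()(torch.randn(16, 128))
```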
30 pages, 5777 KB  
Article
CADF-Net: A Conflict-Aware Adaptive Distillation Network for Fusing Multi-Source Land-Cover Products for Key Vegetation Classes in Cross-Border Regions
by Yubo Zhang, Long Fu, Zehong Li, Yuanyuan Yang, Hongbing Chen and Shuwen Zhang
Remote Sens. 2026, 18(9), 1294; https://doi.org/10.3390/rs18091294 - 24 Apr 2026
Abstract
Cross-border regions often exhibit complex vegetation-related land-cover patterns due to contrasting natural conditions and divergent development trajectories, causing multi-source land-cover products to suffer from disagreements in class assignment and boundary delineation, especially for cropland, forestland, and grassland. Because border zones are rarely mapping priorities, classification instability near national boundaries undermines transboundary comparisons. To address this, we propose a Conflict-aware Adaptive Distillation Fusion Network (CADF-Net) that fuses multi-source land-cover products to improve the discrimination and spatial consistency of key vegetation classes in cross-border regions. Taking the transnational China–Russia border (Sanjiang Plain and Primorskiy Kray) as a representative case, we integrate geo-environmental factors and introduce a pixel-level Conflict Index (CI) to explicitly steer the model toward discrepancy-prone areas. Building on this, we develop an Adaptive Distillation U-Net (AD-UNet) with uncertainty-adaptive distillation and employ a confidence-guided, dynamically weighted ensemble to generate the final fused land-cover product (CADF-LC). Quantitative assessments demonstrate that CADF-LC achieved an OA of 0.8600, a Kappa of 0.8133, and an mIoU of 0.7589, outperforming all input land-cover products. Compared with the strongest input product, Esri Land Cover, CADF-LC improved OA by 0.0150 and mIoU by 0.0222. Furthermore, it effectively mitigates the trade-off between detail loss and morphological fragmentation. Ultimately, CADF-Net enhances classification stability for key vegetation classes, offering a reliable foundation for transboundary ecological monitoring and land management. Full article
(This article belongs to the Special Issue Advanced AI Technology for Remote Sensing Analysis (Second Edition))
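One plausible reading of the pixel-level Conflict Index (CI) is the normalized entropy of class votes across the input land-cover products, as sketched below; the paper's exact CI definition may differ, so treat this purely as an illustration.

```python
# Illustrative per-pixel Conflict Index: 0 = full agreement, 1 = maximal disagreement.
import numpy as np

def conflict_index(label_maps: np.ndarray, n_classes: int) -> np.ndarray:
    """label_maps: (P, H, W) integer class maps from P land-cover products."""
    P, H, W = label_maps.shape
    votes = np.zeros((n_classes, H, W))
    for c in range(n_classes):
        votes[c] = (label_maps == c).sum(axis=0) / P       # vote share per class
    with np.errstate(divide="ignore", invalid="ignore"):
        ent = -np.nansum(np.where(votes > 0, votes * np.log(votes), 0.0), axis=0)
    return ent / np.log(n_classes)                          # normalize to [0, 1]

# Example: three products, four classes (cropland / forest / grassland / other).
maps = np.random.randint(0, 4, size=(3, 256, 256))
ci = conflict_index(maps, n_classes=4)   # high CI marks discrepancy-prone pixels
```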
8 pages, 1931 KB  
Proceeding Paper
Maze Navigating Robot Using Lucas–Kanade Optical Flow with Coarse-to-Fine Method
by Hannah Mae Antaran and Cyrel O. Manlises
Eng. Proc. 2026, 134(1), 81; https://doi.org/10.3390/engproc2026134081 (registering DOI) - 23 Apr 2026
Abstract
We applied the Lucas–Kanade optical flow method combined with a coarse-to-fine approach for robot navigation. While Lucas–Kanade is widely used for flow estimation and tracking, its utilization in robot navigation remains limited. Using a Raspberry Pi 5 (8 gigabytes) and a Logitech webcam, a mobile robot was developed that processes optical flow vectors to guide navigation decisions aimed at exiting a maze. While most maze navigation research relies on sensor fusion, we adopted computer vision to achieve collision-free navigation. The coarse-to-fine method effectively addresses the challenge of processing large motions inherent in Lucas–Kanade, resulting in an 80% success rate and 67% recovery rate. Simple linear regression analysis results revealed a negative correlation between optical flow magnitude and the robot’s distance to the nearest obstacle, indicating that closer obstacles correspond to higher flow magnitudes. The results highlight the potential of low-cost, vision-based autonomous navigation systems that eliminate the need for complex sensor arrays, making them suitable for cost-sensitive applications. The demonstrated effectiveness of the coarse-to-fine Lucas–Kanade method in handling large motion suggests its broader applicability in real-time robotic navigation, including autonomous vehicles and service robots operating in challenging or resource-limited environments. Full article
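OpenCV's pyramidal Lucas–Kanade tracker already implements the coarse-to-fine idea via its maxLevel parameter; a rough sketch of how per-feature flow magnitudes could drive a left/right steering decision follows. The camera index, feature-detector settings, and the steering rule are illustrative assumptions, not the authors' pipeline.

```python
# Coarse-to-fine (pyramidal) Lucas-Kanade tracking with a simple steering heuristic.
import cv2
import numpy as np

lk_params = dict(winSize=(21, 21), maxLevel=3,   # 3 pyramid levels handle large motions
                 criteria=(cv2.TERM_CRITERIA_EPS | cv2.TERM_CRITERIA_COUNT, 30, 0.01))

cap = cv2.VideoCapture(0)                        # webcam index is an assumption
ok, prev = cap.read()
prev_gray = cv2.cvtColor(prev, cv2.COLOR_BGR2GRAY)
p0 = cv2.goodFeaturesToTrack(prev_gray, maxCorners=100, qualityLevel=0.3, minDistance=7)

while True:
    ok, frame = cap.read()
    if not ok or p0 is None or len(p0) == 0:
        break
    gray = cv2.cvtColor(frame, cv2.COLOR_BGR2GRAY)
    p1, status, _ = cv2.calcOpticalFlowPyrLK(prev_gray, gray, p0, None, **lk_params)
    good_new, good_old = p1[status == 1], p0[status == 1]
    mag = np.linalg.norm(good_new - good_old, axis=1)      # per-feature flow magnitude
    # Larger flow on one image half suggests a nearer obstacle on that side.
    left_mask = good_old[:, 0] < gray.shape[1] / 2
    left = mag[left_mask].mean() if left_mask.any() else 0.0
    right = mag[~left_mask].mean() if (~left_mask).any() else 0.0
    command = "turn_right" if left > right else "turn_left"
    prev_gray, p0 = gray, good_new.reshape(-1, 1, 2)
```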
28 pages, 10821 KB  
Article
RadarsBEV: A Joint Multi-Radar Fusion and Target Detection Network via Gaussian Attention in Arbitrary Configurations
by Zuyuan Guo, Wujun Li, Guoxin Zhang, Hongfu Li, Jiesong He, Kah Chan Teh and Wei Yi
Remote Sens. 2026, 18(9), 1290; https://doi.org/10.3390/rs18091290 - 23 Apr 2026
Abstract
Multi-radar fusion is fundamental for robust, all-weather perception for diverse applications. However, current fusion paradigms face structural and computational bottlenecks. Traditional statistical frameworks suffer from an explosion of dimensional calculation, where computational complexity scales with the number of active sensor nodes. Concurrently, existing statistical and deep learning fusion models exhibit systemic brittleness; their rigid topological binding to predefined sensor counts leads to a drop in performance during sensor dropouts. Furthermore, generic attention mechanisms suffer a phenomenological mismatch with radar signals, neglecting the spatial features of radar targets and leading to false alarms. To overcome these limitations, we propose RadarsBEV, a scalable end-to-end multi-radar detection framework. By decoupling per-sensor feature extraction from the central spatial fusion process, RadarsBEV achieves permutation invariance. This design breaks the scalability limit and enables graceful degradation utilizing residual nodes without system downtime. Crucially, we introduce a physics-aware Gaussian cross-attention mechanism. By guiding sparse feature sampling through predicted two-dimensional Gaussian target geometry, this mechanism decouples attention weights from clutter signal. Extensive experiments on high-fidelity simulations and real-world datasets demonstrate that RadarsBEV achieves better detection performance. Notably, the framework exhibits robust configuration zero-shot generalization, adapting to entirely unseen spatial layouts and degraded operational environments without fine-tuning. Full article
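One way to read the "physics-aware Gaussian cross-attention" is as a log-Gaussian bias on query-to-BEV-cell attention logits, centered on each query's predicted target geometry; a rough PyTorch sketch follows. All shapes and the coupling to sparse feature sampling are assumptions rather than RadarsBEV's design.

```python
# Rough sketch: cross-attention whose logits are biased by a per-query 2D Gaussian prior.
import torch

def gaussian_biased_attention(q, k, v, centers, sigmas, grid_xy):
    """q: (B, Nq, C); k, v: (B, Ng, C); centers, sigmas: (B, Nq, 2); grid_xy: (Ng, 2)."""
    scale = q.shape[-1] ** -0.5
    logits = torch.einsum("bqc,bgc->bqg", q, k) * scale
    # Squared (axis-aligned) Mahalanobis distance from each BEV cell to each query's Gaussian.
    diff = grid_xy[None, None] - centers[:, :, None]            # (B, Nq, Ng, 2)
    d2 = (diff ** 2 / (sigmas[:, :, None] ** 2 + 1e-6)).sum(-1) # (B, Nq, Ng)
    attn = torch.softmax(logits - 0.5 * d2, dim=-1)             # Gaussian log-prior as bias
    return torch.einsum("bqg,bgc->bqc", attn, v)

# Example: 10 object queries attending over a 30x30 BEV grid.
grid = torch.stack(torch.meshgrid(torch.arange(30.), torch.arange(30.),
                                  indexing="ij"), dim=-1).reshape(-1, 2)
out = gaussian_biased_attention(
    torch.randn(2, 10, 64), torch.randn(2, 900, 64), torch.randn(2, 900, 64),
    centers=torch.rand(2, 10, 2) * 30, sigmas=torch.ones(2, 10, 2) * 2.0, grid_xy=grid)
```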
18 pages, 1855 KB  
Article
Mechanisms of Microstructural and Defect Evolution in Laser Powder Bed Fusion-Fabricated In625 Induced by Heat Treatment
by Qing Chen, Yi Liu, Xuxing Duan, Xianjun Zhang, Gening He, Yu Sun and Changyuan Li
Materials 2026, 19(9), 1713; https://doi.org/10.3390/ma19091713 - 23 Apr 2026
Abstract
Heat treatment is essential for In625 fabricated by laser powder bed fusion (L-PBF), as it significantly influences microstructural evolution, defect behavior, and mechanical performance. In this study, the effects of different solution heat treatments on L-PBF-fabricated In625 were systematically investigated. Industrial computed tomography was employed to characterize internal defects before and after heat treatment, while optical microscopy, EBSD, TEM, and EDS were used to analyze microstructural evolution. Room-temperature tensile tests evaluated mechanical properties. The results show that heat treatment at 1090 °C reduces porosity from 0.33% to 0.25%, whereas increasing the temperature to 1150 °C results in a further increase in porosity to 0.45%. This non-monotonic behavior is interpreted as the result of competing mechanisms, including partial closure of small pores at 1090 °C and pore coarsening/enlargement at higher temperatures, with the latter possibly involving the growth of sub-resolution pores into the CT-detectable range. Complete grain equiaxiality occurs after heat treatment at 1090 °C or higher, with average grain sizes below 100 μm, although grain coarsening becomes pronounced at higher temperatures. Samples heat-treated at 1150 °C exhibit reduced mechanical anisotropy, achieving tensile strength above 919 MPa and elongation up to 60%. These results clarify the mechanisms by which heat treatment governs microstructure–defect–property relationships in L-PBF In625, guiding its engineering application. Full article
(This article belongs to the Section Metals and Alloys)
23 pages, 8014 KB  
Article
MSW-Mamba-Det: Multi-Scale Windowed State-Space Modeling for End-to-End Defect Detection in Photovoltaic Module Electroluminescence Images
by Xiaofeng Wang, Haojie Hu, Xiao Hao and Weiguang Ma
Sensors 2026, 26(9), 2616; https://doi.org/10.3390/s26092616 - 23 Apr 2026
Abstract
Electroluminescence (EL) imaging is widely used for photovoltaic (PV) module inspection, yet EL defect detection remains challenging due to the need for high-resolution inputs, low-contrast defects, and strong structured background patterns. To address these issues, we propose MSW-Mamba-Det, an end-to-end defect detection framework built on RT-DETR, comprising three components. (1) MSW-Mamba, a multi-scale windowed state-space module, adopts a Local/Stripe/Grid architecture to jointly model fine details and long-range dependencies; the Stripe branch strengthens directional continuity for elongated defects, while the Grid branch introduces coarse global context to improve cross-region consistency. Saliency- and gradient-guided gating is further used to suppress background-induced false responses. (2) DetailAware compensates for detail attenuation by restoring high-frequency textures and edges through multi-scale local enhancement, and applies pixel-wise adaptive gating to integrate global semantics and mitigate smoothing effects in deep representations. (3) PAFB (Pyramid Attention Fusion Block) aligns adjacent-scale features and improves multi-scale fusion, enhancing localization stability across defect sizes. Experiments on two public EL datasets show that MSW-Mamba-Det achieves AP50:95 of 60.4% on PV-Multi-Defect-main and 68.0% on PVEL-AD, improving over RT-DETR by 2.5 points (from 57.9% to 60.4%) and 2.2 points (from 65.8% to 68.0%), respectively. MSW-Mamba-Det also outperforms 12 representative baselines, including CNN-, Transformer-, and recent YOLO-based models, in AP50:95 on both datasets, with particularly strong performance on medium and large defects. These results demonstrate the effectiveness of the proposed modules for robust PV EL defect inspection under low-contrast and structured-background conditions. Full article
(This article belongs to the Section Sensing and Imaging)
33 pages, 2381 KB  
Article
Spatiotemporal Evolution and Nonlinear Effects of Urban Morphology on Land Surface Temperature in the Context of Heatwaves
by Ling Li and Mingyi Du
Appl. Sci. 2026, 16(9), 4150; https://doi.org/10.3390/app16094150 - 23 Apr 2026
Abstract
Frequent extreme heatwaves (HWs) have significantly exacerbated urban thermal risks, yet the regulatory mechanisms of urban morphology remain poorly understood. This study focuses on the core urban areas of Beijing and develops a Local Climate Zone (LCZ)-constrained spatiotemporal data fusion model (LCZ-FSDAF) to generate high-resolution Land Surface Temperature (LST) datasets from 2015 to 2024. By integrating urban–rural gradient analysis with the XGBoost-SHAP model, this study quantitatively resolves the spatiotemporal evolution of land surface temperature during heatwaves and the nonlinear threshold effects of urban morphological parameters, using a representative extreme heatwave event in July 2023 as a case study. The results indicate that the LCZ-FSDAF model achieves high precision across complex urban underlying surfaces (up to 0.946, RMSE as low as 0.762 K), effectively capturing the spatial heterogeneity of the urban thermal environment. Over the past decade, heatwave events in Beijing have exhibited a significant trend of increasing frequency, duration, and intensity. During these events, LST displays a concentric core-high, periphery-low structure; however, the peak temperature shifts toward high-density built-up areas in the sub-core, manifesting a distinct heat island core shift phenomenon. Furthermore, the impact of urban morphology on LST is characterized by significant nonlinearity, with the Normalized Difference Vegetation Index (NDVI) and Mean Building Height (MBH) identified as dominant factors. Notably, Building Coverage (BC) and Sky View Factor (SVF) exhibit pronounced threshold effects across different thermal indicators. Findings of this study are useful for guiding urban planning, optimizing spatial configurations, formulating urban heat island mitigation policies under heatwaves, and promoting the Sustainable Development Goals (SDGs) of cities and communities. Full article
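The XGBoost-SHAP pattern used to expose nonlinear threshold effects follows a standard recipe: fit a gradient-boosted regressor on the morphological predictors, then compute SHAP attributions and inspect dependence plots. A hedged sketch with synthetic data and feature names taken from the abstract (NDVI, MBH, BC, SVF) is shown below; hyperparameters are illustrative, not the study's configuration.

```python
# Minimal XGBoost + SHAP sketch for attributing LST to morphological drivers.
import numpy as np
import shap
import xgboost as xgb

features = ["NDVI", "MBH", "BC", "SVF"]          # vegetation, height, coverage, sky view
X = np.random.rand(1000, len(features))          # stand-in for pixel-level predictors
y = 40 - 8 * X[:, 0] + 3 * X[:, 2] + np.random.randn(1000) * 0.5   # synthetic LST (°C)

model = xgb.XGBRegressor(n_estimators=300, max_depth=6, learning_rate=0.05)
model.fit(X, y)

explainer = shap.TreeExplainer(model)
shap_values = explainer.shap_values(X)           # per-sample, per-feature attributions
# Plotting shap_values against BC or SVF would reveal the threshold effects.
mean_abs = np.abs(shap_values).mean(axis=0)
print(dict(zip(features, mean_abs.round(3))))    # global importance ranking
```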
45 pages, 3192 KB  
Review
Exploring Artificial Intelligence in Orthopedic Surgery: A Review of Perception, Decision, and Execution Systems
by Dehan Li, Wanshi Liu, Md. Mihraz Hossain Niloy, Zhang Yi and Lei Xu
Sensors 2026, 26(9), 2591; https://doi.org/10.3390/s26092591 - 22 Apr 2026
Abstract
Artificial intelligence (AI) has become an indispensable tool in orthopedic surgery. It provides new methods to increase surgical precision, improve patient safety, and support personalized treatment plans. This review presents a comprehensive analysis of AI-assisted orthopedic surgery across three core domains. Based on 89 recent studies, this review organizes findings around a perception–decision–execution framework. It groups diverse AI applications into certain categories while highlighting the mutuality across domains. Perception systems have progressed from basic CNN-based segmentation models to advanced transformer architectures. They support multi-modal data fusion and enable uncertainty quantification. Decision systems have moved far beyond rigid rule-based methods and evolve into data-driven models that support surgical planning, accurate risk prediction and continuous outcome optimization. And execution systems have advanced from passive navigation tools to active robotic assistance systems with real-time adaptive capabilities. Beyond mapping technological advances, this review also identifies pivotal challenges that hinder clinical translation and concludes with a clear roadmap for future research, which marks closed-loop surgical assistance systems as the next key development direction. Building on these findings, this review illuminates the potential of AI-assisted orthopedic surgery and guides future research toward innovations that can be translated into clinical practice. Full article
(This article belongs to the Section Biomedical Sensors)
25 pages, 19124 KB  
Article
Multi-Scale Fractional-Order Image Fusion Algorithm Based on Polarization Spectral Images
by Zhenduo Zhang, Xueying Cao and Zhen Wang
Appl. Sci. 2026, 16(9), 4087; https://doi.org/10.3390/app16094087 - 22 Apr 2026
Abstract
With the continuous advancement of polarization spectral sensing technology, multi-band polarization image fusion has emerged as a novel approach to image fusion. By integrating spectral and polarization information, this method overcomes the limitations of relying on a single information source and significantly improves overall image quality. To address this, this paper proposes a new polarization spectral fusion algorithm. First, feature matching is employed to achieve pixel-level spatial alignment of multi-band polarization images. Then, a fusion strategy based on multi-scale decomposition and singular value decomposition is adopted to preserve structural information and fine details. Subsequently, fractional-order processing and guided filtering are applied to enhance details and suppress noise. Finally, a progressive reconstruction from low to high scales is performed to ensure hierarchical consistency and information integrity throughout the fusion process. In addition, spectral information is utilized for color restoration, enabling the final image to achieve high spatial resolution while maintaining natural and rich color representation. Experimental results demonstrate that the proposed method effectively integrates features from different spectral bands and polarization information while preserving maximum similarity, leading to significant improvements in both image quality and detail representation. Full article
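The guided-filtering detail-enhancement step alluded to above typically follows a base-plus-detail pattern; a rough sketch using OpenCV's ximgproc guided filter (from opencv-contrib) is given below. The radius, eps, and detail gain are illustrative, and the fractional-order processing is not reproduced.

```python
# Rough sketch: base/detail split via guided filtering, then detail amplification.
import cv2
import numpy as np

def enhance_band(band: np.ndarray, radius: int = 8, eps: float = 1e-3,
                 detail_gain: float = 1.5) -> np.ndarray:
    """band: a single polarization/spectral channel as float32 in [0, 1]."""
    base = cv2.ximgproc.guidedFilter(band, band, radius, eps)   # edge-preserving base layer
    detail = band - base                                        # high-frequency residual
    return np.clip(base + detail_gain * detail, 0.0, 1.0)

band = np.random.rand(256, 256).astype(np.float32)
enhanced = enhance_band(band)
```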
34 pages, 1939 KB  
Article
AutoUAVFormer: Neural Architecture Search with Implicit Super-Resolution for Real-Time UAV Aerial Object Detection
by Li Pan, Huiyao Wan, Pazlat Nurmamat, Jie Chen, Long Sun, Yice Cao, Shuai Wang, Yingsong Li and Zhixiang Huang
Remote Sens. 2026, 18(9), 1268; https://doi.org/10.3390/rs18091268 - 22 Apr 2026
Abstract
The widespread deployment of unmanned aerial vehicles (UAVs) in civil and commercial airspace has raised significant safety concerns, driving the demand for reliable and real-time Anti-UAV visual detection systems. However, existing deep learning-based detectors face substantial challenges in complex low-altitude environments, including drastic scale variations, severe background clutter, and weak feature representation of small UAV targets. Moreover, handcrafted Transformer-based architectures often lack adaptability across diverse scenarios and struggle to balance detection accuracy with computational efficiency. To address these limitations, this paper proposes AutoUAVFormer, a super-resolution guided neural architecture search framework for Anti-UAV detection. In contrast to conventional manually designed approaches, AutoUAVFormer leverages joint optimization of a Transformer-based detection objective and a super-resolution reconstruction objective to automatically identify a task-specific optimal network architecture for detecting UAV targets. Specifically, a unified search space is formulated by jointly embedding Transformer hyperparameters and Feature Pyramid Network (FPN) structures, facilitating end-to-end co-optimization of multi-scale feature fusion and global context modeling. To efficiently locate architectures that balance accuracy and computational cost, a three-stage pipeline, combining supernetwork training with evolutionary search, is employed. Additionally, we design a super-resolution auxiliary branch that operates only during training to enhance the model’s ability to learn fine-grained textures and sharpen edge representations of small targets, without introducing any inference overhead. Extensive experiments on three challenging Anti-UAV detection benchmarks, namely DetFly, DUT Anti-UAV, and UAV Swarm, confirm the superiority of AutoUAVFormer over current state-of-the-art methods, with mAP@0.5 scores reaching 98.6%, 95.5%, and 89.9% on the respective datasets while sustaining real-time inference speed. These results demonstrate that AutoUAVFormer achieves strong generalization and maintains robust Anti-UAV detection performance under challenging low-altitude conditions. Full article
25 pages, 2360 KB  
Article
ACF-YOLO: Feature Enhancement and Multi-Scale Alignment for Sustainable Crop Small Object Detection
by Chuanxiang Li, Yihang Li, Wenzhong Yang and Danny Chen
Sustainability 2026, 18(9), 4168; https://doi.org/10.3390/su18094168 - 22 Apr 2026
Abstract
Sustainable precision agriculture is crucial for optimizing resource utilization, reducing chemical inputs, and ensuring global food security. High-precision automatic recognition and monitoring of key crop organs (e.g., wheat heads and flower clusters) serve as the technological foundation for sustainable agricultural management decisions. However, visual perception in natural field environments is highly susceptible to external conditions. To address the challenges of severe background interference and feature dilution in crop small object detection within complex agricultural scenarios, this paper proposes an enhanced detection network, ACF-YOLO, based on YOLO11. First, an Aggregated Multi-scale Local-Global Attention (AMLGA) module is designed to enhance the feature representation of weak targets by fusing local details with global semantics. Second, a Context-Guided Fusion Module (CGFM) and a Soft-Neighbor Interpolation (SNI) strategy are introduced. Their synergy alleviates feature aliasing effects and ensures the precise alignment of deep semantic information with shallow spatial details. Furthermore, the Inner-MPDIoU loss function is employed to optimize the bounding box regression accuracy for non-rigid targets by incorporating geometric constraints and auxiliary scale factors. To verify the detection capability of the proposed method, we constructed a UAV Wheat Head Dataset (UWHD) and conducted extensive experiments on the UWHD, GWHD2021, and RFRB datasets. The experimental results demonstrate that ACF-YOLO outperforms other comparative methods, confirming its stable detection performance and contributing to the sustainable development of agriculture. Full article
(This article belongs to the Section Sustainable Agriculture)
23 pages, 2414 KB  
Article
Semantic-Guided Multi-Level Collaborative Fusion Network for Visible and Infrared Images
by Lijun Yuan, Chuanjiang Xie, Ming Yang, Xiaoguang Tu, Qiqin Li and Xinyu Zhu
Sensors 2026, 26(9), 2577; https://doi.org/10.3390/s26092577 - 22 Apr 2026
Abstract
The paramount value of image fusion is manifested in effectively enhancing downstream tasks. However, compatibility with subsequent tasks is compromised due to the semantic deficiency of fusion representations generated by current approaches. To mitigate this limitation, a semantic-guided multi-level collaborative fusion network is proposed, termed DSIFuse. By leveraging semantic priors and global context extracted from auxiliary segmentation branches, a multi-level interaction space is constructed to explicitly refine cross-modal features. Specifically, a cross-modal feature correction mechanism is designed to enhance semantic alignment by injecting complementary visible–infrared information at each layer, while a three-level interaction strategy gradually integrates unimodal features and semantic maps to generate semantically enriched representations. To mitigate semantic information loss during image reconstruction, a semantic compensation block is employed, incorporating interactive representations from prior layers and global semantic maps into the multi-scale decoder. Finally, the overall loss integrates semantic supervision, gradient, and intensity loss. Experiments conducted on public datasets indicate that clear fusion images are generated by DSIFuse, with improved structural consistency and reduced artifacts. Under a unified benchmark, the fused representations subsequently yield improved performance in downstream object detection tasks. Full article
(This article belongs to the Section Sensing and Imaging)
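The gradient and intensity terms of the overall loss are common in visible–infrared fusion and can be sketched as below; the semantic-supervision term from the auxiliary segmentation branch is omitted, and the max-based targets and weights are assumptions rather than DSIFuse's exact formulation.

```python
# Minimal sketch of intensity + gradient loss terms for visible/infrared fusion.
import torch
import torch.nn.functional as F

def sobel_grad(x: torch.Tensor) -> torch.Tensor:
    """Per-pixel gradient magnitude of a (B, 1, H, W) image via Sobel filters."""
    kx = torch.tensor([[-1., 0., 1.], [-2., 0., 2.], [-1., 0., 1.]]).view(1, 1, 3, 3)
    ky = kx.transpose(2, 3)
    gx = F.conv2d(x, kx.to(x), padding=1)
    gy = F.conv2d(x, ky.to(x), padding=1)
    return torch.sqrt(gx ** 2 + gy ** 2 + 1e-6)

def fusion_loss(fused, vis, ir, w_int: float = 1.0, w_grad: float = 1.0):
    # Intensity term: keep the brighter (more salient) of the two sources.
    loss_int = F.l1_loss(fused, torch.maximum(vis, ir))
    # Gradient term: preserve the stronger edge at every pixel.
    loss_grad = F.l1_loss(sobel_grad(fused), torch.maximum(sobel_grad(vis), sobel_grad(ir)))
    return w_int * loss_int + w_grad * loss_grad

loss = fusion_loss(torch.rand(2, 1, 64, 64), torch.rand(2, 1, 64, 64), torch.rand(2, 1, 64, 64))
```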