Search Results (2,062)

Search Parameters:
Keywords = feature-adaptive fusion

26 pages, 5247 KB  
Article
Audiovisual Brain Activity Recognition Based on Symmetric Spatio-Temporal–Frequency Feature Association Vectors
by Yang Xi, Lu Zhang, Chenxue Wu, Bingjie Shi and Cunzhen Li
Symmetry 2025, 17(12), 2175; https://doi.org/10.3390/sym17122175 - 17 Dec 2025
Abstract
The neural mechanisms of auditory and visual processing are not only a core research focus in cognitive neuroscience but also hold critical importance for the development of brain–computer interfaces, neurological disease diagnosis, and human–computer interaction technologies. However, EEG-based studies on classifying auditory and visual brain activities largely overlook the in-depth utilization of spatial distribution patterns and frequency-specific characteristics inherent in such activities. This paper proposes an analytical framework that constructs symmetrical spatio-temporal–frequency feature association vectors to represent brain activities by computing EEG microstates across multiple frequency bands and brain functional connectivity networks. Then we construct an Adaptive Tensor Fusion Network (ATFN) that leverages feature association vectors to recognize brain activities related to auditory, visual, and audiovisual processing. The ATFN includes a feature fusion and selection module based on differential feature enhancement, a feature encoding module enhanced with attention mechanisms, and a classifier based on a multilayer perceptron to achieve the efficient recognition of audiovisual brain activities. The results show that the classification accuracy for auditory, visual, and audiovisual brain activity reaches 96.97% using the ATFN, demonstrating that the proposed symmetric spatio-temporal–frequency feature association vectors effectively characterize visual, auditory, and audiovisual brain activities. The symmetrical spatio-temporal–frequency feature association vectors establish a computable mapping that captures the intrinsic correlations among temporal, spatial, and frequency features, offering a more interpretable method to represent brain activities. The proposed ATFN provides an effective recognition framework for brain activity, with potential applications in brain–computer interfaces and neurological disease diagnosis. Full article
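As a rough illustration of the recognition pipeline summarized above (attention-based fusion of temporal, spatial, and frequency feature vectors followed by an MLP classifier), the minimal sketch below may help. The module layout, dimensions, and names (FusionClassifierSketch, feat_dim) are illustrative assumptions, not the authors' ATFN implementation.

import torch
import torch.nn as nn

class FusionClassifierSketch(nn.Module):
    """Minimal sketch: attention-weighted fusion of temporal, spatial and
    frequency feature vectors, followed by an MLP classifier over 3 classes
    (auditory, visual, audiovisual). Dimensions are illustrative."""
    def __init__(self, feat_dim=64, hidden=128, n_classes=3):
        super().__init__()
        self.attn = nn.Linear(feat_dim, 1)           # one score per feature branch
        self.encoder = nn.Sequential(nn.Linear(feat_dim, hidden), nn.ReLU())
        self.classifier = nn.Linear(hidden, n_classes)

    def forward(self, feats):                        # feats: (batch, 3, feat_dim)
        w = torch.softmax(self.attn(feats), dim=1)   # (batch, 3, 1) branch weights
        fused = (w * feats).sum(dim=1)               # weighted sum over branches
        return self.classifier(self.encoder(fused))  # (batch, n_classes)

x = torch.randn(8, 3, 64)                            # 8 trials, 3 feature branches
print(FusionClassifierSketch()(x).shape)             # torch.Size([8, 3])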
23 pages, 9243 KB  
Article
Asymmetric Spatial–Frequency Fusion Network for Infrared and Visible Object Detection
by Jing Liu, Jing Gao, Xiaoyong Liu, Junjie Tao, Jun Ma, Chaoping Guo, Peijun Shi and Pan Li
Symmetry 2025, 17(12), 2174; https://doi.org/10.3390/sym17122174 - 17 Dec 2025
Abstract
Infrared and visible image fusion-based object detection is critical for robust environmental perception under adverse conditions, yet existing methods still suffer from insufficient modeling of modality discrepancies and limited adaptivity in their fusion mechanisms. This work proposes an asymmetric spatial–frequency fusion network, AsyFusionNet. The network adopts an asymmetric dual-branch backbone that extends the RGB branch to P5 while truncating the infrared branch at P4, thereby better aligning with the physical characteristics of the two modalities, enhancing feature complementarity, and enabling fine-grained modeling of modality differences. On top of this backbone, a local–global attention fusion (LGAF) module is introduced to model local and global attention in parallel and reorganize them through lightweight convolutions, achieving joint spatial–channel selective enhancement. Modality-specific feature enhancement is further realized via a hierarchical attention module (HAM) in the RGB branch, which employs dynamic kernel selection to emphasize multi-level texture details, and a Fourier spatial spectral modulation (FS2M) module in the infrared branch, which more effectively captures global thermal radiation patterns. Extensive experiments on the M3FD and VEDAI datasets demonstrate that AsyFusionNet attains 86.3% and 54.1% mAP50, respectively, surpassing the baseline by 8.8 and 6.4 points (approximately 11.4% and 13.4% relative gains) while maintaining real-time inference speed. Full article
(This article belongs to the Section Computer)
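To make the local–global attention fusion idea more concrete, the PyTorch sketch below runs a depthwise-convolutional local path and a channel-attention global path in parallel and recombines them with a lightweight 1×1 convolution. It is an assumed approximation, not the paper's LGAF module; all names and dimensions are illustrative.

import torch
import torch.nn as nn

class LocalGlobalFusionSketch(nn.Module):
    """Assumed sketch of parallel local/global attention fusion for two
    modalities (RGB + IR feature maps); not the paper's exact LGAF module."""
    def __init__(self, channels=64):
        super().__init__()
        # local path: depthwise conv produces a spatial attention map
        self.local = nn.Sequential(
            nn.Conv2d(2 * channels, 2 * channels, 3, padding=1, groups=2 * channels),
            nn.Conv2d(2 * channels, 1, 1), nn.Sigmoid())
        # global path: squeeze-and-excite style channel attention
        self.global_att = nn.Sequential(
            nn.AdaptiveAvgPool2d(1), nn.Conv2d(2 * channels, 2 * channels, 1), nn.Sigmoid())
        self.reorg = nn.Conv2d(2 * channels, channels, 1)   # lightweight recombination

    def forward(self, rgb, ir):
        x = torch.cat([rgb, ir], dim=1)
        x = x * self.local(x) * self.global_att(x)          # joint spatial-channel selection
        return self.reorg(x)

out = LocalGlobalFusionSketch()(torch.randn(1, 64, 32, 32), torch.randn(1, 64, 32, 32))
print(out.shape)   # torch.Size([1, 64, 32, 32])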
Show Figures

Figure 1

37 pages, 7082 KB  
Article
A Method for UAV Path Planning Based on G-MAPONet Reinforcement Learning
by Jian Deng, Honghai Zhang, Yuetan Zhang, Mingzhuang Hua and Yaru Sun
Drones 2025, 9(12), 871; https://doi.org/10.3390/drones9120871 - 17 Dec 2025
Abstract
To address the issues of efficiency and robustness in UAV trajectory planning under complex environments, this paper proposes a Graph Multi-Head Attention Policy Optimization Network (G-MAPONet) algorithm that integrates Graph Attention (GAT), Multi-Head Attention (MHA), and Group Relative Policy Optimization (GRPO). The algorithm adopts a three-layer architecture of “GAT layer for local feature perception–MHA for global semantic reasoning–GRPO for policy optimization”, comprehensively achieving the goals of dynamic graph convolution quantization and global adaptive parallel decoupled dynamic strategy adjustment. Comparative experiments in multi-dimensional spatial environments demonstrate that the combined GAT–MHA mechanism exhibits significant superiority compared to single attention mechanisms, which verifies the efficient representation capability of the dual-layer hybrid attention mechanism in capturing environmental features. Additionally, ablation experiments integrating GAT, MHA, and GRPO confirm that the dual-layer fusion mechanism of GAT and MHA yields better improvement effects. Finally, comparisons with traditional reinforcement learning algorithms across multiple performance metrics show that the G-MAPONet algorithm reduces the number of convergence episodes (NCE) by an average of more than 19.14%, increases the average reward (AR) by over 16.20%, and successfully completes all dynamic path planning (PPTC) tasks; meanwhile, the algorithm’s reward values and obstacle avoidance success rate are significantly higher than those of other algorithms. Compared with the baseline APF algorithm, its reward value is improved by 8.66%, and the obstacle avoidance repetition rate is also enhanced, which further verifies the effectiveness of the improved G-MAPONet algorithm. In summary, through the dual-layer complementary mode of GAT and MHA, the G-MAPONet algorithm overcomes the bottlenecks of traditional dynamic environment modeling and multi-scale optimization, enhances the decision-making capability of UAVs in unstructured environments, and provides a new technical solution for trajectory planning in intelligent logistics and distribution. Full article
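The group-relative step that gives GRPO its name can be shown in a few lines: each rollout's reward is normalized against the mean and standard deviation of its sampled group. The snippet below is a generic illustration of that advantage computation, not the paper's training code.

import numpy as np

def group_relative_advantages(rewards, eps=1e-8):
    """GRPO-style advantages: normalize each trajectory reward by the
    statistics of its sampled group."""
    r = np.asarray(rewards, dtype=float)
    return (r - r.mean()) / (r.std() + eps)

# Example: one group of 5 rollouts from the same start state
print(group_relative_advantages([12.0, 8.5, 15.2, 9.1, 11.0]))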
23 pages, 40146 KB  
Article
Leveraging Time–Frequency Distribution Priors and Structure-Aware Adaptivity for Wideband Signal Detection and Recognition in Wireless Communications
by Xikang Wang, Hua Xu, Zisen Qi, Qingwei Meng, Hongcheng Fan, Yunhao Shi and Wenran Le
Sensors 2025, 25(24), 7650; https://doi.org/10.3390/s25247650 - 17 Dec 2025
Abstract
Wideband signal detection and recognition (WSDR) is considered an effective technical means for monitoring and analyzing spectra. The mainstream technical route involves constructing time–frequency representations for wideband sampled signals and then achieving signal detection and recognition through deep learning-based object detection models. However, existing methods exhibit insufficient attention on the prior information contained in the time–frequency domain and the structural features of signals, leaving ample room for further exploration and optimization. In this paper, we propose a novel model called TFDP-SANet for the WSDR task, which is based on time–frequency distribution priors and structure-aware adaptivity. Initially, considering the horizontal directionality and banded structure characteristics of the signal in the time–frequency representation, we introduce both the Strip Pooling Module (SPM) and Coordinate Attention (CA) mechanism during the feature extraction and fusion stages. These components enable the model to aggregate long-distance dependencies along horizontal and vertical directions, mitigate noise interference outside local windows, and enhance focus on the spatial distributions and shape characteristics of signals. Furthermore, we adopt an adaptive elliptical Gaussian encoding strategy to generate heatmaps, which enhances the adaptability of the effective guidance region for center-point localization to the target shape. During inference, we design a Time–Frequency Clustering Optimizer (TFCO) that leverages prior information to adjust the class of predicted bounding boxes, further improving accuracy. We conduct a series of ablation experiments and comparative experiments on the WidebandSig53 (WBSig53) dataset, and the results demonstrate that our proposed method outperforms existing approaches on most metrics. Full article
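Strip pooling, one of the components mentioned above, aggregates features along whole rows and columns so that horizontally extended, band-like signals in the time–frequency image are emphasized. The following generic sketch (not the paper's exact SPM) shows the basic pattern; module names and channel counts are assumptions.

import torch
import torch.nn as nn

class StripPoolingSketch(nn.Module):
    """Generic strip pooling: pool along rows and columns, re-expand, and use
    the combined result as a gating map. Illustrative only."""
    def __init__(self, channels=32):
        super().__init__()
        self.pool_h = nn.AdaptiveAvgPool2d((None, 1))   # keep height, squeeze width
        self.pool_w = nn.AdaptiveAvgPool2d((1, None))   # keep width, squeeze height
        self.conv_h = nn.Conv2d(channels, channels, (3, 1), padding=(1, 0))
        self.conv_w = nn.Conv2d(channels, channels, (1, 3), padding=(0, 1))
        self.fuse = nn.Conv2d(channels, channels, 1)

    def forward(self, x):                               # x: (B, C, H, W)
        h = self.conv_h(self.pool_h(x)).expand_as(x)    # long-range vertical context
        w = self.conv_w(self.pool_w(x)).expand_as(x)    # long-range horizontal context
        return x * torch.sigmoid(self.fuse(h + w))      # gate original features

print(StripPoolingSketch()(torch.randn(2, 32, 64, 128)).shape)  # (2, 32, 64, 128)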
20 pages, 8786 KB  
Article
Learning to Count Crowds from Low-Altitude Aerial Views via Point-Level Supervision and Feature-Adaptive Fusion
by Junzhe Mao, Lin Nai, Jinqi Bai, Chang Liu and Liangfeng Xu
Appl. Sci. 2025, 15(24), 13211; https://doi.org/10.3390/app152413211 - 17 Dec 2025
Abstract
Counting small, densely clustered objects from low-altitude aerial views is challenging due to large scale variations, complex backgrounds, and severe occlusion, which often degrade the performance of fully supervised or density-regression methods. To address these issues, we propose a weakly supervised crowd counting framework that leverages point-level supervision and a feature-adaptive fusion strategy to enhance perception under low-altitude aerial views. The network comprises a front-end feature extractor and a back-end fusion module. The front-end adopts the first 13 convolutional layers of VGG16-BN to capture multi-scale semantic features while preserving crucial spatial details. The back-end integrates a Feature-Adaptive Fusion module and a Multi-Scale Feature Aggregation module: the former dynamically adjusts fusion weights across scales to improve robustness to scale variation, and the latter aggregates multi-scale representations to better capture targets in dense, complex scenes. Point-level annotations serve as weak supervision to substantially reduce labeling cost while enabling accurate localization of small individual instances. Experiments on several public datasets, including ShanghaiTech Part A, ShanghaiTech Part B, and UCF_CC_50, demonstrate that our method surpasses existing mainstream approaches, effectively mitigating scale variation, background clutter, and occlusion, and providing an efficient and scalable weakly supervised solution for small-object counting. Full article
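To illustrate the feature-adaptive fusion idea named in the title, the sketch below predicts one weight per scale from globally pooled statistics and takes a softmax-weighted sum of the resized feature maps. The weighting scheme, module name, and dimensions are assumptions for illustration, not the published module.

import torch
import torch.nn as nn
import torch.nn.functional as F

class FeatureAdaptiveFusionSketch(nn.Module):
    """Sketch: predict one weight per scale from globally pooled statistics,
    then take a softmax-weighted sum of the (resized) feature maps."""
    def __init__(self, channels=64, num_scales=3):
        super().__init__()
        self.weight_head = nn.Sequential(
            nn.Linear(channels * num_scales, channels), nn.ReLU(),
            nn.Linear(channels, num_scales))

    def forward(self, feats):
        # feats: list of (B, C, Hi, Wi); resize everything to the finest scale
        target = feats[0].shape[-2:]
        feats = [F.interpolate(f, size=target, mode='bilinear', align_corners=False)
                 for f in feats]
        pooled = torch.cat([f.mean(dim=(2, 3)) for f in feats], dim=1)   # (B, C*S)
        w = torch.softmax(self.weight_head(pooled), dim=1)               # (B, S)
        return sum(w[:, i, None, None, None] * f for i, f in enumerate(feats))

feats = [torch.randn(2, 64, 80, 80), torch.randn(2, 64, 40, 40), torch.randn(2, 64, 20, 20)]
print(FeatureAdaptiveFusionSketch()(feats).shape)   # torch.Size([2, 64, 80, 80])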
18 pages, 1564 KB  
Article
Salient Object Detection in Optical Remote Sensing Images Based on Hierarchical Semantic Interaction
by Jingfan Xu, Qi Zhang, Jinwen Xing, Mingquan Zhou and Guohua Geng
J. Imaging 2025, 11(12), 453; https://doi.org/10.3390/jimaging11120453 - 17 Dec 2025
Abstract
Existing salient object detection methods for optical remote sensing images still face certain limitations due to complex background variations, significant scale discrepancies among targets, severe background interference, and diverse topological structures. On the one hand, the feature transmission process often neglects the constraints and complementary effects of high-level features on low-level features, leading to insufficient feature interaction and weakened model representation. On the other hand, decoder architectures generally rely on simple cascaded structures, which fail to adequately exploit and utilize contextual information. To address these challenges, this study proposes a Hierarchical Semantic Interaction Module to enhance salient object detection performance in optical remote sensing scenarios. The module introduces foreground content modeling and a hierarchical semantic interaction mechanism within a multi-scale feature space, reinforcing the synergy and complementarity among features at different levels. This effectively highlights multi-scale and multi-type salient regions in complex backgrounds. Extensive experiments on multiple optical remote sensing datasets demonstrate the effectiveness of the proposed method. Specifically, on the EORSSD dataset, our full model integrating both CA and PA modules improves the max F-measure from 0.8826 to 0.9100 (↑2.74%), increases maxE from 0.9603 to 0.9727 (↑1.24%), and enhances the S-measure from 0.9026 to 0.9295 (↑2.69%) compared with the baseline. These results clearly demonstrate the effectiveness of the proposed modules and verify the robustness and strong generalization capability of our method in complex remote sensing scenarios. Full article
(This article belongs to the Special Issue AI-Driven Remote Sensing Image Processing and Pattern Recognition)
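A common way to realize the high-level-to-low-level constraint described in the abstract above is to upsample the deep semantic feature into a gating mask that modulates the shallow detail feature before merging. The sketch below assumes that form; the paper's actual module and dimensions may differ.

import torch
import torch.nn as nn
import torch.nn.functional as F

class HighToLowGatingSketch(nn.Module):
    """Assumed sketch: a high-level (semantic) feature map gates a low-level
    (detail) feature map, then both are merged."""
    def __init__(self, high_ch=256, low_ch=64):
        super().__init__()
        self.gate = nn.Sequential(nn.Conv2d(high_ch, low_ch, 1), nn.Sigmoid())
        self.proj = nn.Conv2d(high_ch, low_ch, 1)
        self.merge = nn.Conv2d(low_ch, low_ch, 3, padding=1)

    def forward(self, low, high):
        high_up = F.interpolate(high, size=low.shape[-2:], mode='bilinear',
                                align_corners=False)
        gated_low = low * self.gate(high_up)          # suppress background responses
        return self.merge(gated_low + self.proj(high_up))

out = HighToLowGatingSketch()(torch.randn(1, 64, 128, 128), torch.randn(1, 256, 16, 16))
print(out.shape)   # torch.Size([1, 64, 128, 128])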
22 pages, 50111 KB  
Article
Kernel Adaptive Swin Transformer for Image Restoration
by Zhen Ni, Jingyu Wang, Aniruddha Bhattacharjya and Le Yan
Symmetry 2025, 17(12), 2161; https://doi.org/10.3390/sym17122161 - 15 Dec 2025
Abstract
In this modern era, attention has been devoted to blind super-resolution design, which improves image restoration performance by combining self-attention networks and explicitly introducing degradation information. This paper proposes a novel model called Kernel Adaptive Swin Transformer (KAST) to address the ill-posedness in image super-resolution and the resulting irregular difficulties in restoration, including asymmetrical degradation problems. KAST introduces four key innovations: (1) local degradation-aware modeling, (2) parallel attention-based feature fusion, (3) log-space continuous position bias, and (4) comprehensive validation on diverse datasets. The model captures degraded information in different regions of low-resolution images, effectively encodes and distinguishes these degraded features using self-attention mechanisms, and accurately restores image details. The proposed approach innovatively integrates degraded features with image features through a parallel attention fusion strategy, enhancing the network’s ability to capture pixel relationships and achieving denoising, deblurring, and high-resolution image reconstruction. Experimental results demonstrate that our model performs well on multiple datasets, verifying the effectiveness of the proposed method. Full article
(This article belongs to the Section Computer)
25 pages, 2546 KB  
Article
From Joint Distribution Alignment to Spatial Configuration Learning: A Multimodal Financial Governance Diagnostic Framework to Enhance Capital Market Sustainability
by Wenjuan Li, Xinghua Liu, Ziyi Li, Zulei Qin, Jinxian Dong and Shugang Li
Sustainability 2025, 17(24), 11236; https://doi.org/10.3390/su172411236 - 15 Dec 2025
Abstract
Financial fraud, as a salient manifestation of corporate governance failure, erodes investor confidence and threatens the long-term sustainability of capital markets. This study aims to develop and validate SFG-2DCNN, a multimodal deep learning framework that adopts a configurational perspective to diagnose financial fraud under class-imbalanced conditions and support sustainable corporate governance. Conventional diagnostic approaches struggle to capture the higher-order interactions within covert fraud patterns due to scarce fraud samples and complex multimodal signals. To overcome these limitations, SFG-2DCNN adopts a systematic two-stage mechanism. First, to ensure a logically consistent data foundation, the framework builds a domain-adaptive generative model (SMOTE-FraudGAN) that enforces joint distribution alignment to fundamentally resolve the issue of economic logic coherence in synthetic samples. Subsequently, the framework pioneers a feature topology mapping strategy that spatializes extracted multimodal covert signals, including non-traditional indicators (e.g., Total Liabilities/Operating Costs) and affective dissonance in managerial narratives, into an ordered two-dimensional matrix, enabling a two-dimensional Convolutional Neural Network (2D-CNN) to efficiently identify potential governance failure patterns through deep spatial fusion. Experiments on Chinese A-share listed firms demonstrate that SFG-2DCNN achieves an F1-score of 0.917 and an AUC of 0.942, significantly outperforming baseline models. By advancing the analytical paradigm from isolated variable assessment to holistic multimodal configurational analysis, this research provides a high-fidelity tool for strengthening sustainable corporate governance and market transparency. Full article
(This article belongs to the Section Economic and Business Aspects of Sustainability)
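The feature topology mapping step described above, which lays ordered multimodal indicators out as a 2D matrix so a 2D-CNN can exploit local configurations, can be illustrated as follows. The indicator names, ordering, and padding scheme are hypothetical, not the paper's mapping.

import numpy as np

def feature_topology_map(features, order, height, width, pad_value=0.0):
    """Place a 1D set of named indicators into a fixed 2D grid so that
    related features become spatial neighbours for a 2D-CNN. `order` lists
    the feature names row by row; missing cells are padded."""
    grid = np.full(height * width, pad_value, dtype=float)
    for i, name in enumerate(order[: height * width]):
        grid[i] = features.get(name, pad_value)
    return grid.reshape(height, width)

# Hypothetical indicators (names are illustrative only)
sample = {"liab_over_opcost": 1.8, "roa": 0.03, "accruals": -0.12, "tone_dissonance": 0.4}
order = ["liab_over_opcost", "roa", "accruals", "tone_dissonance"]
print(feature_topology_map(sample, order, height=2, width=2))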
22 pages, 9457 KB  
Article
Enhancing Document Classification Through Multimodal Image-Text Classification: Insights from Fine-Tuned CLIP and Multimodal Deep Fusion
by Hosam Aljuhani, Mohamed Yehia Dahab and Yousef Alsenani
Sensors 2025, 25(24), 7596; https://doi.org/10.3390/s25247596 - 15 Dec 2025
Abstract
Foundation models excel on general benchmarks but often underperform in clinical settings due to domain shift between internet-scale pretraining data and medical data. Multimodal deep learning, which jointly leverages medical images and clinical text, is promising for diagnosis, yet it remains unclear whether domain adaptation is better achieved by fine-tuning large vision–language models or by training lighter, task-specific architectures. We address this question by introducing PairDx, a balanced dataset of 22,665 image–caption pairs spanning six medical document classes, curated to reduce class imbalance and support fair, reproducible comparisons. Using PairDx, we develop and evaluate two approaches: (i) PairDxCLIP, a fine-tuned CLIP (ViT-B/32), and (ii) PairDxFusion, a custom hybrid model that combines ResNet-18 visual features and GloVe text embeddings with attention-based fusion. Both adapted models substantially outperform a zero-shot CLIP baseline (61.18% accuracy) and a specialized model, BiomedCLIP, which serves as an additional baseline and achieves 66.3% accuracy. Our fine-tuned CLIP (PairDxCLIP) attains 93% accuracy and our custom fusion model (PairDxFusion) reaches 94% accuracy on a held-out test set. Notably, PairDxFusion achieves this high accuracy with 17 min, 55 s of training time, nearly four times faster than PairDxCLIP (65 min, 52 s), highlighting a practical efficiency–performance trade-off for clinical deployment. The testing time also outperforms the specialized model—BiomedCLIP (0.387 s/image). Our results demonstrate that carefully constructed domain-specific datasets and lightweight multimodal fusion can close the domain gap while reducing computational cost in healthcare decision support. Full article
(This article belongs to the Special Issue Transforming Healthcare with Smart Sensing and Machine Learning)
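A lightweight attention-based fusion of ResNet-18 image features and GloVe text embeddings, in the spirit of PairDxFusion, might look like the sketch below. The projection sizes, gating form, and the six-class output are assumptions drawn from the abstract, not the released model.

import torch
import torch.nn as nn

class ImageTextFusionSketch(nn.Module):
    """Assumed sketch: project image (e.g. ResNet-18, 512-d) and text
    (e.g. GloVe, 300-d) features to a shared space, weight them with a
    learned attention gate, and classify into six document classes."""
    def __init__(self, img_dim=512, txt_dim=300, hidden=256, n_classes=6):
        super().__init__()
        self.img_proj = nn.Linear(img_dim, hidden)
        self.txt_proj = nn.Linear(txt_dim, hidden)
        self.attn = nn.Linear(2 * hidden, 2)            # one score per modality
        self.classifier = nn.Linear(hidden, n_classes)

    def forward(self, img_feat, txt_feat):
        i, t = torch.relu(self.img_proj(img_feat)), torch.relu(self.txt_proj(txt_feat))
        w = torch.softmax(self.attn(torch.cat([i, t], dim=1)), dim=1)   # (B, 2)
        fused = w[:, :1] * i + w[:, 1:] * t              # modality-weighted mix
        return self.classifier(fused)

logits = ImageTextFusionSketch()(torch.randn(4, 512), torch.randn(4, 300))
print(logits.shape)   # torch.Size([4, 6])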
21 pages, 2820 KB  
Article
Research on Small Target Detection Method for Poppy Plants in UAV Aerial Photography Based on Improved YOLOv8
by Xiaodan Feng, Lijun Yun, Chunlong Wang, Haojie Zhang, Rou Guan, Yuying Ma and Huan Jin
Agronomy 2025, 15(12), 2868; https://doi.org/10.3390/agronomy15122868 - 14 Dec 2025
Abstract
In response to the challenges in unmanned aerial vehicle (UAV)-based poppy plant detection, such as dense small targets, occlusions, and complex backgrounds, an improved YOLOv8-based detection algorithm with multi-module collaborative optimization is proposed. First, the lightweight Efficient Channel Attention (ECA) mechanism was integrated into the YOLOv8 backbone network to construct a composite feature extraction module with enhanced representational capacity. Subsequently, a Bidirectional Feature Pyramid Network (BiFPN) was introduced into the neck network to establish adaptive cross-scale feature fusion through learnable weighting parameters. Furthermore, the Wise Intersection over Union (WIoU) loss function was adopted to enhance the accuracy of bounding box regression. Finally, a dedicated 160 × 160 pixels detection head was added to leverage the high-resolution features from shallow layers, thereby enhancing the detection capability for small targets. Under five-fold cross-validation, the proposed model achieved mAP@0.5 and mAP@0.5:0.95 of 0.989 ± 0.003 and 0.850 ± 0.013, respectively, with average increases of 1.3 and 3.2 percentage points over YOLOv8. Statistical analysis confirmed that these performance gains were significant, demonstrating the effectiveness of the proposed method as a reliable solution for poppy plant detection. Full article
(This article belongs to the Special Issue Agricultural Imagery and Machine Vision)
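BiFPN's learnable cross-scale weighting is commonly implemented as fast normalized fusion: non-negative learnable weights normalized by their sum before the weighted feature maps are combined. The generic sketch below shows that single fusion step (not the full BiFPN, and not this paper's code).

import torch
import torch.nn as nn

class WeightedFusionSketch(nn.Module):
    """Fast normalized fusion over N same-shaped feature maps:
    out = conv( sum_i w_i * x_i ), with w_i >= 0, learnable, and normalized."""
    def __init__(self, channels=64, num_inputs=2, eps=1e-4):
        super().__init__()
        self.w = nn.Parameter(torch.ones(num_inputs))
        self.conv = nn.Conv2d(channels, channels, 3, padding=1)
        self.eps = eps

    def forward(self, inputs):
        w = torch.relu(self.w)                       # keep weights non-negative
        w = w / (w.sum() + self.eps)                 # normalize across inputs
        fused = sum(w[i] * x for i, x in enumerate(inputs))
        return self.conv(fused)

x = [torch.randn(1, 64, 40, 40), torch.randn(1, 64, 40, 40)]
print(WeightedFusionSketch()(x).shape)               # torch.Size([1, 64, 40, 40])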
18 pages, 3768 KB  
Article
DFGNet: A CropLand Change Detection Network Combining Deformable Convolution and Grouped Residual Self-Attention
by Xiangxi Feng and Xiaofang Liu
Appl. Sci. 2025, 15(24), 13133; https://doi.org/10.3390/app152413133 - 14 Dec 2025
Abstract
To address the challenges of limited multi-scale feature alignment, excessive feature redundancy, and blurred change boundaries in arable land change detection, this paper proposes an improved model based on the Feature Pyramid Network (FPN). Building upon FPN as the foundational framework, a deformable convolutional network is incorporated into the upsampling path to enhance geometric feature extraction for irregular change regions. Subsequently, the multi-scale feature maps generated by the FPN are processed by a Dynamic Low-Rank Fusion (DLRF) module, which integrates a Grouped Residual Self-Attention mechanism. This mechanism suppresses feature redundancy through low-rank decomposition and performs dynamic, adaptive, cross-scale feature fusion via attention weighting, ultimately producing a binary map of arable land changes. Experiments on public datasets demonstrate that the proposed method outperforms both the original FPN and other mainstream models in key metrics such as mIoU and F1-score, while generating clearer change maps. These results validate the effectiveness of incorporating deformable convolutions and the dynamic low-rank fusion strategy within the FPN framework, providing an effective approach that achieves an mIoU of 57.57% and a change detection F1-score of 72.42% for cultivated land identification. Full article
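The dynamic low-rank fusion idea, self-attention computed in a reduced-rank channel space with a residual connection, can be approximated as below. Grouping is omitted for brevity, and the rank and dimensions are assumptions, so this is a sketch rather than the DLRF module itself.

import torch
import torch.nn as nn

class LowRankSelfAttentionSketch(nn.Module):
    """Assumed sketch: project channels to a low rank r, apply spatial
    self-attention there, project back, and add a residual connection."""
    def __init__(self, channels=64, rank=16):
        super().__init__()
        self.q = nn.Conv2d(channels, rank, 1)
        self.k = nn.Conv2d(channels, rank, 1)
        self.v = nn.Conv2d(channels, rank, 1)
        self.out = nn.Conv2d(rank, channels, 1)

    def forward(self, x):                                  # x: (B, C, H, W)
        B, _, H, W = x.shape
        q = self.q(x).flatten(2).transpose(1, 2)           # (B, HW, r)
        k = self.k(x).flatten(2)                           # (B, r, HW)
        v = self.v(x).flatten(2).transpose(1, 2)           # (B, HW, r)
        attn = torch.softmax(q @ k / (k.shape[1] ** 0.5), dim=-1)   # (B, HW, HW)
        y = (attn @ v).transpose(1, 2).reshape(B, -1, H, W)         # (B, r, H, W)
        return x + self.out(y)                              # residual fusion

print(LowRankSelfAttentionSketch()(torch.randn(1, 64, 16, 16)).shape)  # (1, 64, 16, 16)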
17 pages, 3453 KB  
Article
Capturing Spatiotemporal Hydraulic Connectivity for Groundwater Level Prediction in Over-Exploited Aquifers: A Multi-Source Fusion Graph Learning Approach (MF-STGCN)
by Rong Liu and Ziyu Guan
Mathematics 2025, 13(24), 3978; https://doi.org/10.3390/math13243978 - 13 Dec 2025
Abstract
Accurate prediction of shallow groundwater levels is crucial for water resource management in over-exploited regions like the North China Plain, where intensive pumping has created non-steady flow fields with strong spatial hydraulic interactions. Traditional approaches—whether physical models constrained by parameter equifinality or machine learning methods assuming spatial independence—fail to explicitly characterize aquifer hydraulic connectivity and effectively integrate multi-source monitoring data. This study proposes a Multi-source Fusion Spatiotemporal Graph Convolutional Network (MF-STGCN) that represents the monitoring well network as a hydraulic connectivity graph, employing graph convolutions to capture spatial water level propagation patterns while integrating temporal dynamics through LSTM modules. An adaptive fusion mechanism quantifies contributions of natural drivers (precipitation, evaporation) and anthropogenic extraction to water level responses. Validation using 518 monitoring stations (2018–2022) demonstrates that MF-STGCN reduces RMSE compared to traditional time series models, with improvement primarily attributed to explicit modeling of spatial hydraulic dependencies. Interpretability analysis identifies Hebi and Shijiazhuang as severe over-exploitation zones and reveals significant response lag effects in the Handan-Xingtai corridor. This study demonstrates that spatial propagation patterns, rather than single-point temporal features, are key to improving prediction accuracy in over-exploited aquifers, providing a new data-driven paradigm for regional groundwater dynamics assessment and targeted management strategies. Full article
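The backbone described above, graph convolution over the well-connectivity graph followed by an LSTM over time, can be sketched as follows. The adjacency normalization, feature sizes, and prediction head are generic assumptions rather than the authors' configuration.

import torch
import torch.nn as nn

class STGCNSketch(nn.Module):
    """Assumed sketch: per-timestep graph convolution over N monitoring wells,
    then an LSTM over the time axis for each well."""
    def __init__(self, in_feats=4, gcn_out=16, lstm_hidden=32):
        super().__init__()
        self.gcn_weight = nn.Linear(in_feats, gcn_out)
        self.lstm = nn.LSTM(gcn_out, lstm_hidden, batch_first=True)
        self.head = nn.Linear(lstm_hidden, 1)             # next-step water level

    def forward(self, x, adj):
        # x: (B, T, N, F), adj: (N, N) hydraulic-connectivity graph
        deg = adj.sum(dim=1, keepdim=True).clamp(min=1.0)
        a_hat = adj / deg                                  # row-normalized adjacency
        h = torch.relu(self.gcn_weight(torch.einsum('ij,btjf->btif', a_hat, x)))
        B, T, N, C = h.shape
        h = h.permute(0, 2, 1, 3).reshape(B * N, T, C)     # one sequence per well
        out, _ = self.lstm(h)
        return self.head(out[:, -1]).reshape(B, N)         # (B, N) predictions

x, adj = torch.randn(2, 12, 5, 4), torch.eye(5) + torch.rand(5, 5).round()
print(STGCNSketch()(x, adj).shape)                         # torch.Size([2, 5])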
58 pages, 8484 KB  
Review
Recent Real-Time Aerial Object Detection Approaches, Performance, Optimization, and Efficient Design Trends for Onboard Performance: A Survey
by Nadin Habash, Ahmad Abu Alqumsan and Tao Zhou
Sensors 2025, 25(24), 7563; https://doi.org/10.3390/s25247563 - 12 Dec 2025
Abstract
The rising demand for real-time perception in aerial platforms has intensified the need for lightweight, hardware-efficient object detectors capable of reliable onboard operation. This survey provides a focused examination of real-time aerial object detection, emphasizing algorithms designed for edge devices and UAV onboard processors, where computation, memory, and power resources are severely constrained. We first review the major aerial and remote-sensing datasets and analyze the unique challenges they introduce, such as small objects, fine-grained variation, multiscale variation, and complex backgrounds, which directly shape detector design. Recent studies addressing these challenges are then grouped, covering advances in lightweight backbones, fine-grained feature representation, multi-scale fusion, and optimized Transformer modules adapted for embedded environments. The review further highlights hardware-aware optimization techniques, including quantization, pruning, and TensorRT acceleration, as well as emerging trends in automated NAS tailored to UAV constraints. We discuss the adaptation of large pretrained models, such as CLIP-based embeddings and compressed Transformers, to meet onboard real-time requirements. By unifying architectural strategies, model compression, and deployment-level optimization, this survey offers a comprehensive perspective on designing next-generation detectors that achieve both high accuracy and true real-time performance in aerial applications. Full article
(This article belongs to the Special Issue Image Processing and Analysis in Sensor-Based Object Detection)
20 pages, 1355 KB  
Article
Multimodal Mutual Information Extraction and Source Detection with Application in Focal Seizure Localization
by Soosan Beheshti, Erfan Naghsh, Younes Sadat-Nejad and Yashar Naderahmadian
Electronics 2025, 14(24), 4897; https://doi.org/10.3390/electronics14244897 - 12 Dec 2025
Abstract
Current multimodal imaging–based source localization (SoL) methods often rely on synchronously recorded data, and many neural network–driven approaches require large training datasets, conditions rarely met in clinical neuroimaging. To address these limitations, we introduce MieSoL (Multimodal Mutual Information Extraction and Source Localization), a unified framework that fuses EEG and MRI, whether acquired synchronously or asynchronously, to achieve robust cross-modal information extraction and high-accuracy SoL. Targeting neuroimaging applications, MieSoL combines Magnetic Resonance Imaging (MRI) and Electroencephalography (EEG), leveraging their complementary strengths—MRI’s high spatial resolution and EEG’s superior temporal resolution. MieSoL addresses key limitations of existing SoL methods, including poor localization accuracy and an unreliable estimation of the true source number. The framework combines two existing components—Unified Left Eigenvectors (ULeV) and Efficient High-Resolution sLORETA (EHR-sLORETA)—but integrates them in a novel way: ULeV is adapted to extract a noise-resistant shared latent representation across modalities, enabling cross-modal denoising and an improved estimation of the true source number (TSN), while EHR-sLORETA subsequently performs anatomically constrained high-resolution inverse mapping on the purified subspace. While EHR-sLORETA already demonstrates superior localization precision relative to sLORETA, replacing conventional PCA/ICA preprocessing with ULeV provides substantial advantages, particularly when data are scarce or asynchronously recorded. Unlike PCA/ICA approaches, which perform denoising and source selection separately and are limited in capturing shared information, ULeV jointly processes EEG and MRI to perform denoising, dimension reduction, and mutual-information-based feature extraction in a unified step. This coupling directly addresses longstanding challenges in multimodal SoL, including inconsistent noise levels, temporal misalignment, and the inefficiency of traditional PCA-based preprocessing. Consequently, on synthetic datasets, MieSoL achieves 40% improvement in Average Correlation Coefficient (ACC) and 56% reduction in Average Error Estimation (AEE) compared to conventional techniques. Clinical validation involving 26 epilepsy patients further demonstrates the method’s robustness, with automated results aligning closely with expert epileptologist assessments. Overall, MieSoL offers a principled and interpretable multimodal fusion paradigm that enhances the fidelity of EEG source localization, holding significant promise for both clinical and cognitive neuroscience applications. Full article
36 pages, 7233 KB  
Article
Deep Learning for Tumor Segmentation and Multiclass Classification in Breast Ultrasound Images Using Pretrained Models
by K. E. ArunKumar, Matthew E. Wilson, Nathan E. Blake, Tylor J. Yost and Matthew Walker
Sensors 2025, 25(24), 7557; https://doi.org/10.3390/s25247557 - 12 Dec 2025
Abstract
Early detection of breast cancer commonly relies on imaging technologies such as ultrasound, mammography and MRI. Among these, breast ultrasound is widely used by radiologists to identify and assess lesions. In this study, we developed image segmentation techniques and multiclass classification artificial intelligence (AI) tools based on pretrained models to segment lesions and detect breast cancer. The proposed workflow includes both the development of segmentation models and development of a series of classification models to classify ultrasound images as normal, benign or malignant. The pretrained models were trained and evaluated on the Breast Ultrasound Images (BUSI) dataset, a publicly available collection of grayscale breast ultrasound images with corresponding expert-annotated masks. For segmentation, images and ground-truth masks were used to train pretrained encoder (ResNet18, EfficientNet-B0 and MobileNetV2)–decoder (U-Net, U-Net++ and DeepLabV3) models, including the DeepLabV3 architecture integrated with a Frequency-Domain Feature Enhancement Module (FEM). The proposed FEM improves spatial and spectral feature representations using Discrete Fourier Transform (DFT), GroupNorm, dropout regularization and adaptive fusion. For classification, each image was assigned a label (normal, benign or malignant). Optuna, an open-source software framework, was used for hyperparameter optimization and for the testing of various pretrained models to determine the best encoder–decoder segmentation architecture. Five different pretrained models (ResNet18, DenseNet121, InceptionV3, MobileNetV3 and GoogleNet) were optimized for multiclass classification. DeepLabV3 outperformed other segmentation architectures, with consistent performance across training, validation and test images, with Dice Similarity Coefficient (DSC, a metric describing the overlap between predicted and true lesion regions) values of 0.87, 0.80 and 0.83 on training, validation and test sets, respectively. ResNet18:DeepLabV3 achieved an Intersection over Union (IoU) score of 0.78 during training, while ResNet18:U-Net++ achieved the best Dice coefficient (0.83) and IoU (0.71) and area under the curve (AUC, 0.91) scores on the test (unseen) dataset when compared to other models. However, the proposed ResNet18:FrequencyAwareDeepLabV3 (FADeepLabV3) achieved a DSC of 0.85 and an IoU of 0.72 on the test dataset, demonstrating improvements over standard DeepLabV3. Notably, the frequency-domain enhancement substantially improved the AUC from 0.90 to 0.98, indicating enhanced prediction confidence and clinical reliability. For classification, ResNet18 produced an F1 score (a measure combining precision and recall) of 0.95 and an accuracy of 0.90 on the training dataset, while InceptionV3 performed best on the test dataset, with an F1 score of 0.75 and accuracy of 0.83. We demonstrate a comprehensive approach that uses transfer learning models to automate the segmentation and multiclass classification of breast cancer ultrasound images into benign, malignant or normal classes on an imbalanced ultrasound image dataset. Full article
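The Dice Similarity Coefficient and IoU reported above are standard overlap metrics between predicted and ground-truth masks; a minimal reference computation is shown below.

import numpy as np

def dice_and_iou(pred, target, eps=1e-7):
    """Dice = 2|A∩B| / (|A|+|B|), IoU = |A∩B| / |A∪B| for binary masks."""
    pred, target = pred.astype(bool), target.astype(bool)
    inter = np.logical_and(pred, target).sum()
    dice = (2.0 * inter + eps) / (pred.sum() + target.sum() + eps)
    iou = (inter + eps) / (np.logical_or(pred, target).sum() + eps)
    return dice, iou

pred = np.zeros((8, 8), dtype=int); pred[2:6, 2:6] = 1      # predicted lesion
gt = np.zeros((8, 8), dtype=int); gt[3:7, 3:7] = 1          # annotated lesion
print(dice_and_iou(pred, gt))   # approximately (0.5625, 0.391)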