Search Results (11,428)

Search Parameters:
Keywords = feature fusion

23 pages, 2704 KB  
Article
VANET-GPSR+: A Lightweight Direction-Aware Routing Protocol for Vehicular Ad Hoc Networks
by Zhuhua Zhang and Ning Ye
Sensors 2026, 26(8), 2525; https://doi.org/10.3390/s26082525 (registering DOI) - 19 Apr 2026
Abstract
Vehicular Ad hoc Networks (VANETs) feature high node mobility and volatile topologies, rendering the conventional Greedy Perimeter Stateless Routing (GPSR) protocol prone to weak link stability and inefficient route discovery due to its lack of direction awareness. Existing direction-aware improvements typically rely on multi-criteria weighting or clustering, introducing heavy parameter fusion and computational overhead that conflict with the resource-constrained nature of onboard units. To overcome these limitations, this paper presents VANET-GPSR+, a lightweight enhanced routing protocol. Its key novelty is that it discards multi-parameter fusion and relies solely on movement direction, supported by a synergistic framework of three lightweight mechanisms: direction-aware neighbor classification to prioritize nodes with consistent trajectories, adaptive greedy forwarding region expansion in sparse and dynamic networks, and path deviation angle-based next-hop selection. This work builds a probabilistic link lifetime model that theoretically quantifies the stability gains of direction awareness—a novel theoretical foundation. Comprehensive urban and highway simulations show that VANET-GPSR+ improves the packet delivery ratio by 16.3% and reduces end-to-end delay by 27.5% compared with standard GPSR, and it outperforms both OP-GPSR and AK-GPSR. It introduces negligible CPU and memory overhead, with CPU usage over 50% lower than the two benchmark protocols at 80 vehicles/km, and demonstrates strong robustness against varying beacon intervals and communication radii. Retaining GPSR’s stateless and distributed traits, VANET-GPSR+ delivers substantial performance gains with minimal overhead, serving as an efficient routing solution for highly dynamic VANETs. Full article
(This article belongs to the Section Sensor Networks)
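The abstract describes direction-aware neighbor classification, adaptive expansion of the greedy forwarding region, and deviation-angle next-hop selection. The sketch below illustrates those three ideas with plain geometry; the scoring rule, the expansion factor, and all function and parameter names are assumptions for illustration, not the authors' implementation.

```python
import math

def deviation_angle(current, candidate, destination):
    """Angle (radians) between the current->destination bearing and the
    current->candidate bearing; smaller means the candidate lies closer
    to the straight-line path toward the destination."""
    def bearing(a, b):
        return math.atan2(b[1] - a[1], b[0] - a[0])
    diff = bearing(current, destination) - bearing(current, candidate)
    return abs(math.atan2(math.sin(diff), math.cos(diff)))  # wrap to [0, pi]

def select_next_hop(current, destination, neighbors, radio_range, expand=1.25):
    """Greedy forwarding restricted to direction-consistent neighbors.
    `neighbors` maps node id -> (position, velocity). If no neighbor lies in
    the nominal forwarding region, the region is widened once (sparse case)."""
    def candidates(max_range):
        found = []
        for nid, (pos, vel) in neighbors.items():
            to_dst = (destination[0] - current[0], destination[1] - current[1])
            moving_toward = vel[0] * to_dst[0] + vel[1] * to_dst[1] >= 0
            in_range = math.dist(current, pos) <= max_range
            if moving_toward and in_range:
                found.append((deviation_angle(current, pos, destination), nid))
        return found
    cands = candidates(radio_range) or candidates(radio_range * expand)
    return min(cands)[1] if cands else None  # smallest deviation angle wins
```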
21 pages, 24433 KB  
Article
A Novel Deep Learning Model for Predicting University English Proficiency Achievement of Students
by Yan Yang, Xiaowei Wang, Mohan Liu, Huiwen Xue and Laixiang Xu
Information 2026, 17(4), 386; https://doi.org/10.3390/info17040386 (registering DOI) - 19 Apr 2026
Abstract
The rapid expansion of English major enrollment has exposed critical limitations in traditional academic assessment methods regarding efficiency and accuracy, constraining educational quality enhancement. This paper introduces an English proficiency assessment approach utilizing an improved RegNet architecture integrated with a dual attention mechanism. The multidimensional academic data processed by our model include attendance, online participation, language practice, and assessment scores for listening, speaking, reading, and writing from undergraduate English majors. The initial downsampling module of RegNet is optimized through a dual convolutional structure to augment shallow feature extraction. Subsequently, a deformable attention mechanism (DAT) is incorporated to enhance focus on salient features, while a graph attention network (GAT) facilitates interaction and fusion among academic node features. Experimental results demonstrate that the proposed method achieves an average accuracy of 99.46% in proficiency assessment, substantially outperforming mainstream models including EfficientNet and AlexNet. Additionally, it demonstrates robust edge deployment capabilities, providing an effective technical solution for intelligent academic management of English programs within smart campus frameworks. Full article
(This article belongs to the Section Artificial Intelligence)
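As a rough illustration of the graph-attention fusion step described above, the following sketch aggregates academic-indicator node features with a single-head attention layer over a dense adjacency matrix; the class, dimensions, and the fully connected indicator graph are assumptions, and the paper's RegNet backbone and deformable attention are not reproduced.

```python
import torch
import torch.nn as nn
import torch.nn.functional as F

class SimpleGraphAttention(nn.Module):
    """Single-head graph attention over a dense adjacency matrix, in the
    spirit of the GAT-based feature fusion described in the abstract."""
    def __init__(self, in_dim, out_dim):
        super().__init__()
        self.proj = nn.Linear(in_dim, out_dim, bias=False)
        self.attn = nn.Linear(2 * out_dim, 1, bias=False)

    def forward(self, x, adj):
        # x: (N, in_dim) node features, adj: (N, N) 0/1 adjacency with self-loops
        h = self.proj(x)                                   # (N, out_dim)
        n = h.size(0)
        pairs = torch.cat([h.unsqueeze(1).expand(n, n, -1),
                           h.unsqueeze(0).expand(n, n, -1)], dim=-1)
        e = F.leaky_relu(self.attn(pairs).squeeze(-1))     # (N, N) raw scores
        e = e.masked_fill(adj == 0, float("-inf"))         # keep only graph edges
        alpha = torch.softmax(e, dim=-1)
        return alpha @ h                                   # fused node features

# Example: 5 academic-indicator nodes (attendance, participation, ...) per student
nodes = torch.randn(5, 16)
adj = torch.ones(5, 5)          # fully connected indicator graph (assumption)
fused = SimpleGraphAttention(16, 32)(nodes, adj)
```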
39 pages, 49881 KB  
Article
SimTA: A Dual-Polarization SAR Time-Series Rice Field Mapping Model Based on Deep Feature-Level Fusion and Spatiotemporal Attention
by Dong Ren, Jiaxuan Liang, Li Liu, Pengliang Wei, Lingbo Yang, Lu Wang, Hang Sun, Kehan Zhang, Bingwen Qiu, Weiwei Liu and Jingfeng Huang
Remote Sens. 2026, 18(8), 1237; https://doi.org/10.3390/rs18081237 (registering DOI) - 19 Apr 2026
Abstract
Accurate large-scale crop mapping is critical for yield prediction, agricultural disaster monitoring, and global food security. Synthetic aperture radar (SAR), with its all-weather imaging capability, plays a vital role in remote sensing-based crop mapping studies. Although feature-level fusion has been widely explored in remote sensing, existing VV and VH fusion approaches for rice mapping are still predominantly conducted at the data level and fail to adequately integrate their complementary information across the rice growth cycle. These simplistic fusion methods yield features that are redundant or conflicting at field boundaries and in heterogeneous areas, thereby increasing classification errors. To address these challenges, this study proposes a novel spatiotemporal attention model (SimTA) for feature fusion to improve rice mapping. (1) A VV-VH feature-level fusion scheme is designed and integrated with a Content-Guided Attention (CGA) fusion method that effectively exploits the complementary information of the dual-polarized SAR data to achieve deep fusion of spatiotemporal dynamics. (2) A Central Difference Convolution Spatial Extraction Conv (CDCSE Conv) Block is designed, enhancing sensitivity to edge variations in rice fields by combining standard and central difference convolutions. (3) To achieve efficient spatiotemporal feature integration across SAR time series, a Temporal–Spatial Attention (TSA) Block is developed, utilizing large-kernel convolutions for spatial feature extraction and a squeeze-and-excitation mechanism for capturing long-range temporal dependencies of rice time series. Extensive experiments were conducted by comparing SimTA with different models under five fusion schemes. Results demonstrate that feature-level fusion consistently outperforms the other schemes, with SimTA achieving the best performance: OA = 91.1%, F1 score = 90.9%, and mIoU = 86.2%. Compared to the baseline Simple Video Prediction (SimVP), SimTA improves F1 score and mIoU by 0.8% and 2.1%, respectively. The CGA-enhanced feature-level fusion further boosts SimTA’s performance to OA = 91.5% and F1 = 91.4%. SimTA bridges the gap between existing VV-VH deep fusion schemes and modern spatiotemporal modeling demands, offering a more accurate and generalizable approach for large-scale rice field mapping. Full article
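Of the components named in the abstract, the central difference convolution is the most self-contained. The PyTorch sketch below follows the commonly published formulation (a standard convolution minus theta times the summed-kernel response at the central pixel); the channel counts and theta value are assumptions rather than the paper's settings.

```python
import torch
import torch.nn as nn
import torch.nn.functional as F

class CentralDifferenceConv2d(nn.Module):
    """Central difference convolution: a standard 3x3 convolution minus theta
    times the response of the central pixel under the summed kernel, which
    makes the layer more sensitive to local intensity changes (e.g., edges)."""
    def __init__(self, in_ch, out_ch, theta=0.7):
        super().__init__()
        self.conv = nn.Conv2d(in_ch, out_ch, kernel_size=3, padding=1, bias=False)
        self.theta = theta

    def forward(self, x):
        out = self.conv(x)
        if self.theta == 0:
            return out
        # Collapse each 3x3 kernel to a 1x1 kernel and apply it to the central pixel.
        kernel_sum = self.conv.weight.sum(dim=(2, 3), keepdim=True)
        central = F.conv2d(x, kernel_sum)
        return out - self.theta * central

x = torch.randn(1, 2, 64, 64)        # e.g., one VV/VH time step (assumption)
edges = CentralDifferenceConv2d(2, 16)(x)
```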
36 pages, 5744 KB  
Article
Multi-Scale Atrous Feature Fusion Based on a VGG19-UNet Encoder for Brain Tumor Segmentation
by Shoffan Saifullah and Rafał Dreżewski
Appl. Sci. 2026, 16(8), 3971; https://doi.org/10.3390/app16083971 (registering DOI) - 19 Apr 2026
Abstract
Accurate brain tumor segmentation from magnetic resonance imaging (MRI) remains challenging due to heterogeneous tumor morphology, intensity variability, and multi-scale structural complexity. This study proposes a DeepLabV3+-based segmentation framework integrating a VGG19-UNet encoder, Atrous Spatial Pyramid Pooling (ASPP), and low-level feature refinement to simultaneously capture hierarchical semantics and boundary-sensitive spatial details. The architecture enhances receptive field coverage without additional downsampling while preserving fine-grained contour information during reconstruction. Extensive evaluation was conducted on the Figshare Brain Tumor Segmentation (FBTS) dataset and the BraTS 2021 and BraTS 2018 benchmarks, focusing on Whole Tumor segmentation across multiple MRI modalities and tumor grades. Under five-fold cross-validation, the proposed model achieved a mean Dice Similarity Coefficient of 0.9717 and Jaccard Index of 0.9456 on FBTS, with stable and competitive performance across FLAIR, T1, T2, and T1CE modalities in both HGG and LGG cases. Boundary-level analysis further confirmed controlled Hausdorff Distance and low Average Symmetric Surface Distance. Statistical validation and ablation analysis demonstrate consistent improvements over baseline U-Net configurations. The proposed framework provides a robust and computationally efficient solution for automated brain tumor segmentation across heterogeneous datasets. Full article
(This article belongs to the Special Issue Research on Artificial Intelligence in Healthcare)
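Atrous Spatial Pyramid Pooling is a standard component, so a compact PyTorch sketch may help readers place it in the described encoder; the dilation rates, channel counts, and bare-bones projection here are assumptions and omit the paper's VGG19-UNet encoder and low-level feature refinement.

```python
import torch
import torch.nn as nn
import torch.nn.functional as F

class ASPP(nn.Module):
    """Atrous Spatial Pyramid Pooling: parallel dilated convolutions plus a
    global-pooling branch, concatenated and projected back to `out_ch`."""
    def __init__(self, in_ch, out_ch, rates=(6, 12, 18)):
        super().__init__()
        self.branches = nn.ModuleList(
            [nn.Conv2d(in_ch, out_ch, 1, bias=False)] +
            [nn.Conv2d(in_ch, out_ch, 3, padding=r, dilation=r, bias=False)
             for r in rates])
        self.image_pool = nn.Sequential(nn.AdaptiveAvgPool2d(1),
                                        nn.Conv2d(in_ch, out_ch, 1, bias=False))
        self.project = nn.Conv2d(out_ch * (len(rates) + 2), out_ch, 1, bias=False)

    def forward(self, x):
        h, w = x.shape[-2:]
        feats = [branch(x) for branch in self.branches]
        pooled = F.interpolate(self.image_pool(x), size=(h, w),
                               mode="bilinear", align_corners=False)
        return self.project(torch.cat(feats + [pooled], dim=1))

x = torch.randn(1, 512, 32, 32)   # encoder output (channel count is an assumption)
y = ASPP(512, 256)(x)             # -> (1, 256, 32, 32)
```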
25 pages, 20117 KB  
Article
Intelligent Corrosion Diagnosis of High-Strength Bolts Based on Multi-Modal Feature Fusion and APO-XGBoost
by Hanyue Zhang, Yin Wu, Bo Sun, Yanyi Liu and Wenbo Liu
Sensors 2026, 26(8), 2520; https://doi.org/10.3390/s26082520 (registering DOI) - 19 Apr 2026
Abstract
High-strength bolts are critical structural components that are highly susceptible to corrosion in complex environments, posing significant threats to structural safety and reliability. Although acoustic emission (AE) technology has been widely applied in structural health monitoring, existing studies mainly focus on damage mode identification or source localization, while the identification of corrosion evolution stages based on AE signals remains insufficient. This study develops an intelligent corrosion diagnosis framework for high-strength bolts by integrating multimodal feature fusion and optimized machine learning. AE signals are first collected from the near-end and far-end of bolts using a wireless sensor network and then transformed into time–frequency representations via continuous wavelet transform (CWT). The resulting time–frequency images are fed into a modified ResNet-18 network to extract deep features, while statistical features are simultaneously extracted from the raw signals to preserve global information. These heterogeneous features are subsequently fused to form a comprehensive representation of corrosion characteristics. Furthermore, an artificial protozoa optimizer (APO) is introduced to adaptively optimize the hyperparameters of the XGBoost model. The results demonstrate that AE signals generated by hammering bolts with different corrosion levels can be successfully distinguished. The proposed method achieves high accuracy in corrosion stage classification and outperforms conventional approaches. Even when evaluated on an additional M30 bolt dataset, the proposed method maintains robust performance, demonstrating excellent generalization capability across different bolt sizes. These results demonstrate the practical potential of the proposed method for intelligent bolt corrosion diagnosis. Full article
(This article belongs to the Section Fault Diagnosis & Sensors)
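A minimal sketch of the feature pipeline described above, with heavy simplifications: the ResNet-18 deep features are replaced by pooled CWT scalogram energies, PyWavelets stands in for the CWT implementation, the synthetic data and XGBoost hyperparameters are placeholders, and the APO hyperparameter search is omitted.

```python
import numpy as np
import pywt                      # PyWavelets, used here as a stand-in CWT
from xgboost import XGBClassifier

def handcrafted_features(signal, fs, scales=np.arange(1, 65)):
    """Fuse simple statistical features of a raw AE window with a pooled
    continuous-wavelet-transform scalogram (illustrative feature set only)."""
    stats = np.array([signal.mean(), signal.std(), signal.max(), signal.min(),
                      np.abs(signal).mean(), ((signal ** 2).mean()) ** 0.5])
    coeffs, _ = pywt.cwt(signal, scales, "morl", sampling_period=1.0 / fs)
    tf = np.abs(coeffs).mean(axis=1)          # mean energy per scale
    return np.concatenate([stats, tf])        # fused feature vector

# X_raw: (n_samples, n_points) AE windows, y: corrosion-stage labels (synthetic here)
rng = np.random.default_rng(0)
X_raw = rng.standard_normal((200, 2048))
y = rng.integers(0, 4, 200)
X = np.stack([handcrafted_features(s, fs=1_000_000) for s in X_raw])
clf = XGBClassifier(n_estimators=200, max_depth=4, learning_rate=0.1)
clf.fit(X, y)
```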
25 pages, 7376 KB  
Article
Adaptive Prompting-Driven Degradation-Aware Fusion for Infrared and Visible Images
by Qian Zhang, Jie Zhou and Hong Liang
Appl. Sci. 2026, 16(8), 3947; https://doi.org/10.3390/app16083947 (registering DOI) - 18 Apr 2026
Abstract
Infrared and visible image fusion aims to combine the complementary advantages of thermal radiation information and rich texture details to generate more informative images for downstream perception tasks. However, existing deep learning-based methods usually assume ideal imaging conditions and often suffer from performance degradation in complex environments such as low illumination, rain interference, and strong lighting disturbances. To address this problem, this paper proposes an adaptive prompting-driven degradation-aware fusion framework. Specifically, a degradation-aware prompt generation module is introduced to automatically perceive degradation patterns from the input images and generate structured conditional prompts. These prompts guide the network to adaptively adjust feature representations through learnable affine modulation. Furthermore, a semantic-aligned feature learning strategy is designed to ensure consistent cross-modal representation in the latent space. Extensive experiments demonstrate that the proposed method achieves superior performance compared with several state-of-the-art fusion approaches under both normal and degraded conditions. Full article
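The "learnable affine modulation" driven by degradation prompts is close in spirit to FiLM-style conditioning; the sketch below shows that pattern, with the prompt dimensionality and module names chosen for illustration rather than taken from the paper.

```python
import torch
import torch.nn as nn

class PromptAffineModulation(nn.Module):
    """Learnable affine (FiLM-style) modulation of fusion features, conditioned
    on a degradation prompt vector; a minimal sketch of conditional prompts
    guiding features through learnable scale and shift."""
    def __init__(self, prompt_dim, channels):
        super().__init__()
        self.to_scale = nn.Linear(prompt_dim, channels)
        self.to_shift = nn.Linear(prompt_dim, channels)

    def forward(self, feat, prompt):
        # feat: (B, C, H, W) fused IR/visible features, prompt: (B, prompt_dim)
        gamma = self.to_scale(prompt).unsqueeze(-1).unsqueeze(-1)
        beta = self.to_shift(prompt).unsqueeze(-1).unsqueeze(-1)
        return feat * (1 + gamma) + beta

feat = torch.randn(2, 64, 32, 32)
prompt = torch.randn(2, 16)          # e.g., low-light / rain / glare logits (assumed)
modulated = PromptAffineModulation(16, 64)(feat, prompt)
```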
32 pages, 1008 KB  
Article
Macro–Market Fusion with Cross-Attention for Equity Return Prediction
by Janit Rajkarnikar, Sibin Joshi and Zhaoxian Zhou
Mathematics 2026, 14(8), 1361; https://doi.org/10.3390/math14081361 (registering DOI) - 18 Apr 2026
Abstract
Macroeconomic conditions are widely believed to influence the direction of equity markets, yet most forecasting models either ignore macroeconomic information or incorporate it through a small set of ad hoc predictors. We propose XAttnFusion, a macro–market fusion architecture that jointly learns from high-frequency market data and lower-frequency macroeconomic time series for equity return prediction. The model comprises three branches: a 1D convolutional network that encodes 40-day market windows (price, volume, and technical indicators), a temporal convolutional network that encodes 24-month macro sequences, and a feedforward branch for volume-at-price structure features. These representations are integrated through multi-head cross-attention, in which the current market state queries the macro sequence to produce a fused representation for directional forecasting. We evaluate XAttnFusion on daily SPY returns from 2012 to 2024 using purged cross-validation with a 5-day embargo to prevent information leakage. To address potential look-ahead bias from macroeconomic publication lags, all macro inputs are lagged by two months. The model achieves a mean out-of-sample AUROC of 0.63±0.05, representing a 27% improvement over random and an 8.1% improvement over the best concatenation baseline. In a fair comparison where each model is independently hyperparameter-tuned, cross-attention fusion improves AUROC by 0.047 over concatenation (p=0.031, Wilcoxon signed-rank test). The model also generalizes to QQQ and IWM, where cross-attention consistently outperforms concatenation fusion. Crucially, the model’s discriminative ability is state-dependent, indicating that the value of macro–market fusion is itself conditioned on market structure. Permutation-based feature importance shows that macro and market branches contribute on a comparable scale (approximately 48% and 36%, respectively), so the gains come from jointly fusing two comparably weighted sources rather than from a single dominant input. Our results show that explicitly modeling macro–market interactions with interpretable attention improves predictive accuracy over naive fusion strategies and provides insight into the time-varying relevance of macroeconomic information in financial forecasting and equity market prediction. Full article
(This article belongs to the Section E5: Financial Mathematics)
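A minimal PyTorch sketch of the fusion step in which the current market state queries the macro sequence through multi-head cross-attention; the linear branch encoders stand in for the paper's CNN and TCN branches, and all dimensions are assumptions.

```python
import torch
import torch.nn as nn

class MacroMarketCrossAttention(nn.Module):
    """Current market state queries a lagged macro sequence via multi-head
    cross-attention; branch encoders are reduced to linear stubs here."""
    def __init__(self, market_dim, macro_dim, d_model=64, heads=4):
        super().__init__()
        self.market_enc = nn.Linear(market_dim, d_model)   # stand-in for the CNN branch
        self.macro_enc = nn.Linear(macro_dim, d_model)     # stand-in for the TCN branch
        self.xattn = nn.MultiheadAttention(d_model, heads, batch_first=True)
        self.head = nn.Linear(2 * d_model, 1)              # directional logit

    def forward(self, market_window, macro_seq):
        # market_window: (B, 40, market_dim), macro_seq: (B, 24, macro_dim)
        q = self.market_enc(market_window).mean(dim=1, keepdim=True)  # (B, 1, d)
        kv = self.macro_enc(macro_seq)                                # (B, 24, d)
        fused, _ = self.xattn(q, kv, kv)                              # (B, 1, d)
        joint = torch.cat([q.squeeze(1), fused.squeeze(1)], dim=-1)
        return self.head(joint)                                       # (B, 1)

logit = MacroMarketCrossAttention(8, 12)(torch.randn(4, 40, 8), torch.randn(4, 24, 12))
```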
27 pages, 8200 KB  
Article
Few-Shot Bearing Fault Diagnosis Based on Multi-Layer Feature Fusion and Similarity Measurement
by Changyong Deng, Dawei Dong, Sipeng Wang, Hongsheng Zhang and Li Feng
Lubricants 2026, 14(4), 172; https://doi.org/10.3390/lubricants14040172 - 17 Apr 2026
Abstract
The running reliability of rolling bearings depends on the effective lubrication state, and poor lubrication will induce abnormal vibration. Therefore, vibration-based fault diagnosis is an important means to evaluate the health of bearings through vibration characteristics. However, the lack of fault samples in actual working conditions seriously restricts the generalization ability and accuracy of an intelligent diagnosis model. A novel few-shot diagnosis method integrating multi-layer feature fusion and adaptive similarity measurement is proposed. This method adopts a meta-learning framework to simulate sample scarcity through numerous N-way K-shot diagnostic tasks. An efficient feature extractor with a cross-task feature stitching mechanism is designed to fuse features from support and query sets. To overcome the limitation of fixed-distance metrics in existing meta-learners, a learnable similarity scheduler adaptively generates optimal pseudo-distance functions. In particular, a multi-layer feature fusion strategy is introduced to compute adaptive similarities at multiple network depths, which significantly enhances feature robustness against operational variations. Experimental results demonstrate the method achieves stable diagnostic accuracy above 90% under extremely few-shot conditions and maintains over 90% accuracy when transferring from laboratory-simulated faults to natural operational faults, validating its strong potential for practical industrial applications where annotated fault data is scarce. Full article
(This article belongs to the Special Issue Advances in Wear Life Prediction of Bearings)
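To make the N-way K-shot setup concrete, the sketch below classifies query embeddings against class prototypes with a learnable bilinear similarity; this is a generic stand-in, not the paper's similarity scheduler or multi-layer fusion.

```python
import torch
import torch.nn as nn

class LearnableSimilarity(nn.Module):
    """Prototype-based N-way K-shot classification with a learnable bilinear
    similarity instead of a fixed Euclidean or cosine metric."""
    def __init__(self, feat_dim):
        super().__init__()
        self.metric = nn.Parameter(torch.eye(feat_dim))   # starts as a dot product

    def forward(self, support, support_labels, query, n_way):
        # support: (N*K, D) embeddings, query: (Q, D) embeddings
        protos = torch.stack([support[support_labels == c].mean(dim=0)
                              for c in range(n_way)])     # (N, D) class prototypes
        scores = query @ self.metric @ protos.t()          # (Q, N) similarities
        return scores.argmax(dim=-1)

# A single 3-way 5-shot episode with 4 queries and 32-dim embeddings (assumed sizes)
support = torch.randn(15, 32)
labels = torch.arange(3).repeat_interleave(5)
pred = LearnableSimilarity(32)(support, labels, torch.randn(4, 32), n_way=3)
```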
26 pages, 1653 KB  
Article
Hybrid Deep Learning Framework with Cat Swarm Optimization for Cloud-Based Financial Fraud Detection
by Yong Qu and Zengtao Wang
Mathematics 2026, 14(8), 1355; https://doi.org/10.3390/math14081355 - 17 Apr 2026
Abstract
Financial fraud remains one of the most serious threats to the financial industry, causing enormous economic losses and mounting difficulties for conventional fraud detection systems. These systems tend to struggle with the rising volume of transactional data, the problem of class imbalance, and the continually changing nature of fraudulent activity. To address these problems, this research proposes a cloud-based hybrid fraud detection framework that combines Long Short-Term Memory (LSTM) networks, Autoencoders, and Cat Swarm Optimization (CSO). The proposed framework aims to provide improved detection performance and flexibility on a benchmark financial dataset, with a design intended to support scalability in real-time applications. The framework uses the Credit Card Fraud Detection Dataset from Kaggle, which consists primarily of numerical features, including anonymized variables (V1–V28), along with time and amount. The LSTM networks learn the sequential relationships of transactions, while the Autoencoders detect anomalies in the data in an unsupervised manner. CSO is used to optimize key hyperparameters of the hybrid model, including the learning rate (0.0001–0.01), batch size (32–128), number of LSTM layers (1–3), number of hidden units per layer (16–128), dropout rate (0.1–0.5), and fusion weights (0–1 for each weight, with the sum constrained to 1) between the LSTM and Autoencoder outputs. In addition, CSO is applied to feature subset selection and threshold tuning to further enhance model performance. The data are preprocessed with normalization and feature scaling prior to model training. The proposed framework achieves 96.2% accuracy, 94.6% precision, 97.9% recall, a 96.2% F1-score, and a 0.97 AUC-ROC, showing improved performance compared to CNN-based and LSTM-CNN models under the evaluated conditions. However, since repeated experiments were not conducted to verify robustness, the results should be interpreted as indicative rather than definitive. The framework exhibits competitive fraud detection performance on the evaluated benchmark dataset, particularly in handling class imbalance. In a simulated environment configured to mimic cloud-like conditions, the framework achieved inference latency between 15 and 30 ms, GPU utilization between 60% and 70%, and a data transfer volume of approximately 1.5 GB per day, suggesting its potential for deployment in cloud-based fraud detection systems. Overall, the framework demonstrates the potential of integrating sequential modeling, anomaly detection, and metaheuristic optimization within a unified, cloud-oriented architecture, providing a more comprehensive approach than conventional hybrid models for preventing financial fraud. Full article
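The abstract's constraint that the LSTM and Autoencoder fusion weights sum to 1 can be illustrated in a few lines of NumPy; the toy scores, normalization, and threshold below are assumptions, and the CSO search that would tune the weights is not shown.

```python
import numpy as np

def fuse_scores(lstm_prob, ae_error, weights, threshold=0.5):
    """Combine an LSTM fraud probability with a normalized autoencoder
    reconstruction error using fusion weights constrained to sum to 1."""
    w = np.asarray(weights, dtype=float)
    w = w / w.sum()                                  # enforce w1 + w2 = 1
    ae_score = (ae_error - ae_error.min()) / (ae_error.ptp() + 1e-9)
    fused = w[0] * lstm_prob + w[1] * ae_score
    return fused, (fused >= threshold).astype(int)

# Toy scores for 5 transactions (assumed values, not from the paper)
lstm_prob = np.array([0.10, 0.85, 0.40, 0.95, 0.05])
ae_error = np.array([0.02, 0.90, 0.10, 1.30, 0.01])
fused, flags = fuse_scores(lstm_prob, ae_error, weights=(0.6, 0.4))
```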
32 pages, 8881 KB  
Article
WS-R-IR Adapter: A Multimodal RGB–Infrared Remote Sensing Framework for Water Surface Object Detection
by Bin Xue, Qiang Yu, Kun Ding, Mengxin Jiang, Ying Wang, Shiming Xiang and Chunhong Pan
Remote Sens. 2026, 18(8), 1220; https://doi.org/10.3390/rs18081220 - 17 Apr 2026
Abstract
Water surface object detection in shipborne remote sensing is challenged by unstable wave-induced backgrounds, illumination variations, extreme scale changes with tiny objects, and limited annotations. Multimodal RGB–infrared (RGB–IR) sensing leverages complementary visible and infrared cues to enhance robustness. However, most existing RGB–IR methods rely on backbones pretrained on limited-scale data, which constrain their performance for complex water surface scenes. In this work, we propose the WS-R-IR Adapter, a parameter-efficient vision foundation model (VFM)-based framework for shipborne RGB–IR object detection. Instead of full fine-tuning, it adapts frozen VFM representations via lightweight task-specific designs. The WS-R-IR Adapter includes (1) a water scene domain-aware modal adapter that progressively guides frozen backbone features with evolving semantic cues, (2) a parallel multi-scale structural perception module for fine-grained, scale-sensitive modeling, (3) an adaptive RGB–IR feature modulation fusion strategy, and (4) a resolution-aligned context semantic and structural detail fusion module. Moreover, we introduce an object-guided global-to-local registration framework to address dynamic cross-modal misalignment, and construct modality-aligned PoLaRIS-DET and ASV-RI-DET datasets that cover diverse water surface scenes. On the two datasets, the proposed method achieves mAP@0.5:0.95 scores of 74.2% and 50.2%, respectively, significantly outperforming existing methods with only 11.9M additional parameters. These results demonstrate the effectiveness of parameter-efficient VFM adaptation for multimodal water surface remote sensing. Full article
(This article belongs to the Section Remote Sensing Image Processing)
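A generic illustration of parameter-efficient adaptation of a frozen backbone, which is the core idea behind the adapter framework described above; the bottleneck design, reduction ratio, and the stand-in backbone are assumptions, not the paper's domain-aware modules.

```python
import torch
import torch.nn as nn

class BottleneckAdapter(nn.Module):
    """Parameter-efficient adapter: a small residual bottleneck applied to
    frozen backbone features, so only the adapter weights are trained."""
    def __init__(self, channels, reduction=8):
        super().__init__()
        hidden = max(channels // reduction, 4)
        self.down = nn.Conv2d(channels, hidden, 1)
        self.up = nn.Conv2d(hidden, channels, 1)
        self.act = nn.GELU()

    def forward(self, x):
        return x + self.up(self.act(self.down(x)))

backbone = nn.Sequential(nn.Conv2d(3, 64, 3, padding=1), nn.ReLU(),
                         nn.Conv2d(64, 64, 3, padding=1))   # stand-in for a frozen VFM
for p in backbone.parameters():
    p.requires_grad = False                                  # freeze the backbone
adapter = BottleneckAdapter(64)                              # only these weights train
feat = adapter(backbone(torch.randn(1, 3, 128, 128)))
```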
21 pages, 1194 KB  
Article
Environment-Aware Proactive Beam Prediction in mmWave V2I via Multi-Modal Prior Mask Map
by Changpeng Zhou and Youyun Xu
Sensors 2026, 26(8), 2488; https://doi.org/10.3390/s26082488 - 17 Apr 2026
Abstract
In millimeter wave V2I communication systems, accurate beam prediction is crucial for optimizing network performance and improving signal transmission efficiency. Traditional beam prediction methods mainly rely on single-modal data, which often fails to capture the comprehensive environmental information required for high-accuracy prediction. In contrast, multi-modal approaches leverage complementary information from different data sources and offer a more promising solution. However, many existing fusion methods primarily depend on real-time sensory inputs and do not fully exploit stable environmental features in V2I scenarios, limiting the effective use of each modality. To address these limitations, this paper proposes an environment-aware proactive beam prediction method based on a multi-modal prior mask map (MMPMM), which integrates offline mapping with an online beam prediction network. Specifically, the method fuses information from images, point clouds, positions, and the MMPMM to predict the optimal beam index. The MMPMM provides channel-related prior information by extracting static V2I scene features offline without incurring any additional online measurement overhead. Experimental results on real-world datasets demonstrate that the proposed method achieves a Top-3 beam prediction accuracy of up to 71.23% while maintaining stable performance under the evaluated dynamic and degraded conditions, confirming its effectiveness in the considered scenarios. Full article
(This article belongs to the Special Issue 6G Communication and Edge Intelligence in Wireless Sensor Networks)
34 pages, 8222 KB  
Article
DPF-DETR: Enhancing Drone Image Detection with Density Perception and Multi-Scale Feature Fusion
by Sidi Lai, Zhensong Li, Xiaotan Wei, Yutong Wang and Shiliang Zhu
Remote Sens. 2026, 18(8), 1221; https://doi.org/10.3390/rs18081221 - 17 Apr 2026
Abstract
The DPF-DETR model has been designed to address the challenges encountered in object detection within drone imagery, particularly in scenarios involving significant target scale variations, dense targets, and complex backgrounds. To overcome the limitations of traditional object detection methods, the Density Sensing Mechanism (DSM) and Adaptive Density Map Loss (AdaptiveDM Loss) have been incorporated into the model to provide fine-grained supervision signals. The DSM optimizes the query selection mechanism by utilizing density maps, enabling the number of queries to be adaptively adjusted based on the distribution density of targets, thus improving detection accuracy in dense regions. Furthermore, the precision of the model in detecting dense targets is enhanced by AdaptiveDM Loss, which dynamically adjusts the weights for object localization and classification. Multi-scale feature fusion capabilities are also improved by the Multi-Scale Feature Fusion Network (MSFFN) and the Selective Feature Integration Module (SFIM). The MSFFN refines the fusion of features, which improves the detection of targets across various scales, particularly in complex scenes. Additionally, SFIM enhances the detection accuracy for small targets and complex backgrounds by integrating low-level spatial features with high-level semantic information. The Context-Sensitive Feature Interaction Module (CSFIM) further optimizes multi-scale feature fusion through context-guided interactions, bridging the semantic gap between features of different scales, thus improving the robustness of the model in dense scenarios. Experimental results have shown that DPF-DETR outperforms traditional models and state-of-the-art detection methods across multiple datasets, demonstrating superior robustness and accuracy, especially in dense target detection and complex background scenarios. Full article
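One way to picture the density-guided query selection is a budget that scales with the integral of the predicted density map, as in the hypothetical sketch below; the scaling rule, bounds, and constants are illustrative only and not taken from the paper.

```python
import numpy as np

def query_budget(density_map, base_queries=300, min_q=100, max_q=900):
    """Scale the decoder query budget with the predicted object density, a
    simplified stand-in for density-adaptive query selection."""
    expected_objects = float(density_map.sum())    # density maps integrate to counts
    scale = np.clip(expected_objects / 50.0, 0.5, 3.0)
    return int(np.clip(base_queries * scale, min_q, max_q))

sparse = np.full((64, 64), 10 / 64 / 64)    # ~10 expected objects in the scene
dense = np.full((64, 64), 400 / 64 / 64)    # ~400 expected objects in the scene
print(query_budget(sparse), query_budget(dense))   # fewer vs. more queries
```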
22 pages, 7835 KB  
Article
CMT-BUSNet: Adaptive Fusion-Based Triple-Branch Hybrid Architecture for Explainable Breast Ultrasound Tumor Segmentation
by Hüseyin Kutlu and Cemil Çolak
Diagnostics 2026, 16(8), 1203; https://doi.org/10.3390/diagnostics16081203 - 17 Apr 2026
Abstract
Background/Objectives: This study proposes CMT-BUSNet, a hybrid architecture integrating CNN, Mamba, and Transformer branches for breast ultrasound tumor segmentation with built-in explainability. Methods: CMT-BUSNet employs a CNN-anchored hierarchical parallel encoder where Mamba and Transformer branches process CNN-derived features in parallel, fused through an Adaptive Feature Fusion Module (AFFM) with Dense Nested Decoder and Boundary-Aware Composite Loss. Five-fold cross-validation on BUS-BRA (N = 1875) compared nine architectures under identical protocols, plus nnU-Net v2 trained with its default self-configuring protocol as a benchmark. External evaluation used the BUSI dataset (N = 647). Results: CMT-BUSNet achieved DSC = 0.9037 ± 0.0047 on BUS-BRA with higher boundary delineation metrics than nnU-Net v2, which was trained under a different self-configuring protocol (B-IoU: 0.611 vs. 0.557; HD95: 10.07 vs. 13.54 pixels), despite nnU-Net’s marginally higher DSC (0.9108). On BUSI, CMT-BUSNet (DSC = 0.6709) yielded higher scores than nnU-Net (0.5579) across all metrics under zero-shot transfer, though the two methods were trained under different protocols. Training-based ablation confirmed each component’s contribution, and quantitative XAI validation demonstrated attribution faithfulness (nEAR = 2.82×) and uncertainty–error correlation (r = 0.39). Conclusions: CMT-BUSNet achieves competitive accuracy with higher boundary metrics, preliminary cross-dataset transferability, and built-in interpretability relative to nnU-Net (noting different training protocols). Internal validation folds are image-disjoint but not guaranteed to be patient-disjoint, which should be considered when interpreting the reported metrics. Multicenter validation is required before clinical deployment. Full article
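A generic sketch of adaptive three-branch fusion in the spirit of the AFFM: per-pixel softmax weights are predicted from the concatenated CNN, Mamba, and Transformer features and used to blend them; the gating design and channel counts are assumptions, not the published module.

```python
import torch
import torch.nn as nn

class AdaptiveFusion3(nn.Module):
    """Fuse three branch feature maps with per-pixel softmax weights predicted
    from their concatenation (a generic adaptive feature fusion sketch)."""
    def __init__(self, channels):
        super().__init__()
        self.gate = nn.Conv2d(3 * channels, 3, kernel_size=1)

    def forward(self, f_cnn, f_mamba, f_trans):
        stacked = torch.stack([f_cnn, f_mamba, f_trans], dim=1)             # (B, 3, C, H, W)
        weights = torch.softmax(
            self.gate(torch.cat([f_cnn, f_mamba, f_trans], dim=1)), dim=1)  # (B, 3, H, W)
        return (stacked * weights.unsqueeze(2)).sum(dim=1)                  # (B, C, H, W)

branches = [torch.randn(1, 32, 64, 64) for _ in range(3)]
fused = AdaptiveFusion3(32)(*branches)
```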
38 pages, 6162 KB  
Article
Leakage-Resistant Multi-Sensor Bearing Fault Diagnosis via Adaptive Time-Frequency Graph Learning and Sensor Reliability-Aware Fusion
by Yu Sun, Yihang Qin, Wenhao Chen, Wenhui Zhao and Haoran Sun
Sensors 2026, 26(8), 2484; https://doi.org/10.3390/s26082484 - 17 Apr 2026
Abstract
Reliable multi-sensor bearing fault diagnosis is challenged by temporal leakage caused by window-level random splitting, limited modeling of cross-sensor dependencies, and inadequate integration of raw temporal dynamics with time-frequency representations. To address these issues, this study proposes a leakage-resistant multi-sensor diagnosis framework that combines a partition-before-windowing evaluation protocol with adaptive time-frequency graph learning and reliability-aware fusion. Continuous vibration records are first divided into disjoint temporal regions with guard intervals and overlap auditing to suppress time-neighbor leakage. The model then extracts complementary features from a raw-signal branch and a dual-resolution log-STFT branch, while adaptive graph learning captures sample-dependent inter-sensor couplings and sensor reliability weighting highlights informative channels. A cross-gated fusion module further integrates temporal and graph-domain representations in a sample-adaptive manner for final classification. Experiments on a reconstructed nine-class benchmark derived from the HUSTbearing dataset show that the proposed method achieves a Macro-Accuracy of 0.973, a Macro-Recall of 0.964, and a Macro-F1 of 0.954, outperforming representative raw-signal and STFT-based baselines under the same leakage-resistant protocol. These results demonstrate that jointly modeling multi-scale time-frequency structure, dynamic sensor relationships, and reliable evaluation yields an effective and interpretable solution for intelligent bearing fault diagnosis under complex operating conditions. Full article
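The partition-before-windowing protocol is easy to state in code: split the continuous record into disjoint regions first, insert a guard interval, and only then cut sliding windows, so no test window shares samples or near-neighbors with a training window. The fractions, guard length, and window sizes below are illustrative assumptions.

```python
import numpy as np

def partition_before_windowing(signal, train_frac=0.7, guard=2048,
                               win=2048, hop=1024):
    """Partition a continuous vibration record into disjoint train/test regions,
    skip a guard interval, and only then extract sliding windows."""
    split = int(len(signal) * train_frac)
    train_region = signal[:split]
    test_region = signal[split + guard:]      # guard suppresses time-neighbor leakage

    def windows(region):
        return np.stack([region[i:i + win]
                         for i in range(0, len(region) - win + 1, hop)])

    return windows(train_region), windows(test_region)

train_w, test_w = partition_before_windowing(np.random.randn(100_000))
```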
21 pages, 1011 KB  
Article
Daisy-Net: Dual-Attention and Inter-Scale-Aware Yield Network for Lung Nodule Object Detection
by Zhijian Zhu, Yiwen Zhao, Xingang Zhao, Yuhan Ying, Haoran Gu, Guoli Song and Qinghui Wang
Mathematics 2026, 14(8), 1350; https://doi.org/10.3390/math14081350 - 17 Apr 2026
Abstract
Lung nodule detection remains a critical challenge in clinical diagnostics due to the small size, weak contrast, and high background interference of nodules in CT scans. To address these issues, a novel deep neural network architecture, termed Daisy-Net, is proposed. This model incorporates dual attention mechanisms and inter-scale feature perception, consisting of two primary components: the Parallelized Patch and Spatial Context Aware (PPSCA) module and the Omni-domain Multistage Fusion (OMF) module. The PPSCA module enhances the extraction of fine-grained textures and boundary information through multi-branch patch perception and spatial attention. The OMF module employs omni-domain feature fusion and progressive stage-wise supervision to improve robustness and discrimination under complex conditions. The lung nodule detection task is formulated as a two-dimensional segmentation problem and evaluated on the LUNA16 dataset. In the post-binarization comparative evaluation, Daisy-Net achieves the best overall performance among all compared methods, with an Intersection over Union (IoU) of 81.41, a Dice coefficient of 89.75, a precision of 95.34, a sensitivity of 84.78, and a specificity of 99.9974. These findings indicate the model’s strong capability in detecting small pulmonary nodules accurately and reliably. Full article
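For reference, the two headline overlap metrics reported above (IoU and Dice) can be computed from binarized masks as follows; the toy masks are placeholders rather than LUNA16 data.

```python
import numpy as np

def iou_and_dice(pred, target, eps=1e-7):
    """Intersection-over-Union and Dice coefficient for binary masks."""
    pred = pred.astype(bool)
    target = target.astype(bool)
    inter = np.logical_and(pred, target).sum()
    union = np.logical_or(pred, target).sum()
    iou = inter / (union + eps)
    dice = 2 * inter / (pred.sum() + target.sum() + eps)
    return iou, dice

pred = np.zeros((64, 64), dtype=np.uint8)
pred[20:40, 20:40] = 1
gt = np.zeros((64, 64), dtype=np.uint8)
gt[22:42, 22:42] = 1
print(iou_and_dice(pred, gt))
```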