Search Results (319)

Search Parameters:
Keywords = sparse-attention mechanism

16 pages, 6489 KB  
Article
LIF-VSR: A Lightweight Framework for Video Super-Resolution with Implicit Alignment and Attentional Fusion
by Songyi Zhang, Hailin Zhang, Xiaolin Wang, Kailei Song, Zhizhuo Han, Zhitao Zhang and Wenchi Cheng
Sensors 2026, 26(2), 637; https://doi.org/10.3390/s26020637 (registering DOI) - 17 Jan 2026
Abstract
Video super-resolution (VSR) has advanced rapidly in enhancing video quality and restoring compressed content, yet leading methods often remain too costly for real-world use. We present LIF-VSR, a lightweight, near-real-time framework built with an efficiency-first philosophy, comprising economical temporal propagation, a new neighboring-frame fusion strategy, and three streamlined core modules. For temporal propagation, a uni-directional recurrent architecture transfers context through a compact inter-frame memory unit, avoiding the heavy compute and memory of multi-frame parallel inputs. For fusion and alignment, we discard 3D convolutions and optical flow, instead using (i) a deformable convolution module for implicit feature-space alignment, and (ii) a sparse attention fusion module that aggregates adjacent-frame information via learned sparse key sampling points, sidestepping dense global computation. For feature enhancement, a cross-attention mechanism selectively calibrates temporal features at far lower cost than global self-attention. Across public benchmarks, LIF-VSR achieves competitive results with only 3.06 M parameters and a very low computational footprint, reaching 27.65 dB on Vid4 and 31.61 dB on SPMCs. Full article
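The sparse attention fusion described above can be pictured with a short sketch: each position in the current frame attends to only a handful of learned sampling points in the neighboring frame's features, in the spirit of deformable attention, rather than to the full dense map. This is a minimal illustration under assumed shapes and names (SparseKeySamplingFusion, offset_pred, attn_pred are not from the paper), not the authors' implementation.

```python
# Minimal sketch of sparse key-sampling attention fusion: each query position
# attends to only K learned sampling points in the neighboring frame's feature
# map instead of the full dense map. Module/variable names are illustrative.
import torch
import torch.nn as nn
import torch.nn.functional as F

class SparseKeySamplingFusion(nn.Module):
    def __init__(self, channels: int, num_points: int = 8):
        super().__init__()
        self.num_points = num_points
        self.offset_pred = nn.Conv2d(2 * channels, 2 * num_points, 3, padding=1)
        self.attn_pred = nn.Conv2d(2 * channels, num_points, 3, padding=1)
        self.proj = nn.Conv2d(channels, channels, 1)

    def forward(self, cur_feat, nbr_feat):
        # cur_feat, nbr_feat: (B, C, H, W) features of current / neighboring frame
        B, C, H, W = cur_feat.shape
        joint = torch.cat([cur_feat, nbr_feat], dim=1)
        offsets = self.offset_pred(joint)                        # (B, 2K, H, W)
        attn = F.softmax(self.attn_pred(joint), dim=1)           # (B, K, H, W)

        # Base sampling grid in normalized [-1, 1] coordinates.
        ys = torch.linspace(-1, 1, H, device=cur_feat.device)
        xs = torch.linspace(-1, 1, W, device=cur_feat.device)
        gy, gx = torch.meshgrid(ys, xs, indexing="ij")
        base = torch.stack([gx, gy], dim=-1)                     # (H, W, 2), (x, y)

        fused = 0.0
        for k in range(self.num_points):
            off = offsets[:, 2 * k:2 * k + 2].permute(0, 2, 3, 1)   # (B, H, W, 2)
            grid = base.unsqueeze(0) + off                          # sampled locations
            sampled = F.grid_sample(nbr_feat, grid, align_corners=True)
            fused = fused + attn[:, k:k + 1] * sampled              # weighted sum of K samples
        return cur_feat + self.proj(fused)
```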
(This article belongs to the Section Intelligent Sensors)

24 pages, 11080 KB  
Article
Graph-Based and Multi-Stage Constraints for Hand–Object Reconstruction
by Wenrun Wang, Jianwu Dang, Yangping Wang and Hui Yu
Sensors 2026, 26(2), 535; https://doi.org/10.3390/s26020535 - 13 Jan 2026
Viewed by 116
Abstract
Reconstructing hand and object shapes from a single view during interaction remains challenging due to severe mutual occlusion and the need for high physical plausibility. To address this, we propose a novel framework for hand–object interaction reconstruction based on holistic, multi-stage collaborative optimization. Unlike methods that process hands and objects independently or apply constraints as late-stage post-processing, our model progressively enforces physical consistency and geometric accuracy throughout the entire reconstruction pipeline. Our network takes an RGB-D image as input. An adaptive feature fusion module first combines color and depth information to improve robustness against sensing uncertainties. We then introduce structural priors for 2D pose estimation and leverage texture cues to refine depth-based 3D pose initialization. Central to our approach is the iterative application of a dense mutual attention mechanism during sparse-to-dense mesh recovery, which dynamically captures interaction dependencies while refining geometry. Finally, we use a Signed Distance Function (SDF) representation explicitly designed for contact surfaces to prevent interpenetration and ensure physically plausible results. Through comprehensive experiments, our method demonstrates significant improvements on the challenging ObMan and DexYCB benchmarks, outperforming state-of-the-art techniques. Specifically, on the ObMan dataset, our approach achieves hand CDh and object CDo metrics of 0.077 cm2 and 0.483 cm2, respectively. Similarly, on the DexYCB dataset, it attains hand CDh and object CDo values of 0.251 cm2 and 1.127 cm2, respectively. Full article
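As a rough illustration of the SDF-based contact constraint, the sketch below penalizes hand-mesh vertices whose signed distance to the object surface is negative (i.e., inside the object). It assumes a precomputed object SDF stored on a voxel grid; the function name, grid layout, and coordinate convention are hypothetical, not the paper's formulation.

```python
# Hedged sketch of an SDF-based interpenetration penalty: hand vertices with a
# negative signed distance to the object surface (inside the object) are
# penalized. Assumes a precomputed object SDF sampled on a voxel grid.
import torch
import torch.nn.functional as F

def interpenetration_loss(hand_verts, object_sdf_grid, grid_min, grid_max):
    """
    hand_verts:      (B, V, 3) hand mesh vertices in object-grid coordinates (x, y, z)
    object_sdf_grid: (B, 1, D, H, W) signed distances (negative = inside object)
    grid_min/max:    (3,) axis-aligned bounds of the SDF grid
    """
    # Normalize vertex coordinates to [-1, 1] for grid_sample.
    norm = 2.0 * (hand_verts - grid_min) / (grid_max - grid_min) - 1.0
    # For 5D inputs, grid_sample expects a grid of shape (B, d, h, w, 3) in (x, y, z).
    grid = norm[:, :, None, None, :]                              # (B, V, 1, 1, 3)
    sdf = F.grid_sample(object_sdf_grid, grid, align_corners=True)
    sdf = sdf.reshape(hand_verts.shape[0], -1)                    # (B, V)
    # Penalize only penetrating vertices (sdf < 0).
    return torch.clamp(-sdf, min=0.0).mean()
```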
(This article belongs to the Section Sensing and Imaging)

18 pages, 998 KB  
Article
A Stock Price Prediction Network That Integrates Multi-Scale Channel Attention Mechanism and Sparse Perturbation Greedy Optimization
by Jiarun He, Fangying Wan and Mingfang He
Algorithms 2026, 19(1), 67; https://doi.org/10.3390/a19010067 - 12 Jan 2026
Viewed by 86
Abstract
The stock market is of paramount importance to economic development. Given the market’s high volatility, investors who accurately predict stock price fluctuations can effectively mitigate investment risks and achieve higher returns. Traditional time series models face limitations when dealing with long sequences and short-term volatility issues, often yielding unsatisfactory predictive outcomes. This paper proposes a novel algorithm, MSNet, which integrates a Multi-scale Channel Attention mechanism (MSCA) and Sparse Perturbation Greedy Optimization (SPGO) into an xLSTM framework. The MSCA enhances the model’s spatio-temporal information modeling capabilities, effectively preserving key price features within stock data. Meanwhile, SPGO improves the exploration of optimal solutions during training, thereby strengthening the model’s generalization stability against short-term market fluctuations. Experimental results demonstrate that MSNet achieves an MSE of 0.0093 and an MAE of 0.0152 on our proprietary dataset. This approach effectively extracts temporal features from complex stock market data, providing empirical insights and guidance for time series forecasting. Full article
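A minimal sketch of the multi-scale channel attention idea, assuming a channel-first (B, C, T) time-series layout: parallel temporal convolutions at several kernel sizes are fused and drive a squeeze-and-excitation-style channel gate. Kernel sizes, the reduction ratio, and names are illustrative, not the authors' MSCA.

```python
import torch.nn as nn

class MultiScaleChannelAttention(nn.Module):
    # Hedged sketch: parallel temporal convolutions at several kernel sizes are
    # fused, then a squeeze-and-excitation-style gate reweights the channels.
    def __init__(self, channels: int, reduction: int = 4):
        super().__init__()
        hidden = max(channels // reduction, 1)
        self.branches = nn.ModuleList(
            nn.Conv1d(channels, channels, k, padding=k // 2) for k in (3, 5, 7)
        )
        self.gate = nn.Sequential(
            nn.AdaptiveAvgPool1d(1),
            nn.Conv1d(channels, hidden, 1),
            nn.ReLU(inplace=True),
            nn.Conv1d(hidden, channels, 1),
            nn.Sigmoid(),
        )

    def forward(self, x):
        # x: (B, C, T) channel-first time-series features
        multi = sum(branch(x) for branch in self.branches)   # fuse temporal scales
        return x * self.gate(multi)                          # channel reweighting
```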
(This article belongs to the Section Algorithms for Multidisciplinary Applications)

27 pages, 6280 KB  
Article
UCA-Net: A Transformer-Based U-Shaped Underwater Enhancement Network with a Compound Attention Mechanism
by Cheng Yu, Jian Zhou, Lin Wang, Guizhen Liu and Zhongjun Ding
Electronics 2026, 15(2), 318; https://doi.org/10.3390/electronics15020318 - 11 Jan 2026
Viewed by 108
Abstract
Images captured underwater frequently suffer from color casts, blurring, and distortion, which are mainly attributable to the unique optical characteristics of water. Although conventional UIE methods rooted in physics are available, their effectiveness is often constrained, particularly in challenging aquatic and illumination conditions. More recently, deep learning has become a leading paradigm for UIE, recognized for its superior performance and operational efficiency. This paper proposes UCA-Net, a lightweight CNN-Transformer hybrid network. It incorporates multiple attention mechanisms and utilizes composite attention to effectively enhance textures, reduce blur, and correct color. A novel adaptive sparse self-attention module is introduced to jointly restore global color consistency and fine local details. The model employs a U-shaped encoder–decoder architecture with three-stage up- and down-sampling, facilitating multi-scale feature extraction and global context fusion for high-quality enhancement. Experimental results on multiple public datasets demonstrate UCA-Net’s superior performance, achieving a PSNR of 24.75 dB and an SSIM of 0.89 on the UIEB dataset, while maintaining an extremely low computational cost with only 1.44M parameters. Its effectiveness is further validated by improvements in various downstream image tasks. UCA-Net achieves an optimal balance between performance and efficiency, offering a robust and practical solution for underwater vision applications. Full article
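The adaptive sparse self-attention can be approximated by a top-k variant in which each query keeps only its strongest keys, as sketched below. The selection rule and hyperparameters are assumptions for illustration; UCA-Net's exact adaptivity is not reproduced here.

```python
import torch.nn as nn
import torch.nn.functional as F

class TopKSparseSelfAttention(nn.Module):
    # Illustrative sparse self-attention: each query attends only to its top-k
    # highest-scoring keys; all other scores are masked out before the softmax.
    def __init__(self, dim: int, num_heads: int = 4, topk: int = 16):
        super().__init__()
        self.num_heads, self.topk = num_heads, topk
        self.qkv = nn.Linear(dim, 3 * dim)
        self.proj = nn.Linear(dim, dim)

    def forward(self, x):
        # x: (B, N, C) token features (e.g., flattened image patches)
        B, N, C = x.shape
        q, k, v = self.qkv(x).chunk(3, dim=-1)

        def split(t):
            return t.view(B, N, self.num_heads, C // self.num_heads).transpose(1, 2)

        q, k, v = split(q), split(k), split(v)                     # (B, H, N, d)
        scores = q @ k.transpose(-2, -1) / (q.shape[-1] ** 0.5)    # (B, H, N, N)
        kth = min(self.topk, N)
        thresh = scores.topk(kth, dim=-1).values[..., -1:]         # k-th largest per query
        scores = scores.masked_fill(scores < thresh, float("-inf"))
        out = F.softmax(scores, dim=-1) @ v                        # sparse aggregation
        out = out.transpose(1, 2).reshape(B, N, C)
        return self.proj(out)
```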

20 pages, 15504 KB  
Article
O-Transformer-Mamba: An O-Shaped Transformer-Mamba Framework for Remote Sensing Image Haze Removal
by Xin Guan, Runxu He, Le Wang, Hao Zhou, Yun Liu and Hailing Xiong
Remote Sens. 2026, 18(2), 191; https://doi.org/10.3390/rs18020191 - 6 Jan 2026
Viewed by 164
Abstract
Although Transformer-based and state-space models (e.g., Mamba) have demonstrated impressive performance in image restoration, they remain deficient in remote sensing image dehazing. Transformer-based models tend to distribute attention evenly, making it difficult for them to handle the uneven distribution of haze. While Mamba excels at modeling long-range dependencies, it lacks fine-grained spatial awareness of complex atmospheric scattering. To overcome these limitations, we present a new O-shaped dehazing architecture that combines a Sparse-Enhanced Self-Attention (SE-SA) module with a Mixed Visual State Space Model (Mix-VSSM), balancing haze-sensitive details in remote sensing images with long-range context modeling. The SE-SA module introduces a dynamic soft masking mechanism that adaptively adjusts attention weights based on the local haze distribution, enabling the network to more effectively focus on severely degraded regions while suppressing redundant responses. Furthermore, the Mix-VSSM enhances global context modeling by combining sequential processing of 2D perception with local residual information. This design mitigates the loss of spatial detail in the standard VSSM and improves the feature representation of haze-degraded remote sensing images. Thorough experiments verify that our O-shaped framework outperforms existing methods on several benchmark datasets. Full article
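A minimal sketch of the soft-masking idea, assuming the mask is a learned per-position degradation score added to the attention logits so that hazier key positions receive larger weights. The predictor and its placement are assumptions for illustration, not the SE-SA module itself.

```python
import torch.nn as nn
import torch.nn.functional as F

class SoftMaskedAttention(nn.Module):
    # Hedged sketch: a per-token "degradation score" is predicted from the
    # features and added to the attention logits as a soft bias, so heavily
    # degraded (hazy) key positions receive more attention. Illustrative only.
    def __init__(self, dim: int):
        super().__init__()
        self.qkv = nn.Linear(dim, 3 * dim)
        self.haze_score = nn.Linear(dim, 1)   # assumed soft-mask predictor
        self.proj = nn.Linear(dim, dim)

    def forward(self, x):
        # x: (B, N, C) flattened spatial tokens
        q, k, v = self.qkv(x).chunk(3, dim=-1)
        logits = q @ k.transpose(-2, -1) / (x.shape[-1] ** 0.5)   # (B, N, N)
        bias = self.haze_score(x).transpose(-2, -1)               # (B, 1, N) per-key bias
        attn = F.softmax(logits + bias, dim=-1)                   # soft masking
        return self.proj(attn @ v)
```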
(This article belongs to the Special Issue Deep Learning for Remote Sensing Image Enhancement)

30 pages, 13588 KB  
Article
MSTFT: Mamba-Based Spatio-Temporal Fusion for Small Object Tracking in UAV Videos
by Kang Sun, Haoyang Zhang and Hui Chen
Electronics 2026, 15(2), 256; https://doi.org/10.3390/electronics15020256 - 6 Jan 2026
Viewed by 145
Abstract
Unmanned Aerial Vehicle (UAV) visual tracking is widely used but continues to face challenges such as unpredictable target motion, error accumulation, and the sparse appearance of small targets. To address these issues, we propose a Mamba-based Spatio-Temporal Fusion Tracker. To mitigate tracking drift from large displacements and abrupt pose changes, we first introduce a Bidirectional Spatio-Temporal Mamba module. It employs bidirectional spatial scanning to capture discriminative local features and temporal scanning to model dynamic motion patterns. Second, to suppress error accumulation in complex scenes, we develop a Dynamic Template Fusion module with Adaptive Attention. This module integrates a threefold safety verification mechanism—based on response peak, temporal consistency, and motion stability—with a scale-aware strategy to enable robust template updates. Moreover, we design a Small-Target-Aware Context Prediction Head that utilizes a Gaussian-weighted prior to guide feature fusion and refines the loss function, significantly improving localization accuracy under sparse target features and strong background interference. On three major UAV tracking benchmarks (UAV123, UAV123@10fps, and UAV20L), our MSTFT establishes a new state of the art, with success AUCs of 79.4%, 76.5%, and 75.8%, respectively. More importantly, it maintains a tracking speed of 45 FPS, demonstrating a superior balance between precision and efficiency. Full article
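The dynamic template update can be pictured as a confidence-gated exponential moving average, sketched below: the template is refreshed only when the response peak, its consistency with recent peaks, and the motion step all pass checks. Thresholds and names are hypothetical; the paper's threefold verification and scale-aware strategy are richer than this.

```python
import torch

def maybe_update_template(template, candidate, peak, prev_peaks,
                          center, prev_center, alpha=0.1,
                          peak_thr=0.6, motion_thr=0.5):
    """
    Hedged sketch of a confidence-gated template update: blend the template
    with the current candidate only when (i) the response peak is high,
    (ii) it is consistent with recent peaks, and (iii) the target motion is
    stable. All thresholds, the EMA blend, and names are illustrative.
    """
    temporal_ok = peak >= 0.8 * (sum(prev_peaks) / max(len(prev_peaks), 1))
    motion_ok = torch.linalg.norm(center - prev_center) <= motion_thr
    if peak >= peak_thr and temporal_ok and motion_ok:
        template = (1.0 - alpha) * template + alpha * candidate   # EMA update
    return template
```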

28 pages, 4978 KB  
Article
Oilseed Flax Yield Prediction in Arid Gansu, China Using a CNN–Informer Model and Multi-Source Spatio-Temporal Data
by Xingyu Li, Yue Li, Bin Yan, Yuhong Gao, Shunchang Su, Hui Zhou, Lianghe Kang, Huan Liu and Yongbiao Li
Remote Sens. 2026, 18(1), 181; https://doi.org/10.3390/rs18010181 - 5 Jan 2026
Viewed by 279
Abstract
Oilseed flax (Linum usitatissimum L.) is an important specialty oilseed crop cultivated in arid and semi-arid regions, where timely, accurate yield prediction is crucial for regional oilseed security and agricultural decision-making. To address the lack of robust county-level yield prediction models for oilseed flax, this study proposes a CNN–Informer hybrid framework that integrates convolutional neural networks (CNNs) with the Informer architecture to model multi-source spatio-temporal data. Unlike conventional Transformer-based approaches, the proposed framework combines CNN-based local temporal feature extraction with the ProbSparse attention mechanism of Informer, enabling the efficient modeling of long-range temporal dependencies across multiple years while reducing the computational burden of attention-based time-series modeling. The model incorporates multi-source inputs, including remote sensing indices (NDVI, EVI, SAVI, KNDVI), TerraClimate meteorological variables, soil properties, and historical yield records. Comprehensive experiments conducted at the county level in Gansu Province, China, demonstrate that the CNN–Informer model consistently outperforms representative machine learning and deep learning baselines (Transformer, Informer, LSTM, and XGBoost), achieving an average performance of R2 = 0.82, RMSE = 0.31 t/ha, MAE = 0.21 t/ha, and MAPE = 10.33%. Results from feature ablation and historical yield window analyses reveal that a three-year historical yield window yields optimal performance, with remote sensing features contributing most strongly to predictive accuracy, while meteorological and soil variables enhance spatial adaptability under heterogeneous environmental conditions. Model robustness was further verified through fivefold county-based spatial cross-validation, indicating stable performance and strong generalization capability in unseen regions. Overall, the proposed CNN–Informer framework provides a reliable and interpretable solution for county-level oilseed flax yield prediction and offers practical insights for precision management of specialty crops in arid and semi-arid regions. Full article
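For readers unfamiliar with Informer's ProbSparse attention, the sketch below shows the core idea: only queries with the largest max-minus-mean score gap compute full attention, while the remaining queries fall back to the mean of the values. For clarity the sparsity measure is computed over all keys, whereas the original method estimates it on a sampled key subset; this is not the authors' training code.

```python
import math
import torch
import torch.nn.functional as F

def probsparse_attention(q, k, v, factor: int = 5):
    """
    Simplified ProbSparse self-attention (Informer-style).
    q, k, v: (B, H, L, d). Only the top-u "active" queries (by the max-mean
    sparsity measure) get full attention; the rest fall back to mean(V).
    """
    B, H, L, d = q.shape
    scores = q @ k.transpose(-2, -1) / math.sqrt(d)               # (B, H, L, L)
    sparsity = scores.max(dim=-1).values - scores.mean(dim=-1)    # (B, H, L)
    u = min(L, int(factor * math.ceil(math.log(L + 1))))
    top_idx = sparsity.topk(u, dim=-1).indices                    # (B, H, u)

    # Default output: "lazy" queries receive the mean of V.
    out = v.mean(dim=-2, keepdim=True).expand(B, H, L, d).clone()
    # Active queries: standard softmax attention over all keys.
    top_scores = torch.gather(
        scores, 2, top_idx.unsqueeze(-1).expand(B, H, u, L))
    active = F.softmax(top_scores, dim=-1) @ v                    # (B, H, u, d)
    out.scatter_(2, top_idx.unsqueeze(-1).expand(B, H, u, d), active)
    return out
```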

19 pages, 2298 KB  
Article
HFSA-Net: A 3D Object Detection Network with Structural Encoding and Attention Enhancement for LiDAR Point Clouds
by Xuehao Yin, Zhen Xiao, Jinju Shao, Zhimin Qiu and Lei Wang
Sensors 2026, 26(1), 338; https://doi.org/10.3390/s26010338 - 5 Jan 2026
Viewed by 311
Abstract
The inherent sparsity of LiDAR point cloud data presents a fundamental challenge for 3D object detection. During the feature encoding stage, especially in voxelization, existing methods find it difficult to effectively retain the critical geometric structural information contained in these sparse point clouds, resulting in decreased detection performance. To address this problem, this paper proposes an enhanced 3D object detection framework. It first designs a Structured Voxel Feature Encoder that significantly enhances the initial feature representation through intra-voxel feature refinement and multi-scale neighborhood context aggregation. Second, it constructs a Hybrid-Domain Attention-Guided Sparse Backbone, which introduces a decoupled hybrid attention mechanism and a hierarchical integration strategy to realize dynamic weighting and focusing on key semantic and geometric features. Finally, a Scale-Aggregation Head is proposed to improve the model’s perception and localization capabilities for different-sized objects via multi-level feature pyramid fusion and cross-layer information interaction. Experimental results on the KITTI dataset show that the proposed algorithm increases the mean Average Precision (mAP) by 3.34% compared to the baseline model. Moreover, experiments on a vehicle platform with a lower-resolution LiDAR verify the effectiveness of the proposed method in improving 3D detection accuracy and its generalization ability. Full article
(This article belongs to the Special Issue Recent Advances in LiDAR Sensing Technology for Autonomous Vehicles)

27 pages, 7513 KB  
Article
Research on Long-Term Structural Response Time-Series Prediction Method Based on the Informer-SEnet Model
by Yufeng Xu, Qingzhong Quan and Zhantao Zhang
Buildings 2026, 16(1), 189; https://doi.org/10.3390/buildings16010189 - 1 Jan 2026
Viewed by 155
Abstract
To address the stochastic, nonlinear, and strongly coupled characteristics of multivariate long-term structural response in bridge health monitoring, this study proposes the Informer-SEnet prediction model. The model integrates a Squeeze-and-Excitation (SE) channel attention mechanism into the Informer framework, enabling adaptive recalibration of channel importance to suppress redundant information and enhance key structural response features. A sliding-window strategy is used to construct the datasets, and extensive comparative experiments and ablation studies are conducted on one public bridge-monitoring dataset and two long-term monitoring datasets from real bridges. In the best case, the proposed model achieves improvements of up to 54.67% in MAE, 52.39% in RMSE, and 7.73% in R2. Ablation analysis confirms that the SE module substantially strengthens channel-wise feature representation, while the sparse attention and distillation mechanisms are essential for capturing long-range dependencies and improving computational efficiency. Their combined effect yields the optimal predictive performance. Five-fold cross-validation further evaluates the model’s generalization capability. The results show that Informer-SEnet exhibits smaller fluctuations across folds compared with baseline models, demonstrating higher stability and robustness and confirming the reliability of the proposed approach. The improvement in prediction accuracy enables more precise characterization of the structural response evolution under environmental and operational loads, thereby providing a more reliable basis for anomaly detection and early damage warning, and reducing the risk of false alarms and missed detections. The findings offer an efficient and robust deep learning solution to support bridge structural safety assessment and intelligent maintenance decision-making. Full article
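The SE recalibration step is the standard Squeeze-and-Excitation block; a compact version adapted to (B, C, T) monitoring sequences is sketched below. Its exact placement inside the Informer encoder is not reproduced here and should be treated as an assumption.

```python
import torch.nn as nn

class SEBlock1d(nn.Module):
    # Standard Squeeze-and-Excitation channel attention adapted to (B, C, T)
    # monitoring sequences: global average pooling over time ("squeeze"),
    # a two-layer bottleneck ("excitation"), then channel-wise reweighting.
    def __init__(self, channels: int, reduction: int = 8):
        super().__init__()
        hidden = max(channels // reduction, 1)
        self.fc = nn.Sequential(
            nn.Linear(channels, hidden),
            nn.ReLU(inplace=True),
            nn.Linear(hidden, channels),
            nn.Sigmoid(),
        )

    def forward(self, x):
        # x: (B, C, T) multichannel structural-response series
        w = self.fc(x.mean(dim=-1))      # (B, C) channel descriptors
        return x * w.unsqueeze(-1)       # recalibrated channels
```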
(This article belongs to the Special Issue Recent Developments in Structural Health Monitoring)

27 pages, 7144 KB  
Article
A Time and Frequency Domain Based Dual-Attention Neural Network for Tropical Cyclone Track Prediction
by Fancheng Meng, Xiran Xiong and Liling Zhao
Appl. Sci. 2026, 16(1), 436; https://doi.org/10.3390/app16010436 - 31 Dec 2025
Viewed by 266
Abstract
Due to the influence of various dynamic meteorological factors, accurate Tropical Cyclone (TC) track prediction is a significant challenge. However, current deep learning-based time series prediction models fail to simultaneously capture both short-term and long-term dependencies, while also neglecting the change in the meteorological environment pattern associated with TC motion. This limitation becomes particularly pronounced during sudden turning in the TC track, resulting in significant deterioration of prediction accuracy. To overcome these limitations, we propose LFInformer, a hybrid deep learning framework that integrates an Informer backbone, a Frequency-Enhanced Channel Attention Mechanism (FECAM), and a Long Short-Term Memory (LSTM) network for TC track prediction. The Informer backbone is underpinned by ProbSparse Self-Attention in both the encoder and the causally masked decoder, prioritizing the most informative query–key interactions to deliver robust long-range modeling and sharper detection of turning signals. FECAM enhances meteorological inputs via discrete cosine transforms, band-wise weighting, and channel-wise reweighting, then projects the enhanced signals back into the time domain to produce frequency-aware representations. The LSTM branch captures short-term variations and localized temporal dynamics through its recurrent structure. Together, these components sustain high accuracy during both steady evolution and sudden turning. Experiments based on the JMA and IBTrACS 1951–2022 Northwest Pacific TC data show that the proposed model achieves an average absolute position error (APE) of 72.39 km, 117.72 km, 145.31 km and 168.64 km for the 6-h, 12-h, 24-h and 48-h forecasting tasks, respectively. The proposed model enhances the accuracy of TC track predictions, offering an innovative approach that optimally balances precision and efficiency in forecasting sudden turning points. Full article
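A simplified sketch of the frequency-enhanced channel attention idea: each channel's time series is projected onto DCT components, a frequency-energy descriptor drives per-channel gates, and the gated signal remains in the time domain. FECAM's band-wise weighting is more involved than this; the pooling and reduction choices below are assumptions.

```python
import math
import torch
import torch.nn as nn

def dct_matrix(n: int) -> torch.Tensor:
    # Orthonormal DCT-II basis: row k, column t.
    t = torch.arange(n).float()
    k = torch.arange(n).float().unsqueeze(1)
    basis = torch.cos(math.pi / n * (t + 0.5) * k)
    basis[0] *= 1.0 / math.sqrt(2.0)
    return basis * math.sqrt(2.0 / n)

class FrequencyChannelAttention(nn.Module):
    # Hedged sketch of DCT-based channel attention: each channel's series is
    # projected onto DCT components, frequency energies are summarized, and a
    # small MLP produces per-channel gates. Simplified relative to FECAM.
    def __init__(self, channels: int, seq_len: int, reduction: int = 4):
        super().__init__()
        self.register_buffer("dct", dct_matrix(seq_len))          # (T, T)
        hidden = max(channels // reduction, 1)
        self.fc = nn.Sequential(
            nn.Linear(channels, hidden),
            nn.ReLU(inplace=True),
            nn.Linear(hidden, channels),
            nn.Sigmoid(),
        )

    def forward(self, x):
        # x: (B, C, T) multivariate meteorological series (channels = variables)
        freq = x @ self.dct.T                  # per-channel DCT coefficients
        energy = freq.abs().sum(dim=-1)        # (B, C) frequency-energy descriptor
        w = self.fc(energy)                    # channel gates from the frequency domain
        return x * w.unsqueeze(-1)
```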
(This article belongs to the Special Issue Advanced Methods for Time Series Forecasting)

20 pages, 6199 KB  
Article
High-Precision Peanut Pod Detection Device Based on Dual-Route Attention Mechanism
by Yongkuai Chen, Pengyan Chang, Tao Wang and Jian Zhao
Appl. Sci. 2026, 16(1), 418; https://doi.org/10.3390/app16010418 - 30 Dec 2025
Viewed by 217
Abstract
Peanut, as an important economic crop, is widely cultivated and rich in nutrients. Classifying peanuts based on the number of seeds helps assess yield and economic value, providing a basis for selection and breeding. However, traditional peanut grading relies on manual labor, which is inefficient and time-consuming. To improve detection efficiency and accuracy, this study proposes an improved BTM-YOLOv8 model and tests it on an independently designed pod detection device. In the backbone network, the BiFormer module is introduced, employing a dual-route attention mechanism with dynamic, content-aware, and query-adaptive sparse attention to extract features from densely packed peanuts. In addition, the Triple Attention mechanism is incorporated to strengthen the model’s multidimensional interaction and feature responsiveness. Finally, the original CIoU loss function is replaced with MPDIoU loss, simplifying distance metric computation and enabling more scale-focused optimization in bounding box regression. The results show that BTM-YOLOv8 has stronger detection performance for ‘Quan Hua 557’ peanut pods, with precision, recall, mAP50, and F1 score reaching 98.40%, 96.20%, 99.00%, and 97.29%, respectively. Compared to the original YOLOv8, these values improved by 3.9%, 2.4%, 1.2%, and 3.14%, respectively. Ablation experiments further validate the effectiveness of the introduced modules, showing reduced attention to irrelevant information, enhanced target feature capture, and lower false detection rates. Through comparisons with various mainstream deep learning models, it was further demonstrated that BTM-YOLOv8 performs well in detecting ‘Quan Hua 557’ peanut pods. When comparing the device’s detection results with manual counts, the R2 value was 0.999, and the RMSE value was 12.69, indicating high accuracy. This study improves the efficiency of ‘Quan Hua 557’ peanut pod detection, reduces labor costs, and provides quantifiable data support for breeding, offering a new technical reference for the detection of other crops. Full article
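The MPDIoU replacement for CIoU has a compact closed form: the IoU is penalized by the squared distances between the two boxes' top-left and bottom-right corners, normalized by the image size. The sketch below follows the published MPDIoU formulation rather than the authors' training code.

```python
import torch

def mpdiou_loss(pred, target, img_w: int, img_h: int, eps: float = 1e-7):
    """
    MPDIoU-style bounding-box loss sketch: IoU minus the normalized squared
    distances between the two boxes' top-left and bottom-right corners.
    Boxes are (..., 4) in (x1, y1, x2, y2) format; img_w/img_h normalize the
    corner distances.
    """
    px1, py1, px2, py2 = pred.unbind(-1)
    tx1, ty1, tx2, ty2 = target.unbind(-1)

    inter_w = (torch.min(px2, tx2) - torch.max(px1, tx1)).clamp(min=0)
    inter_h = (torch.min(py2, ty2) - torch.max(py1, ty1)).clamp(min=0)
    inter = inter_w * inter_h
    union = (px2 - px1) * (py2 - py1) + (tx2 - tx1) * (ty2 - ty1) - inter
    iou = inter / (union + eps)

    norm = img_w ** 2 + img_h ** 2
    d1 = (px1 - tx1) ** 2 + (py1 - ty1) ** 2     # top-left corner distance
    d2 = (px2 - tx2) ** 2 + (py2 - ty2) ** 2     # bottom-right corner distance
    mpdiou = iou - d1 / norm - d2 / norm
    return (1.0 - mpdiou).mean()
```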

25 pages, 27652 KB  
Article
A Spike-Inspired Adaptive Spatial Suppression Framework for Large-Scale Landslide Extraction
by Mengjie Gao, Fang Chen, Lei Wang and Bo Yu
Remote Sens. 2026, 18(1), 129; https://doi.org/10.3390/rs18010129 - 30 Dec 2025
Viewed by 177
Abstract
Landslides endanger human safety and damage infrastructure, underscoring the importance of accurate extraction. However, landslide extraction is often hindered by the omission of sparsely distributed landslides and the difficulty of delineating their blurred boundaries. Large-scale landslide extraction faces two key challenges. The first is a severe sample imbalance between landslides and background objects, which biases the model toward background and omits landslides. The second is the confusion between landslides and background features, which leads to inaccurate boundary delineation and fragmented extraction results. To address these issues, this paper proposes a two-phase landslide extraction framework. First, we propose a PCA-based landslide candidate extraction module to remove salient background objects and reduce data imbalance. Second, we propose a Spike-inspired Landslide Extraction Model to further discriminate actual landslides from the candidates by incorporating a spike-inspired sparse attention module (SISA). It can enhance weak landslide features such as blurred boundaries while mitigating background noise through its adaptive spatial suppression mechanism. To integrate spatial details across scales, a mix-scale feature aggregation module (MSFA) is proposed, which aggregates hierarchical features to extract landslides of various scales. Experiments on the landslide datasets from the Hengduan Mountains and Hokkaido, Japan, show IoU improvements of 4.26% and 1.22% compared to the recently proposed methods, validating its effectiveness under both imbalanced and dense landslide conditions. Full article
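As a loose illustration of the spike-inspired suppression idea, the gate below compares a spatial response map against an adaptive threshold and damps positions below it, keeping only sparse "firing" locations. The thresholding rule and the sigmoid surrogate are assumptions, not the paper's SISA module.

```python
import torch
import torch.nn as nn

class SpikeLikeSpatialGate(nn.Module):
    # Hedged sketch: a spatial response map is compared against an adaptive,
    # learned threshold; positions below it are suppressed so that only sparse
    # "firing" locations pass through. Illustrative only.
    def __init__(self, channels: int):
        super().__init__()
        self.response = nn.Conv2d(channels, 1, 3, padding=1)
        self.threshold_scale = nn.Parameter(torch.tensor(1.0))

    def forward(self, x):
        # x: (B, C, H, W) feature map
        r = self.response(x)                                      # (B, 1, H, W)
        thr = self.threshold_scale * r.mean(dim=(2, 3), keepdim=True)
        # Soft surrogate for a hard spike threshold (keeps gradients usable).
        gate = torch.sigmoid(10.0 * (r - thr))
        return x * gate                                           # suppress background
```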

24 pages, 29209 KB  
Article
WSI-GT: Pseudo-Label Guided Graph Transformer for Whole-Slide Histology
by Zhongao Sun, Alexander Khvostikov, Andrey Krylov, Ilya Mikhailov and Pavel Malkov
Mach. Learn. Knowl. Extr. 2026, 8(1), 8; https://doi.org/10.3390/make8010008 - 29 Dec 2025
Viewed by 283
Abstract
Whole-slide histology images (WSIs) can exceed 100 k × 100 k pixels, making direct pixel-level segmentation infeasible and requiring patch-level classification as a practical alternative for downstream WSI segmentation. However, most approaches either treat patches independently, ignoring spatial and biological context, or rely on deep graph models prone to oversmoothing and loss of local tissue detail. We present WSI-GT (Pseudo-Label Guided Graph Transformer), a simple yet effective architecture that addresses these challenges and enables accurate WSI-level tissue segmentation. WSI-GT combines a lightweight local graph convolution block for neighborhood feature aggregation with a pseudo-label guided attention mechanism that preserves intra-class variability and mitigates oversmoothing. To cope with sparse annotations, we introduce an area-weighted sampling strategy that balances class representation while maintaining tissue topology. WSI-GT achieves a Macro F1 of 0.95 on PATH-DT-MSU WSS2v2, improving by up to 3 percentage points over patch-based CNNs and by about 2 points over strong graph baselines. It further generalizes well to the Placenta benchmark and standard graph node classification datasets, highlighting both clinical relevance and broader applicability. These results position WSI-GT as a practical and scalable solution for graph-based learning on extremely large images and for generating clinically meaningful WSI segmentations. Full article
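The area-weighted sampling strategy can be pictured with a short sketch: annotated patches are drawn with probability inversely proportional to their class's total annotated area, so under-represented tissue types still appear in each batch. The inputs and weighting are assumptions for illustration, not the paper's exact scheme (which also preserves tissue topology).

```python
import torch

def area_weighted_patch_sampler(patch_labels, patch_areas, num_samples):
    """
    Hedged sketch of area-weighted sampling for sparsely annotated WSIs:
    each annotated patch is sampled with probability inversely proportional
    to the total annotated area of its class.
    patch_labels: (N,) long tensor of class ids per patch
    patch_areas:  (N,) annotated area (e.g., pixel count) per patch
    """
    class_area = torch.zeros(int(patch_labels.max()) + 1)
    class_area.scatter_add_(0, patch_labels, patch_areas.float())
    weights = 1.0 / class_area[patch_labels].clamp(min=1.0)       # per-patch weight
    idx = torch.multinomial(weights, num_samples, replacement=True)
    return idx                                                    # indices of sampled patches
```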
(This article belongs to the Special Issue Deep Learning in Image Analysis and Pattern Recognition, 2nd Edition)

17 pages, 3051 KB  
Article
Deep Learning Algorithms for Wind Speed Prediction in Complex Terrain Using Meteorological Data
by Donghui Liu, Hao Wang, Jiyong Zhang, Jingguo Lv, Bangzheng He, Chunhui Zhao and Gao Yu
Atmosphere 2026, 17(1), 28; https://doi.org/10.3390/atmos17010028 - 25 Dec 2025
Viewed by 252
Abstract
As core components of power grids, overhead transmission lines must traverse mountains and rivers, particularly in complex terrain where traditional wind speed prediction methods exhibit significant shortcomings in capturing sudden wind speed changes and spatial structural characteristics. The present study proposes a deep learning-based wind speed prediction model for complex terrain that uses meteorological data to improve the precision of wind speed variation prediction. The model takes historical meteorological data and terrain attributes derived from digital elevation models as inputs. Its design incorporates a terrain-aware temporal convolutional network and a terrain-modulated initialization strategy, resulting in high sensitivity to wind field variations. Subsequently, a terrain-relative position encoding bridging module is constructed to fuse local terrain features with spatial structural priors. A novel terrain-guided sparse attention mechanism is proposed to direct the model's focus toward complex terrain regions, thereby improving wind speed prediction precision. The experimental results demonstrate that, for conventional wind speed prediction, this model reduces the mean absolute error and root mean square error by 6.6% and 30%, respectively, compared to current mainstream models. In strong wind prediction tasks, the model reduces the average false negative rate and false positive rate by 11.3% and 4.7%, respectively, compared to conventional models. These findings suggest the model's efficacy and robustness in complex terrain wind speed prediction tasks. Full article
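A hedged sketch of what a terrain-guided sparse attention could look like: keys are restricted to the k spatial positions with the highest terrain-complexity scores (e.g., slope or relief derived from the DEM), so attention is concentrated on complex-terrain regions. The selection rule and interfaces are assumptions, not the paper's module.

```python
import torch
import torch.nn as nn
import torch.nn.functional as F

class TerrainGuidedSparseAttention(nn.Module):
    # Hedged sketch: queries attend only to the k spatial positions with the
    # highest terrain-complexity scores, focusing compute on complex terrain.
    def __init__(self, dim: int, topk: int = 64):
        super().__init__()
        self.topk = topk
        self.q = nn.Linear(dim, dim)
        self.kv = nn.Linear(dim, 2 * dim)
        self.proj = nn.Linear(dim, dim)

    def forward(self, x, terrain_score):
        # x: (B, N, C) spatial tokens; terrain_score: (B, N) complexity per token
        B, N, C = x.shape
        kk = min(self.topk, N)
        idx = terrain_score.topk(kk, dim=-1).indices              # (B, k) key positions
        selected = torch.gather(x, 1, idx.unsqueeze(-1).expand(B, kk, C))

        q = self.q(x)                                             # (B, N, C)
        k, v = self.kv(selected).chunk(2, dim=-1)                 # (B, k, C)
        attn = F.softmax(q @ k.transpose(-2, -1) / C ** 0.5, dim=-1)
        return self.proj(attn @ v)                                # (B, N, C)
```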

29 pages, 5902 KB  
Article
MSLCP-DETR: A Multi-Scale Linear Attention and Sparse Fusion Framework for Infrared Small Target Detection in Vehicle-Mounted Systems
by Fu Li, Meimei Zhu, Ming Zhao, Yuxin Sun and Wangyu Wu
Mathematics 2026, 14(1), 67; https://doi.org/10.3390/math14010067 - 24 Dec 2025
Viewed by 227
Abstract
Detecting small infrared targets in vehicle-mounted systems remains challenging due to weak thermal radiation, cross-scale feature loss, and dynamic background interference. To address these issues, this paper proposes MSLCP-DETR, an enhanced RT-DETR-based framework that integrates multi-scale linear attention and sparse fusion mechanisms. The model introduces three novel components: a Multi-Scale Linear Attention Encoder (MSLA-AIFI), which combines multi-branch depth-wise convolution with linear attention to efficiently capture cross-scale features while reducing computational complexity; a Cross-Scale Small Object Feature Optimization module (CSOFO), which enhances the localization of small targets in dense scenes through spatial rearrangement and dynamic modeling; and a Pyramid Sparse Transformer (PST), which replaces traditional dense fusion with a dual-branch sparse attention mechanism to improve both accuracy and real-time performance. Extensive experiments on the M3FD and FLIR datasets demonstrate that MSLCP-DETR achieves an excellent balance between accuracy and efficiency, with its precision, mAP@50, and mAP@50:95 reaching 90.3%, 79.5%, and 86.0%, respectively. Ablation studies and visual analysis further validate the effectiveness of the proposed modules and the overall design strategy. Full article
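The efficiency argument behind linear attention is easiest to see in code: with a kernel feature map, phi(Q) phi(K)^T V can be computed as phi(Q) (phi(K)^T V), dropping the quadratic token-by-token score matrix. The sketch below is the generic construction with the elu(x) + 1 feature map, not the MSLA-AIFI encoder itself.

```python
import torch.nn as nn
import torch.nn.functional as F

class LinearAttention(nn.Module):
    # Generic kernel-based linear attention: computes phi(Q) (phi(K)^T V) in
    # O(N) instead of the O(N^2) dense score matrix. A standard construction,
    # shown to illustrate why such encoders cut the cost of self-attention.
    def __init__(self, dim: int):
        super().__init__()
        self.qkv = nn.Linear(dim, 3 * dim)
        self.proj = nn.Linear(dim, dim)

    def forward(self, x, eps: float = 1e-6):
        # x: (B, N, C) tokens
        q, k, v = self.qkv(x).chunk(3, dim=-1)
        q, k = F.elu(q) + 1.0, F.elu(k) + 1.0                     # positive feature map
        kv = k.transpose(-2, -1) @ v                              # (B, C, C) summary
        normalizer = q @ k.sum(dim=-2, keepdim=True).transpose(-2, -1)  # (B, N, 1)
        return self.proj((q @ kv) / (normalizer + eps))
```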
