Search Results (2,325)

Search Parameters:
Keywords = enhanced channel attention

14 pages, 1011 KB  
Article
3D TractFormer: 3D Direct Volumetric White Matter Tract Segmentation with Hybrid Channel-Wise Transformer
by Xiang Gao, Hui Tian, Xuefei Yin and Alan Wee-Chung Liew
Sensors 2026, 26(3), 1068; https://doi.org/10.3390/s26031068 - 6 Feb 2026
Abstract
Segmenting white matter tracts in diffusion-weighted magnetic resonance imaging (dMRI) is of vital importance for brain health analysis. It remains a challenging task due to the intersection and overlap of tracts (i.e., multiple tracts coexist in one voxel) and the data complexity of dMRI images (e.g., 4D high spatial resolution). Existing methods that demonstrate good performance implement direct volumetric tract segmentation by operating on individual 2D slices. However, this ignores 3D contextual information, requires additional post-processing, and struggles with the boundary handling of 3D volumes. Therefore, in this paper, we propose an efficient 3D direct volumetric segmentation method for segmenting white matter tracts. It has three key innovations. First, we propose to deeply interleave convolution and transformer blocks in a U-shaped network, which effectively integrates their respective strengths to extract spatial contextual features and global long-distance dependencies for enhanced feature extraction. Second, we propose a novel channel-wise transformer, which integrates depth-wise separable convolution and compressed contextual feature-based channel-wise attention, effectively addressing the memory and computational challenges of 4D computing. Moreover, it helps to model global dependencies of contextual features and ensures each hierarchical layer focuses on complementary features. Third, we propose to train a fully symmetric network with progressively sized volumetric patches, which addresses the challenge of few 3D training samples and further reduces memory and computational costs. Experimental results on the largest publicly available tract-specific tractograms dataset demonstrate the superiority of the proposed method over the current state-of-the-art methods.
(This article belongs to the Special Issue Secure AI for Biomedical Sensing and Imaging Applications)
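Many of the channel-attention variants in these results, including the compressed-context channel-wise attention described above, share a common squeeze-and-excitation skeleton: pool the spatial dimensions into a channel descriptor, pass it through a small bottleneck, and gate each channel. The following is a minimal numpy sketch of that generic skeleton, not the 3D TractFormer module itself; the bottleneck ratio and random weights are illustrative.

```python
import numpy as np

def sigmoid(z):
    return 1.0 / (1.0 + np.exp(-z))

def channel_attention(x, w_reduce, w_expand):
    """Squeeze-and-excitation-style channel gating.

    x        : feature map, shape (C, H, W)
    w_reduce : bottleneck weights, shape (C // r, C)
    w_expand : expansion weights, shape (C, C // r)
    """
    squeezed = x.mean(axis=(1, 2))                 # (C,) global average pool
    hidden = np.maximum(w_reduce @ squeezed, 0.0)  # ReLU bottleneck
    gates = sigmoid(w_expand @ hidden)             # per-channel gate in (0, 1)
    return x * gates[:, None, None]                # broadcast over H and W

# toy example: 8 channels, reduction ratio r = 4 (weights random, illustrative)
rng = np.random.default_rng(0)
x = rng.standard_normal((8, 5, 5))
y = channel_attention(x,
                      rng.standard_normal((2, 8)) * 0.1,
                      rng.standard_normal((8, 2)) * 0.1)
```

Because every gate lies in (0, 1), the block can only attenuate channels, never amplify them; papers differ mainly in how the descriptor is computed and where the block is inserted.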
20 pages, 3823 KB  
Article
DA-TransResUNet: Residual U-Net Liver Segmentation Model Integrating Dual Attention of Spatial and Channel with Transformer
by Kunzhan Wang, Xinyue Lu, Jing Li and Yang Lu
Mathematics 2026, 14(3), 575; https://doi.org/10.3390/math14030575 - 5 Feb 2026
Abstract
Precise medical image segmentation plays a vital role in disease diagnosis and clinical treatment. Although U-Net-based architectures and their Transformer-enhanced variants have achieved remarkable progress in automatic segmentation tasks, they still face challenges in complex medical imaging scenarios, particularly in simultaneously modeling fine-grained local details and capturing long-range global contextual information, which limits segmentation accuracy and structural consistency. To address these challenges, this paper proposes a novel medical image segmentation framework termed DA-TransResUNet. Built upon a ResUNet backbone, the proposed network integrates residual learning, Transformer-based encoding, and a dual-attention (DA) mechanism in a unified manner. Residual blocks facilitate stable optimization and progressive feature refinement in deep networks, while the Transformer module effectively models long-range dependencies to enhance global context representation. Meanwhile, the proposed DA-Block jointly exploits local and global features as well as spatial and channel-wise dependencies, leading to more discriminative feature representations. Furthermore, embedding DA-Blocks into both the feature embedding stage and the skip connections strengthens information interaction between the encoder and decoder, thereby improving overall segmentation performance. Experimental results on the LiTS2017 and Sliver07 datasets demonstrate that the proposed method achieves incremental improvement in liver segmentation. In particular, on the LiTS2017 dataset, DA-TransResUNet achieves a Dice score of 97.39%, a VOE of 5.08%, and an RVD of −0.74%, validating its effectiveness for liver segmentation.
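The dual-attention idea, gating features along both the channel axis and the spatial axis and then combining the branches, can be sketched generically. This is a hypothetical minimal version, not the paper's DA-Block; the pooling choices and the additive fusion are illustrative assumptions.

```python
import numpy as np

def sigmoid(z):
    return 1.0 / (1.0 + np.exp(-z))

def spatial_gate(x):
    """Gate each pixel by statistics pooled across channels; x is (C, H, W)."""
    pooled = x.mean(axis=0) + x.max(axis=0)     # (H, W) channel-pooled summary
    return x * sigmoid(pooled)[None, :, :]

def channel_gate(x):
    """Gate each channel by its spatial average."""
    return x * sigmoid(x.mean(axis=(1, 2)))[:, None, None]

def dual_attention(x):
    """Additive fusion of the two branches (an illustrative choice)."""
    return spatial_gate(x) + channel_gate(x)

feats = np.random.default_rng(1).standard_normal((4, 6, 6))
out = dual_attention(feats)
```

Additive fusion keeps both cues active even where one gate saturates near zero; sequential application (channel then spatial, as in CBAM-style blocks) is the other common design.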
29 pages, 890 KB  
Article
Enhancing Cross-Regional Generalization in UAV Forest Segmentation Across Plantation and Natural Forests with Attention-Refined PP-LiteSeg Networks
by Xinyu Ma, Shuang Zhang, Kaibo Li, Xiaorui Wang, Hong Lin and Zhenping Qiang
Remote Sens. 2026, 18(3), 523; https://doi.org/10.3390/rs18030523 - 5 Feb 2026
Abstract
Accurate fine-scale forest mapping is fundamental for ecological monitoring and resource management. While deep learning semantic segmentation methods have advanced the interpretation of high-resolution UAV imagery, their generalization across diverse forest regions remains challenging due to high spatial heterogeneity. To address this, we propose two enhanced versions of the PP-LiteSeg architecture for robust cross-regional forest segmentation. Version 01 (V01) integrates a multi-branch attention fusion module composed of parallel channel, spatial, and pixel attention branches. This design enables fine-grained feature enhancement and precise boundary delineation in structurally regular artificial forests, such as the Huayuan Forest Farm. As a result, V01 achieves an mIoU of 92.64% and an F1-score of 96.10%, an approximately 18 percentage-point mIoU improvement over PSPNet and DeepLabv3+. Building on this, Version 02 (V02) introduces a lightweight residual connection that directly shortcuts the fused features, thereby improving feature stability and robustness under complex textures and illumination, and demonstrates stronger performance in naturally heterogeneous forests (Longhai Township), attaining an mIoU of 91.87% and an F1-score of 95.77% (a 5.72 percentage-point mIoU gain over DeepLabv3+). We further conduct comprehensive comparisons against conventional CNN baselines as well as representative lightweight and transformer-based models (BiSeNetV2 and SegFormer-B0). In bidirectional cross-region transfer (training on one region and directly testing on the other), V02 exhibits the most stable performance with minimal degradation, highlighting its robustness under domain shift. On a combined cross-regional dataset, V02 achieves a leading mIoU of 91.50%, outperforming U-Net, DeepLabv3+, and PSPNet. In summary, V01 excels in boundary delineation for regular plantation forests, whereas V02 generalizes more stably across highly varied natural forest landscapes, providing practical solutions for region-adaptive UAV forest segmentation.
(This article belongs to the Special Issue Remote Sensing-Assisted Forest Inventory Planning)
32 pages, 5567 KB  
Article
Optimized Image Segmentation Model for Pellet Microstructure Incorporating KL Divergence Constraints
by Yuwen Ai, Xia Li, Aimin Yang, Yunjie Bai and Xuezhi Wu
Mathematics 2026, 14(3), 574; https://doi.org/10.3390/math14030574 - 5 Feb 2026
Abstract
Accurate segmentation of pellet microstructure images is crucial for evaluating their metallurgical performance and optimizing production processes. To address the challenges posed by the complex structures, blurred boundaries, and fine-grained textures of hematite and magnetite in pellet micrographs, this study proposes a hybrid, intelligently optimized VGG16-U-Net semantic segmentation model. The model incorporates an improved SPC-SA channel self-attention mechanism in the encoder to enhance deep feature representation, while a simplified SAN and SAW module is integrated into the decoder to strengthen its response to key mineral regions. Additionally, a hybrid loss strategy with KL regularization is employed for training optimization. Experimental results show that the model achieves an mIoU of 85.58%, an mPA of 91.54%, and an overall accuracy of 93.58%. Compared with the baseline models, the proposed method achieves moderately improved performance.
(This article belongs to the Special Issue Mathematical Methods for Image Processing and Computer Vision)
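The KL-divergence regularizer mentioned in the abstract above is, in its standard form, a penalty between two discrete distributions, for example a smoothed target and the model's predicted class probabilities. The abstract does not give the exact loss, so the following shows only the textbook term it presumably builds on.

```python
import numpy as np

def kl_divergence(p, q, eps=1e-12):
    """KL(p || q) for discrete distributions p, q over the same classes."""
    p = np.clip(p, eps, 1.0)   # clip to avoid log(0)
    q = np.clip(q, eps, 1.0)
    return float(np.sum(p * np.log(p / q)))

# e.g., a smoothed one-hot target vs. the model's predicted probabilities
target = np.array([0.9, 0.05, 0.05])
pred = np.array([0.7, 0.2, 0.1])
penalty = kl_divergence(target, pred)
```

Note that KL is asymmetric, so which distribution plays the role of `p` (the reference) is a design choice the hybrid loss would have to fix.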
29 pages, 25337 KB  
Article
PTU-Net: A Polarization-Temporal U-Net for Multi-Temporal Sentinel-1 SAR Crop Classification
by Feng Tan, Xikai Fu, Huiming Chai and Xiaolei Lv
Remote Sens. 2026, 18(3), 514; https://doi.org/10.3390/rs18030514 - 5 Feb 2026
Abstract
Accurate crop type mapping remains challenging in regions where persistent cloud cover limits the availability of optical imagery. Multi-temporal dual-polarization Sentinel-1 SAR data offer an all-weather alternative, yet existing approaches often underutilize polarization information and rely on single-scale temporal aggregation. This study proposes PTU-Net, a polarization–temporal U-Net designed specifically for pixel-wise crop segmentation from SAR time series. The model introduces a Polarization Channel Attention module to construct physically meaningful VV/VH combinations and adaptively enhance their contributions. It also incorporates a Multi-Scale Temporal Self-Attention mechanism to model pixel-level backscatter trajectories across multiple spatial resolutions. Using a 12-date Sentinel-1 stack over Kings County, California, and high-quality crop-type reference labels, the model was trained and evaluated under a spatially independent split. Results show that PTU-Net outperforms GRU, ConvLSTM, 3D U-Net, and U-Net–ConvLSTM baselines, achieving the highest overall accuracy and mean IoU among all tested models. Ablation studies confirm that both polarization enhancement and multi-scale temporal modeling contribute substantially to performance gains. These findings demonstrate that integrating polarization-aware feature construction with scale-adaptive temporal reasoning can substantially improve the effectiveness of SAR-based crop mapping, offering a promising direction for operational agricultural monitoring.
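The abstract does not specify which VV/VH combinations the Polarization Channel Attention module constructs; the set below (sum, difference, ratio, normalized difference) is a common choice in SAR work and is shown only as a plausible sketch, assuming backscatter in linear power units.

```python
import numpy as np

def polarization_features(vv, vh, eps=1e-6):
    """Stack standard VV/VH combinations along a new leading axis."""
    return np.stack([
        vv,                                # co-polarized channel
        vh,                                # cross-polarized channel
        vv + vh,                           # total power proxy
        vv - vh,                           # polarization difference
        vv / (vh + eps),                   # polarization ratio
        (vv - vh) / (vv + vh + eps),       # normalized difference
    ])

# toy 2x2 scene: VV power 2.0 everywhere, VH power 1.0 everywhere
feats = polarization_features(np.full((2, 2), 2.0), np.ones((2, 2)))
```

A learned channel-attention gate over such a stack would then decide, per scene, how much each combination contributes.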
34 pages, 4837 KB  
Article
UWB Positioning in Complex Indoor Environments Based on UKF–BiLSTM Bidirectional Mutual Correction
by Yiwei Wang and Zengshou Dong
Electronics 2026, 15(3), 687; https://doi.org/10.3390/electronics15030687 - 5 Feb 2026
Abstract
Non-line-of-sight (NLOS) propagation remains a major obstacle to high-accuracy ultra-wideband (UWB) indoor positioning. To address this issue, this study investigates solutions from two complementary perspectives: NLOS identification and error mitigation. First, an NLOS signal classification model is proposed based on multidimensional statistics of the channel impulse response (CIR). The model incorporates an attention mechanism and an improved snake optimization (ISO) algorithm, achieving significantly enhanced classification accuracy and robustness. For error mitigation, a UKF–BiLSTM dual-directional mutual calibration framework is proposed to dynamically compensate for NLOS errors. The framework embeds the constant turn rate and velocity (CTRV) motion model within an unscented Kalman filter (UKF) to enhance trajectory modeling. It establishes a bidirectional correction loop with a bidirectional long short-term memory (BiLSTM) network. Through the synergy of physical constraints and data-driven learning, the framework adaptively suppresses NLOS errors. Experimental results show that the proposed framework achieves state-of-the-art–comparable performance with improved model efficiency in complex indoor UWB positioning scenarios.
28 pages, 7334 KB  
Article
I-GhostNetV3: A Lightweight Deep Learning Framework for Vision-Sensor-Based Rice Leaf Disease Detection in Smart Agriculture
by Puyu Zhang, Rui Li, Yuxuan Liu, Guoxi Sun and Chenglin Wen
Sensors 2026, 26(3), 1025; https://doi.org/10.3390/s26031025 - 4 Feb 2026
Abstract
Accurate and timely diagnosis of rice leaf diseases is crucial for smart agriculture leveraging vision sensors. However, existing lightweight convolutional neural networks (CNNs) often struggle in complex field environments, where small lesions, cluttered backgrounds, and varying illumination complicate recognition. This paper presents I-GhostNetV3, an incrementally improved GhostNetV3-based network for RGB rice leaf disease recognition. I-GhostNetV3 introduces two modular enhancements with controlled overhead: (1) Adaptive Parallel Attention (APA), which integrates edge-guided spatial and channel cues and is selectively inserted to enhance lesion-related representations (at the cost of additional computation), and (2) Fusion Coordinate-Channel Attention (FCCA), a near-neutral SE replacement that enables efficient spatial–channel feature fusion to suppress background interference. Experiments on the Rice Leaf Bacterial and Fungal Disease (RLBF) dataset show that I-GhostNetV3 achieves 90.02% Top-1 accuracy with 1.831 million parameters and 248.694 million FLOPs, outperforming MobileNetV2 and EfficientNet-B0 under our experimental setup while remaining compact relative to the original GhostNetV3. In addition, evaluation on PlantVillage-Corn serves as a supplementary transfer sanity check; further validation on independent real-field target domains and on-device profiling will be explored in future work. These results indicate that I-GhostNetV3 is a promising efficient backbone for future edge deployment in precision agriculture.
14 pages, 884 KB  
Article
Lipid Peroxidation Products 4-ONE and 4-HNE Modulate Voltage-Gated Sodium Channels in Neuronal Cell Lines and DRG Action Potentials
by Ming-Zhe Yin, Na Kyeong Park, Mi Seon Seo, Jin Ryeol An, Hyun Jong Kim, JooHan Woo, Jintae Kim, Min Yan, Sung Joon Kim and Seong Woo Choi
Antioxidants 2026, 15(2), 206; https://doi.org/10.3390/antiox15020206 - 4 Feb 2026
Abstract
Oxidative stress-induced lipid peroxidation products (LPPs), particularly 4-hydroxy-nonenal (4-HNE) and 4-oxo-nonenal (4-ONE), have recently gained attention for their direct regulation of ion channels essential for pain signaling. In this study, we investigated how these two LPPs affect the electrophysiological properties of neurons, specifically voltage-gated sodium (NaV) channels, thereby influencing sensory neuron excitability and pain pathways. Using human neuroblastoma (SH-SY5Y) and ND7/23 cells (a fusion cell line exhibiting partial sensory neuron properties), we measured changes in NaV channel-mediated sodium currents following treatment with 4-HNE or 4-ONE. Whole-cell patch-clamp experiments showed that 4-ONE (10 µM) and 4-HNE (100 µM) did not significantly alter the peak sodium current amplitude in SH-SY5Y cells. However, in ND7/23 cells, both 4-HNE and 4-ONE induced a negative shift in NaV channel activation voltage dependence, enabling sodium channel activation at lower membrane potentials. Furthermore, current-clamp recordings in primary mouse dorsal root ganglion neurons demonstrated that treatment with 4-ONE and 4-HNE reduced the current threshold required to elicit action potentials and significantly increased action potential firing frequency. These findings indicate that LPPs enhance pain sensitivity by modulating NaV channels, which play a crucial role in pain transmission. In conclusion, 4-HNE and 4-ONE shift the voltage-dependent activation of sodium channels toward more negative potentials, thereby increasing the excitability of primary sensory neurons and amplifying pain signals. This study provides molecular insights into how oxidative stress-related lipid peroxidation contributes to sensory mechanisms and offers potential avenues for developing new treatments for oxidative stress- or inflammation-associated pain.
(This article belongs to the Special Issue Lipid Peroxidation in Physiology and Chronic Inflammatory Diseases)
19 pages, 1576 KB  
Article
LGH-YOLOv12n: Latent Diffusion Inpainting Data Augmentation and Improved YOLOv12n Model for Rice Leaf Disease Detection
by Shaowei Mi, Cheng Li, Kui Fang, Xinghui Zhu and Gang Chen
Agriculture 2026, 16(3), 368; https://doi.org/10.3390/agriculture16030368 - 4 Feb 2026
Abstract
Detecting rice leaf diseases in real-world field environments remains challenging due to varying lesion sizes, diverse lesion morphologies, complex backgrounds, and the limited availability of high-quality annotated datasets. Existing detection models often suffer from performance degradation under these conditions, particularly when training data lack sufficient diversity and structural realism. To address these challenges, this paper proposes a Latent Diffusion Inpainting (LDI) data augmentation method and an improved lightweight detection model, LGH-YOLOv12n. Unlike conventional diffusion-based augmentation methods that generate full images or random patches, LDI performs category-aware latent inpainting, synthesizing realistic lesion patterns by jointly conditioning on background context and disease categories, thereby enhancing data diversity while preserving scene consistency. Furthermore, LGH-YOLOv12n improves upon the YOLOv12n baseline by introducing GSConv in the backbone to reduce channel redundancy and enhance lesion localization, and integrating Hierarchical Multi-head Attention (HMHA) into the neck network to better distinguish disease features from complex field backgrounds. Experimental results demonstrate that LGH-YOLOv12n achieves an F1 of 86.1% and an mAP@50 of 88.3%, outperforming the YOLOv12n model trained without data augmentation by 3.3% and 5.0%, respectively. Moreover, when trained on the LDI-augmented dataset, LGH-YOLOv12n consistently outperforms YOLOv8n, YOLOv10n, YOLOv11n, and YOLOv12n, with mAP@50 improvements of 4.6%, 5.2%, 1.9%, and 2.1%, respectively. These results indicate that the proposed LDI augmentation and LGH-YOLOv12n model provide an effective and robust solution for rice leaf disease detection in complex field environments.
(This article belongs to the Topic Digital Agriculture, Smart Farming and Crop Monitoring)
27 pages, 8533 KB  
Article
An Application Study on Digital Image Classification and Recognition of Yunnan Jiama Based on a YOLO-GAM Deep Learning Framework
by Nan Ji, Fei Ju and Qiang Wang
Appl. Sci. 2026, 16(3), 1551; https://doi.org/10.3390/app16031551 - 3 Feb 2026
Abstract
Yunnan Jiama (paper horse prints), a representative form of intangible cultural heritage in southwest China, is characterized by subtle inter-class differences, complex woodblock textures, and heterogeneous preservation conditions, which collectively pose significant challenges for digital preservation and automatic image classification. To address these challenges and improve the computational analysis of Jiama images, this study proposes an enhanced object detection framework based on YOLOv8 integrated with a Global Attention Mechanism (GAM), referred to as YOLOv8-GAM. In the proposed framework, the GAM module is embedded into the high-level semantic feature extraction and multi-scale feature fusion stages of YOLOv8, thereby strengthening global channel–spatial interactions and improving the representation of discriminative cultural visual features. In addition, image augmentation strategies, including brightness adjustment, salt-and-pepper noise, and Gaussian noise, are employed to simulate real-world image acquisition and degradation conditions, which enhances the robustness of the model. Experiments conducted on a manually annotated Yunnan Jiama image dataset demonstrate that the proposed model achieves a mean average precision (mAP) of 96.5% at an IoU threshold of 0.5 and 82.13% under the mAP@0.5:0.95 metric, with an F1-score of 94.0%, outperforming the baseline YOLOv8 model. These results indicate that incorporating global attention mechanisms into object detection networks can effectively enhance fine-grained classification performance for traditional folk print images, thereby providing a practical and scalable technical solution for the digital preservation and computational analysis of intangible cultural heritage.
(This article belongs to the Section Computing and Artificial Intelligence)
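The three augmentations named in the abstract above (brightness adjustment, salt-and-pepper noise, and Gaussian noise) compose straightforwardly on a uint8 image; the magnitudes below are illustrative assumptions, not the paper's settings.

```python
import numpy as np

def augment(img, rng, brightness=0.2, sp_frac=0.01, sigma=5.0):
    """Brightness shift, salt-and-pepper noise, then Gaussian noise."""
    out = img.astype(np.float64)
    out += rng.uniform(-brightness, brightness) * 255.0        # global brightness shift
    mask = rng.random(img.shape) < sp_frac                     # salt-and-pepper positions
    out[mask] = rng.choice([0.0, 255.0], size=int(mask.sum())) # flip to black or white
    out += rng.normal(0.0, sigma, size=img.shape)              # additive Gaussian noise
    return np.clip(out, 0, 255).astype(np.uint8)

rng = np.random.default_rng(42)
img = rng.integers(0, 256, size=(32, 32, 3), dtype=np.uint8)
aug = augment(img, rng)
```

Clipping back to [0, 255] and re-casting to uint8 keeps the augmented image in the same value range the detector was trained to expect.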
28 pages, 4721 KB  
Article
MAF-RecNet: A Lightweight Wheat and Corn Recognition Model Integrating Multiple Attention Mechanisms
by Hao Yao, Ji Zhu, Yancang Li, Haiming Yan, Wenzhao Feng, Luwang Niu and Ziqi Wu
Remote Sens. 2026, 18(3), 497; https://doi.org/10.3390/rs18030497 - 3 Feb 2026
Abstract
This study is grounded in the macro-context of smart agriculture and global food security. Due to population growth and climate change, precise and efficient monitoring of crop distribution and growth is vital for stable production and optimal resource use. Remote sensing combined with deep learning enables multi-scale agricultural monitoring from field identification to disease diagnosis. However, current models face three deployment bottlenecks: high complexity hinders operation on edge devices; scarce labeled data causes overfitting in small-sample cases; and there is insufficient generalization across regions, crops, and imaging conditions. These issues limit the large-scale adoption of intelligent agricultural technologies. To tackle them, this paper proposes a lightweight crop recognition model, MAF-RecNet. It aims to achieve high accuracy, efficiency, and strong generalization with limited data through structural optimization and attention mechanism fusion, offering a viable path for deployable intelligent monitoring systems. Built on a U-Net with a pre-trained ResNet18 backbone, MAF-RecNet integrates multiple attention mechanisms (Coordinate, External, Pyramid Split, and Efficient Channel Attention) into a hybrid attention module, improving multi-scale feature discrimination. On the Southern Hebei Farmland dataset, it achieves 87.57% mIoU and 95.42% mAP, outperforming models like SegNeXt and FastSAM while remaining efficient (15.25 M parameters, 21.81 GFLOPs). The model also shows strong cross-task generalization, with mIoU scores of 80.56% (Wheat Health Status Dataset in Southern Hebei), 90.20% (Global Wheat Health Dataset), and 84.07% (Corn Health Status Dataset). Ablation studies confirm the contribution of the attention-enhanced skip connections and decoder. This study not only provides an efficient and lightweight solution for few-shot agricultural image recognition but also offers valuable insights into the design of generalizable models for complex farmland environments. It contributes to promoting the scalable and practical application of artificial intelligence technologies in precision agriculture.
19 pages, 3447 KB  
Article
Hybrid Decoding with Co-Occurrence Awareness for Fine-Grained Food Image Segmentation
by Shenglong Wang and Guorui Sheng
Foods 2026, 15(3), 534; https://doi.org/10.3390/foods15030534 - 3 Feb 2026
Abstract
Fine-grained food image segmentation is essential for accurate dietary assessment and nutritional analysis, yet remains highly challenging due to ambiguous boundaries, inter-class similarity, and dense layouts of meals containing many different ingredients in real-world settings. Existing methods based solely on CNNs, Transformers, or Mamba architectures often fail to simultaneously preserve fine-grained local details and capture contextual dependencies over long distances. To address these limitations, we propose HDF (Hybrid Decoder for Food Image Segmentation), a novel decoding framework built upon the MambaVision backbone. Our approach first employs a convolution-based feature pyramid network (FPN) to extract multi-stage features from the encoder. These features are then thoroughly fused across scales using a Cross-Layer Mamba module that models inter-level dependencies with linear complexity. Subsequently, an Attention Refinement module integrates global semantic context through spatial–channel reweighting. Finally, a Food Co-occurrence Module explicitly enhances food-specific semantics by learning dynamic co-occurrence patterns among categories, improving segmentation of visually similar or frequently co-occurring ingredients. Evaluated on FoodSeg103 and UEC-FoodPIX Complete, two widely used benchmarks for fine-grained food segmentation, HDF achieves a 52.25% mean Intersection-over-Union (mIoU) on FoodSeg103 and a 76.16% mIoU on UEC-FoodPIX Complete, outperforming current state-of-the-art methods by a clear margin. These results demonstrate that HDF's hybrid design and explicit co-occurrence awareness effectively address key challenges in food image segmentation, providing a robust foundation for practical applications in dietary logging, nutritional estimation, and food safety inspection.
(This article belongs to the Section Food Analytical Methods)
21 pages, 2928 KB  
Article
No Trade-Offs: Unified Global, Local, and Multi-Scale Context Modeling for Building Pixel-Wise Segmentation
by Zhiyu Zhang, Debao Yuan, Yifei Zhou and Renxu Yang
Remote Sens. 2026, 18(3), 472; https://doi.org/10.3390/rs18030472 - 2 Feb 2026
Abstract
Building extraction from remote sensing imagery plays a pivotal role in applications such as smart cities, urban planning, and disaster assessment. Although deep learning has significantly advanced this task, existing methods still struggle to strike an effective balance among global semantic understanding, local detail recovery, and multi-scale contextual awareness, particularly when confronted with challenges including extreme scale variations, complex spatial distributions, occlusions, and ambiguous boundaries. To address these issues, we propose TriadFlow-Net, an efficient end-to-end network architecture. First, we introduce the Multi-scale Attention Feature Enhancement Module (MAFEM), which employs parallel attention branches with varying neighborhood radii to adaptively capture multi-scale contextual information, thereby alleviating the problem of imbalanced receptive field coverage. Second, to enhance robustness under severe occlusion scenarios, we innovatively integrate a Non-Causal State Space Model (NC-SSD) with a Densely Connected Dynamic Fusion (DCDF) mechanism, enabling linear-complexity modeling of global long-range dependencies. Finally, we incorporate a Multi-scale High-Frequency Detail Extractor (MHFE) along with a channel–spatial attention mechanism to precisely refine boundary details while suppressing noise. Extensive experiments conducted on three publicly available building segmentation benchmarks demonstrate that the proposed TriadFlow-Net achieves state-of-the-art performance across multiple evaluation metrics while maintaining computational efficiency, offering a novel and effective solution for high-resolution remote sensing building extraction.

31 pages, 4720 KB  
Article
SE-MTCAELoc: SE-Aided Multi-Task Convolutional Autoencoder for Indoor Localization with Wi-Fi
by Yongfeng Li, Juan Huang, Yuan Yao and Binghua Su
Sensors 2026, 26(3), 945; https://doi.org/10.3390/s26030945 - 2 Feb 2026
Abstract
Indoor localization finds wide-ranging applications in user navigation and intelligent building systems. Nevertheless, signal interference within complex indoor environments and challenges regarding localization generalization in multi-building and multi-floor scenarios have restricted the performance of traditional localization methods based on Wi-Fi fingerprinting. To tackle these issues, this paper presents the SE-MTCAELoc model, a multi-task convolutional autoencoder approach that integrates a squeeze-excitation (SE) attention mechanism for indoor positioning. First, the method preprocesses Wi-Fi received signal strength indicator (RSSI) data: in the UJIIndoorLoc dataset, the 520-dimensional RSSI features are extended to 576 dimensions and reshaped into a 24 × 24 matrix, while Gaussian noise is introduced to enhance data robustness. Next, an SE module integrated with a convolutional autoencoder (CAE) is constructed; this module aggregates spatial information into channel descriptors through squeeze operations and learns channel weights via excitation operations, dynamically enhancing key positioning features and suppressing noise. Finally, a multi-task learning architecture based on the SE-CAE encoder is established to jointly optimize building classification, floor classification, and coordinate regression, with task priorities balanced through weighted losses (0.1 for building classification, 0.2 for floor classification, and 0.7 for coordinate regression). Experimental results on the UJIIndoorLoc dataset indicate that building classification accuracy reaches 99.57%, floor classification accuracy reaches 98.57%, and the mean absolute error (MAE) for coordinate regression is 5.23 m. The model is also highly time-efficient: cumulative training (including SE-CAE pre-training) takes merely 9.83 min, and single-sample inference takes only 0.347 ms, fully meeting the requirements of real-time indoor localization applications. On the TUT2018 dataset, floor classification accuracy attains 98.13%, with an MAE of 6.16 m. These results suggest that the SE-MTCAELoc model can effectively enhance localization accuracy and generalization in complex indoor scenarios and meet the localization requirements of multiple scenarios. Full article
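The preprocessing and loss weighting described in the abstract (520 → 576 dimensions, 24 × 24 reshape, Gaussian noise, and the 0.1/0.2/0.7 task weights) can be sketched in plain Python. The padding value used for the 56 added dimensions is an assumption here; the abstract does not state what fill value the authors use.

```python
import random

# Assumed fill value for the 56 padded dimensions (a typical
# "access point not heard" RSSI level); not stated in the abstract.
PAD_VALUE = -110.0

def preprocess_rssi(rssi, noise_std=0.0, side=24, seed=None):
    """Pad a 520-d RSSI fingerprint to side*side values, add Gaussian
    noise, and reshape into a side x side matrix (24 x 24 = 576)."""
    rng = random.Random(seed)
    padded = list(rssi) + [PAD_VALUE] * (side * side - len(rssi))
    noisy = [v + rng.gauss(0.0, noise_std) for v in padded]
    return [noisy[i * side:(i + 1) * side] for i in range(side)]

def multitask_loss(l_building, l_floor, l_coord, w=(0.1, 0.2, 0.7)):
    """Weighted sum of the three task losses, using the priority
    weights reported in the abstract."""
    return w[0] * l_building + w[1] * l_floor + w[2] * l_coord
```

During training, the three per-task losses (two classification, one regression) would be computed by the network heads and combined with `multitask_loss` before backpropagation.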
(This article belongs to the Section Communications)

13 pages, 1659 KB  
Article
Image Feature Fusion of Hyperspectral Imaging and MRI for Automated Subtype Classification and Grading of Adult Diffuse Gliomas According to the 2021 WHO Criteria
by Ya Su, Jiazheng Sun, Rongxin Fu, Xiaoran Li, Jie Bai, Fengqi Li, Hongwei Yang, Ye Cheng and Jie Lu
Diagnostics 2026, 16(3), 458; https://doi.org/10.3390/diagnostics16030458 - 1 Feb 2026
Abstract
Background: Current histopathology- and molecular-based gold standards for diagnosing adult diffuse gliomas (ADGs) have inherent limitations in reproducibility and interobserver concordance, while being time-intensive and resource-demanding. Although hyperspectral imaging (HSI)-based computer-aided pathology shows potential for automated diagnosis, it often yields suboptimal accuracy due to the lack of complementary spatial and structural tumor information. This study introduces a multimodal fusion framework integrating HSI with routinely acquired preoperative magnetic resonance imaging (MRI) to enable automated, high-precision ADG diagnosis. Methods: We developed the Hyperspectral Attention Fusion Network (HAFNet), incorporating residual learning and channel attention to jointly capture HSI patterns and MRI-derived radiomic features. The dataset comprised 1931 HSI cubes (400–1000 nm, 300 spectral bands) from histopathological patches of six major World Health Organization (WHO)-defined glioma subtypes in 30 patients, together with their routinely acquired preoperative MRI sequences. Informative wavelengths were selected using mutual information. Radiomic features were extracted with the PyRadiomics package. Model performance was assessed via stratified 5-fold cross-validation, with accuracy and area under the curve (AUC) as primary endpoints. Results: The multimodal HAFNet achieved a macro-averaged AUC of 0.9886 and a classification accuracy of 98.66%, markedly outperforming the HSI-only baseline (AUC 0.9267, accuracy 87.25%; p < 0.001), highlighting the complementary value of MRI-derived radiomic features in enhancing discrimination beyond spectral information. Conclusions: Integrating HSI biochemical and microstructural insights with MRI radiomics of morphology and context, HAFNet provides a robust, reproducible, and efficient framework for accurately predicting 2021 WHO types and grades of ADGs, demonstrating the significant added value of multimodal integration for precise glioma diagnosis. Full article
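The macro-averaged AUC reported as a primary endpoint is computed by treating each glioma subtype as a one-vs-rest binary problem and averaging the per-class AUCs with equal weight. A stdlib-only sketch using the rank-sum (Mann–Whitney) formulation of AUC, with ties handled by rank averaging:

```python
def binary_auc(labels, scores):
    """AUC via the rank-sum formulation; tied scores share their
    average rank. labels: 0/1 list, scores: matching list of floats."""
    order = sorted(range(len(scores)), key=lambda i: scores[i])
    ranks = [0.0] * len(scores)
    i = 0
    while i < len(order):
        j = i
        while j + 1 < len(order) and scores[order[j + 1]] == scores[order[i]]:
            j += 1
        avg_rank = (i + j) / 2.0 + 1.0  # 1-based average rank of the tie group
        for k in range(i, j + 1):
            ranks[order[k]] = avg_rank
        i = j + 1
    pos = [ranks[i] for i in range(len(labels)) if labels[i] == 1]
    n_pos, n_neg = len(pos), len(labels) - len(pos)
    return (sum(pos) - n_pos * (n_pos + 1) / 2.0) / (n_pos * n_neg)

def macro_auc(y_true, y_scores, n_classes):
    """One-vs-rest AUC per class, averaged with equal class weight."""
    aucs = []
    for c in range(n_classes):
        labels = [1 if y == c else 0 for y in y_true]
        scores = [s[c] for s in y_scores]
        aucs.append(binary_auc(labels, scores))
    return sum(aucs) / n_classes
```

In practice this is what a call like scikit-learn's `roc_auc_score(..., multi_class="ovr", average="macro")` computes; the sketch makes the per-class averaging explicit.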