Search Results (395)

Search Parameters:
Keywords = two-channel feature fusion

27 pages, 3922 KB  
Article
Hierarchical Multiscale Fusion with Coordinate Attention for Lithologic Mapping from Remote Sensing
by Fuyuan Xie and Yongguo Yang
Remote Sens. 2026, 18(3), 413; https://doi.org/10.3390/rs18030413 - 26 Jan 2026
Abstract
Accurate lithologic maps derived from satellite imagery underpin structural interpretation, mineral exploration, and geohazard assessment. However, automated mapping in complex terranes remains challenging because spectrally similar units, narrow anisotropic bodies, and ambiguous contacts can degrade boundary fidelity. In this study, we propose SegNeXt-HFCA, a hierarchical multiscale fusion network with coordinate attention for lithologic segmentation from a Sentinel-2/DEM feature stack. The model builds on SegNeXt and introduces a hierarchical multiscale encoder with coordinate attention to jointly capture fine textures and scene-level structure. It further adopts a class-frequency-aware hybrid loss that combines boundary-weighted online hard-example mining cross-entropy with Lovász-Softmax to better handle long-tailed classes and ambiguous contacts. In addition, we employ a robust training and inference scheme, including entropy-guided patch sampling, exponential moving average of parameters, test-time augmentation, and a DenseCRF-based post-refinement. Two study areas in the Beishan orogen, northwestern China (Huitongshan and Xingxingxia), are used to evaluate the method with a unified 10-channel Sentinel-2/DEM feature stack. Compared with U-NetFormer, PSPNet, DeepLabV3+, DANet, LGMSFNet, SegFormer, BiSeNetV2, and the SegNeXt backbone, SegNeXt-HFCA improves mean intersection-over-union (mIoU) by about 3.8% in Huitongshan and 2.6% in Xingxingxia, and increases mean pixel accuracy by approximately 3–4%. Qualitative analyses show that the proposed framework better preserves thin-unit continuity, clarifies lithologic contacts, and reduces salt-and-pepper noise, yielding geologically more plausible maps. These results demonstrate that hierarchical multiscale fusion with coordinate attention, together with class- and boundary-aware optimization, provides a practical route to robust lithologic mapping in structurally complex regions. Full article
(This article belongs to the Section Remote Sensing for Geospatial Science)
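The coordinate-attention idea this abstract relies on (factorizing global pooling into height-wise and width-wise pooling, so the resulting gates retain positional information) can be sketched in a few lines. This is a generic numpy illustration, not the authors' code; the single shared projection `w` and the omitted channel-reduction/conv layers are deliberate simplifications:

```python
import numpy as np

def sigmoid(z):
    return 1.0 / (1.0 + np.exp(-z))

def coordinate_attention(x, w):
    """Minimal coordinate-attention gate (sketch).

    x : feature map of shape (C, H, W)
    w : (C, C) projection shared by both directions (a simplification;
        the published block uses conv layers and a reduction ratio).
    Returns x reweighted by direction-aware gates.
    """
    C, H, W = x.shape
    pool_h = x.mean(axis=2)            # (C, H): average over width
    pool_w = x.mean(axis=1)            # (C, W): average over height
    gate_h = sigmoid(w @ pool_h)       # (C, H) attention along height
    gate_w = sigmoid(w @ pool_w)       # (C, W) attention along width
    return x * gate_h[:, :, None] * gate_w[:, None, :]

feat = np.random.rand(4, 8, 8)
out = coordinate_attention(feat, np.eye(4))
print(out.shape)  # (4, 8, 8)
```

Because both gates keep one spatial axis intact, thin elongated structures (like the narrow lithologic bodies mentioned above) are not collapsed into a single global descriptor, which is the motivation for this family of attention blocks.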

30 pages, 8651 KB  
Article
Disease-Seg: A Lightweight and Real-Time Segmentation Framework for Fruit Leaf Diseases
by Liying Cao, Donghui Jiang, Yunxi Wang, Jiankun Cao, Zhihan Liu, Jiaru Li, Xiuli Si and Wen Du
Agronomy 2026, 16(3), 311; https://doi.org/10.3390/agronomy16030311 - 26 Jan 2026
Abstract
Accurate segmentation of fruit tree leaf diseases is critical for yield protection and precision crop management, yet it is challenging due to complex field conditions, irregular leaf morphology, and diverse lesion patterns. To address these issues, Disease-Seg, a lightweight real-time segmentation framework, is proposed. It integrates CNN and Transformer with a parallel fusion architecture to capture local texture and global semantic context. The Extended Feature Module (EFM) enlarges the receptive field while retaining fine details. A Deep Multi-scale Attention mechanism (DM-Attention) allocates channel weights across scales to reduce redundancy, and a Feature-weighted Fusion Module (FWFM) optimizes integration of heterogeneous feature maps, enhancing multi-scale representation. Experiments show that Disease-Seg achieves 90.32% mIoU and 99.52% accuracy, outperforming representative CNN, Transformer, and hybrid-based methods. Compared with HRNetV2, it improves mIoU by 6.87% and FPS by 31, while using only 4.78 M parameters. It maintains 69 FPS on 512 × 512 crops and requires approximately 49 ms per image on edge devices, demonstrating strong deployment feasibility. On two grape leaf diseases from the PlantVillage dataset, it achieves 91.19% mIoU, confirming robust generalization. These results indicate that Disease-Seg provides an accurate, efficient, and practical solution for fruit leaf disease segmentation, enabling real-time monitoring and smart agriculture applications. Full article

24 pages, 10940 KB  
Article
A Few-Shot Object Detection Framework for Remote Sensing Images Based on Adaptive Decision Boundary and Multi-Scale Feature Enhancement
by Lijiale Yang, Bangjie Li, Dongdong Guan and Deliang Xiang
Remote Sens. 2026, 18(3), 388; https://doi.org/10.3390/rs18030388 - 23 Jan 2026
Abstract
Given the high cost of acquiring large-scale annotated datasets, few-shot object detection (FSOD) has emerged as an increasingly important research direction. However, existing FSOD methods face two critical challenges in remote sensing images (RSIs): (1) features of small targets within remote sensing images are incompletely represented due to their extremely small scale and cluttered backgrounds, which weakens discriminability and leads to significant detection degradation; (2) unified classification boundaries fail to handle the distinct confidence distributions between well-sampled base classes and sparsely sampled novel classes, leading to ineffective knowledge transfer. To address these issues, we propose TS-FSOD, a Transfer-Stable FSOD framework with two key innovations. First, the proposed detector integrates a Feature Enhancement Module (FEM) leveraging hierarchical attention mechanisms to alleviate small target feature attenuation, and an Adaptive Fusion Unit (AFU) utilizing spatial-channel selection to strengthen target feature representations while mitigating background interference. Second, the Dynamic Temperature-scaling Learnable Classifier (DTLC) employs separate learnable temperature parameters for base and novel classes, combined with difficulty-aware weighting and dynamic adjustment, to adaptively calibrate decision boundaries for stable knowledge transfer. Experiments on the DIOR and NWPU VHR-10 datasets show that TS-FSOD achieves competitive or superior performance compared to state-of-the-art methods, with improvements of up to 4.30% mAP, particularly excelling in 3-shot and 5-shot scenarios. Full article
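The per-class temperature idea behind the DTLC can be illustrated independently of the detector. A minimal numpy sketch, assuming a plain softmax classifier with one learnable temperature per class; the difficulty-aware weighting and dynamic adjustment described in the abstract are omitted:

```python
import numpy as np

def temp_scaled_softmax(logits, temps):
    """Per-class temperature scaling (sketch of the DTLC idea).

    logits : (N, K) raw class scores
    temps  : (K,) temperature per class, e.g. one value shared by
             base classes and a separate one by novel classes.
    A higher temperature softens that class's logits, loosening its
    decision boundary for the sparsely sampled novel classes.
    """
    scaled = logits / temps                        # broadcast over classes
    scaled -= scaled.max(axis=1, keepdims=True)    # numerical stability
    e = np.exp(scaled)
    return e / e.sum(axis=1, keepdims=True)

logits = np.array([[2.0, 1.0, 0.5]])
temps = np.array([1.0, 1.0, 2.0])   # class 2 treated as a "novel" class
probs = temp_scaled_softmax(logits, temps)
```

In training, the temperatures would be ordinary learnable parameters; here they are fixed constants purely for illustration.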

21 pages, 1300 KB  
Article
CAIC-Net: Robust Radio Modulation Classification via Unified Dynamic Cross-Attention and Cross-Signal-to-Noise Ratio Contrastive Learning
by Teng Wu, Quan Zhu, Runze Mao, Changzhen Hu and Shengjun Wei
Sensors 2026, 26(3), 756; https://doi.org/10.3390/s26030756 - 23 Jan 2026
Abstract
In complex wireless communication environments, automatic modulation classification (AMC) faces two critical challenges: the lack of robustness under low-signal-to-noise ratio (SNR) conditions and the inefficiency of integrating multi-scale feature representations. To address these issues, this paper proposes CAIC-Net, a robust modulation classification network that integrates a dynamic cross-attention mechanism with a cross-SNR contrastive learning strategy. CAIC-Net employs a dual-stream feature extractor composed of ConvLSTM2D and Transformer blocks to capture local temporal dependencies and global contextual relationships, respectively. To enhance fusion effectiveness, we design a Dynamic Cross-Attention Unit (CAU) that enables deep bidirectional interaction between the two branches while incorporating an SNR-aware mechanism to adaptively adjust the fusion strategy under varying channel conditions. In addition, a Cross-SNR Contrastive Learning (CSCL) module is introduced as an auxiliary task, where positive and negative sample pairs are constructed across different SNR levels and optimized using InfoNCE loss. This design significantly strengthens the intrinsic noise-invariant properties of the learned representations. Extensive experiments conducted on two standard datasets demonstrate that CAIC-Net achieves competitive classification performance at moderate-to-high SNRs and exhibits clear advantages in extremely low-SNR scenarios, validating the effectiveness and strong generalization capability of the proposed approach. Full article
(This article belongs to the Section Communications)
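The CSCL module's InfoNCE objective is a standard contrastive loss; the paper's twist is pairing the same signal at different SNR levels. A minimal single-anchor version in numpy; the embedding shapes and the temperature value are illustrative assumptions, not taken from the paper:

```python
import numpy as np

def info_nce(anchor, positive, negatives, tau=0.1):
    """InfoNCE loss for one anchor (sketch of cross-SNR pairing).

    anchor, positive : (D,) embeddings of the same signal observed
                       at two different SNR levels
    negatives        : (M, D) embeddings of other signals
    """
    def cos(a, b):
        return a @ b / (np.linalg.norm(a) * np.linalg.norm(b))

    pos = np.exp(cos(anchor, positive) / tau)
    neg = sum(np.exp(cos(anchor, n) / tau) for n in negatives)
    return -np.log(pos / (pos + neg))
```

Minimizing this pulls the clean and noisy views of one signal together while pushing other signals away, which is what gives the learned representation its noise-invariant character.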

26 pages, 11141 KB  
Article
MISA-Net: Multi-Scale Interaction and Supervised Attention Network for Remote-Sensing Image Change Detection
by Haoyu Yin, Junzhe Wang, Shengyan Liu, Yuqi Wang, Yi Liu, Tengyue Guo and Min Xia
Remote Sens. 2026, 18(2), 376; https://doi.org/10.3390/rs18020376 - 22 Jan 2026
Abstract
Change detection in remote sensing imagery plays a vital role in land use analysis, disaster assessment, and ecological monitoring. However, existing remote sensing change detection methods often lack a structured and tightly coupled interaction paradigm to jointly reconcile multi-scale representation, bi-temporal discrimination, and fine-grained boundary modeling under practical computational constraints. To address this fundamental challenge, we propose a Multi-scale Interaction and Supervised Attention Network (MISANet). To improve the model’s ability to perceive changes at multiple scales, we design a Progressive Multi-Scale Feature Fusion Module (PMFFM), which employs a progressive fusion strategy to effectively integrate multi-granular cross-scale features. To enhance the interaction between bi-temporal features, we introduce a Difference-guided Gated Attention Interaction (DGAI) module. This component leverages difference information between the two time phases and employs a gating mechanism to retain fine-grained details, thereby improving semantic consistency. Furthermore, to guide the model’s focus on change regions, we design a Supervised Attention Decoder Module (SADM). This module utilizes a channel–spatial joint attention mechanism to reweight the feature maps. In addition, a deep supervision strategy is incorporated to direct the model’s attention toward both fine-grained texture differences and high-level semantic changes during training. Experiments conducted on the LEVIR-CD, SYSU-CD, and GZ-CD datasets demonstrate the effectiveness of our method, achieving F1-scores of 91.19%, 82.25%, and 88.35%, respectively. Compared with the state-of-the-art BASNet model, MISANet achieves performance gains of 0.50% F1 and 0.85% IoU on LEVIR-CD, 2.13% F1 and 3.02% IoU on SYSU-CD, and 1.28% F1 and 2.03% IoU on GZ-CD. The proposed method demonstrates strong generalization capabilities and is applicable to various complex change detection scenarios. Full article
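The difference-guided gating in DGAI can be sketched generically: the bi-temporal difference drives a gate that decides, per position, how much of each epoch's features to keep. A numpy toy version in which a mean-centred sigmoid stands in for the learned convolution, so this is an illustration of the mechanism, not the published module:

```python
import numpy as np

def sigmoid(z):
    return 1.0 / (1.0 + np.exp(-z))

def difference_gated_fusion(f_t1, f_t2):
    """Difference-guided gate (sketch of the DGAI idea).

    f_t1, f_t2 : (C, H, W) bi-temporal feature maps.
    The absolute difference drives a per-position gate mixing the
    two epochs; large differences bias the mix toward epoch 1.
    """
    diff = np.abs(f_t1 - f_t2)
    gate = sigmoid(diff - diff.mean())     # stand-in for a learned conv
    return gate * f_t1 + (1.0 - gate) * f_t2
```

Note that when the two epochs agree exactly, the gate settles at 0.5 and the fusion returns the shared features unchanged, so unchanged regions pass through without distortion.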

28 pages, 19177 KB  
Article
Dual-Task Learning for Fine-Grained Bird Species and Behavior Recognition via Token Re-Segmentation, Multi-Scale Mixed Attention, and Feature Interleaving
by Cong Zhang, Zhichao Chen, Ye Lin, Xiuping Huang and Chih-Wei Lin
Appl. Sci. 2026, 16(2), 966; https://doi.org/10.3390/app16020966 - 17 Jan 2026
Abstract
In the ecosystem, birds are important indicators that can sensitively reflect changes in the ecological environment and its health. However, bird monitoring is challenging due to species diversity, variable behaviors, and distinct morphological characteristics. Therefore, we propose a parallel dual-branch hybrid CNN–Transformer architecture for feature extraction that simultaneously captures local and global image features to address the “local feature similarity” issue in the dual tasks of bird species and behavior recognition. The dual-task framework comprises three main components: the Token Re-segmentation Module (TRM), the Multi-scale Adaptive Module (MAM), and the Feature Interleaving Structure (FIS). The designed MAM fuses hybrid attention to address the problem of birds appearing at different scales. MAM models the interdependencies between the spatial and channel dimensions of features from different scales, enabling the model to adaptively choose scale-specific feature representations and accommodate inputs of different scales. In addition, we designed an efficient feature-sharing mechanism, called FIS, between parallel CNN branches. FIS interleaves, delivers, and fuses CNN feature maps across parallel layers, combining them with the features of the corresponding Transformer layer to share local and global information at different depths and promote deep feature fusion across parallel networks. Finally, we designed the TRM to address the challenge of visually similar but distinct bird species and of similar poses with distinct behaviors. TRM adopts a two-step approach: first, it locates discriminative regions, and then it performs fine segmentation on them. This module enables the network to allocate relatively more attention to key areas while merging non-essential information and reducing interference from irrelevant details.
Experiments on a self-built dataset demonstrate that, compared with state-of-the-art classification networks, the proposed network achieves the best results: 79.70% accuracy in bird species recognition, 76.21% in behavior recognition, and the best overall performance in dual-task recognition. Full article
(This article belongs to the Section Computing and Artificial Intelligence)

24 pages, 2148 KB  
Article
Distribution Network Electrical Equipment Defect Identification Based on Multi-Modal Image Voiceprint Data Fusion and Channel Interleaving
by An Chen, Junle Liu, Wenhao Zhang, Jiaxuan Lu, Jiamu Yang and Bin Liao
Processes 2026, 14(2), 326; https://doi.org/10.3390/pr14020326 - 16 Jan 2026
Abstract
With the explosive growth in the quantity of electrical equipment in distribution networks, traditional manual inspection struggles to achieve comprehensive coverage due to limited manpower and low efficiency. This has led to frequent equipment failures, including partial discharge, insulation aging, and poor contact, which seriously compromise the safe and stable operation of distribution networks. Real-time monitoring and defect identification of equipment operating status are therefore critical to ensuring the safety and stability of power systems. Currently, commonly used methods for defect identification in distribution network electrical equipment mainly rely on single-image or voiceprint data features. These methods lack consideration of the complementarity and interleaved nature between image and voiceprint features, resulting in reduced identification accuracy and reliability. To address the limitations of existing methods, this paper proposes a distribution network electrical equipment defect identification method based on multi-modal image voiceprint data fusion and channel interleaving. First, image and voiceprint feature models are constructed using two-dimensional principal component analysis (2DPCA) and the Mel scale, respectively. Multi-modal feature fusion is achieved using an improved transformer model that integrates intra-domain self-attention units and an inter-domain cross-attention mechanism. Second, an image and voiceprint multi-channel interleaving model is applied. It combines channel adaptability and confidence to dynamically adjust weights and generates defect identification results using a weighting approach based on output probability information content.
Finally, simulation results show that, with a dataset of 3300 samples, the proposed algorithm achieves an 8.96–33.27% improvement in defect recognition accuracy compared with baseline algorithms, and maintains an accuracy of over 86.5% even under 20% random noise interference thanks to the improved transformer and multi-channel interleaving mechanism, verifying its advantages in accuracy and noise robustness. Full article
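The confidence-based channel weighting described above reduces, in its simplest form, to a normalized weighted sum of the two channels' class distributions. A numpy sketch of that idea only; the paper's information-content weighting rule is more elaborate than this:

```python
import numpy as np

def fuse_decisions(probs_img, probs_voice, conf_img, conf_voice):
    """Confidence-weighted fusion of two per-channel class
    distributions (a sketch of the weighting idea, not the exact rule).

    probs_img, probs_voice : (K,) class probabilities from each channel
    conf_img, conf_voice   : scalar channel confidences (>= 0)
    """
    w = np.array([conf_img, conf_voice], dtype=float)
    w /= w.sum()                            # normalize channel weights
    fused = w[0] * probs_img + w[1] * probs_voice
    return fused / fused.sum()              # renormalize to a distribution

image_probs = np.array([0.7, 0.2, 0.1])
voice_probs = np.array([0.1, 0.8, 0.1])
fused = fuse_decisions(image_probs, voice_probs, conf_img=0.9, conf_voice=0.3)
```

When one channel's confidence drops (e.g. a noisy voiceprint), its weight shrinks and the decision leans on the other modality, which is the robustness mechanism the abstract points to.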

20 pages, 4228 KB  
Article
Research on Defect Detection on Steel Rails Based on Improved YOLO11n Algorithm
by Hongyu Wang and Junmei Zhao
Appl. Sci. 2026, 16(2), 842; https://doi.org/10.3390/app16020842 - 14 Jan 2026
Abstract
Aiming at the core issues of the traditional YOLO11n model in rail surface defect detection—fine-grained feature loss of small defects, insufficient micro-target recognition accuracy, and the mismatch of existing downsampling/fusion methods for micro-defect feature extraction—this paper proposes an improved YOLO11n algorithm with two network-structure innovations. First, the Adaptive Downsampling (ADown) module is introduced into the backbone network for the first time, retaining global features via 2D average pooling and extracting local details through channel-split multi-path convolution/max pooling to avoid fine texture loss. Second, a novel SOEP-RFPN-MFM neck network is designed, integrating SNI, GSConvE, and MFM modules to achieve dynamic weighted fusion of multi-scale features and break the bottleneck of inefficient small-target feature aggregation. Trained and verified on a 4020-image rail dataset covering four defect types (Spalling, Squat, Wheel Burns, Corrugation), the improved algorithm achieves 93.7% detection accuracy, 92.4% recall, and 95.6% mAP, improvements of 1.2, 2.6, and 0.8 percentage points, respectively, over the original YOLO11n, and is particularly optimized for rail micro-defect detection scenarios. This study provides a new deep learning method for rail transit micro-defect detection and a reference for scenario-specific improvement of lightweight YOLO11n models. Full article
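The ADown split described above (average-pool one channel group for global context, max-pool the other for local peaks) can be illustrated without the surrounding YOLO machinery. A numpy sketch of the pooling split only; the convolutional paths of the real module are omitted, so this is an assumption-laden toy, not the YOLO implementation:

```python
import numpy as np

def adown_like(x):
    """Channel-split 2x downsampling in the spirit of ADown (sketch).

    x : (C, H, W) feature map with even C, H, W.
    The first half of the channels is 2x average-pooled (global
    context); the second half is 2x max-pooled (local peaks); the
    halves are then re-concatenated along the channel axis.
    """
    C, H, W = x.shape
    a, b = x[: C // 2], x[C // 2:]
    # Group each 2x2 spatial block so pooling is a reduction over it.
    group = lambda t: t.reshape(t.shape[0], H // 2, 2, W // 2, 2)
    avg = group(a).mean(axis=(2, 4))   # global-context half
    mx = group(b).max(axis=(2, 4))     # local-detail half
    return np.concatenate([avg, mx], axis=0)
```

Keeping a max-pooled path alongside the average-pooled one is what lets tiny high-contrast defects survive the downsampling step instead of being averaged away.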

24 pages, 5237 KB  
Article
DCA-UNet: A Cross-Modal Ginkgo Crown Recognition Method Based on Multi-Source Data
by Yunzhi Guo, Yang Yu, Yan Li, Mengyuan Chen, Wenwen Kong, Yunpeng Zhao and Fei Liu
Plants 2026, 15(2), 249; https://doi.org/10.3390/plants15020249 - 13 Jan 2026
Abstract
Wild ginkgo, as an endangered species, holds significant value for genetic resource conservation, yet its practical applications face numerous challenges. Traditional field surveys are inefficient in mountainous mixed forests, while satellite remote sensing is limited by spatial resolution. Current deep learning approaches relying on single-source data or merely simple multi-source fusion fail to fully exploit the available information, leading to suboptimal recognition performance. This study presents a multimodal ginkgo crown dataset, comprising RGB and multispectral images acquired by a UAV platform. To achieve precise crown segmentation with these data, we propose a novel dual-branch dynamic weighting fusion network, termed dual-branch cross-modal attention-enhanced UNet (DCA-UNet). We design a dual-branch encoder (DBE) with a two-stream architecture for independent feature extraction from each modality. We further develop a cross-modal interaction fusion module (CIF), employing cross-modal attention and learnable dynamic weights to boost multi-source information fusion. Additionally, we introduce an attention-enhanced decoder (AED) that combines progressive upsampling with a hybrid channel-spatial attention mechanism, thereby effectively utilizing multi-scale features and enhancing boundary semantic consistency. Evaluation on the ginkgo dataset demonstrates that DCA-UNet achieves a segmentation performance of 93.42% IoU (Intersection over Union), 96.82% PA (Pixel Accuracy), 96.38% Precision, and 96.60% F1-score. These results outperform the differential feature attention fusion network (DFAFNet) by 12.19%, 6.37%, 4.62%, and 6.95%, respectively, and surpass the single-modality baselines (RGB or multispectral) in all metrics. Superior performance on cross-flight-altitude data further validates the model’s strong generalization capability and robustness in complex scenarios.
These results demonstrate the superiority of DCA-UNet in UAV-based multimodal ginkgo crown recognition, offering a reliable and efficient solution for monitoring wild endangered tree species. Full article
(This article belongs to the Special Issue Advanced Remote Sensing and AI Techniques in Agriculture and Forestry)

21 pages, 15751 KB  
Article
Fault Diagnosis of Gearbox Bearings Based on Multi-Feature Fusion Dual-Channel CNN-Transformer-CAM
by Lihai Chen, Yonghui He, Ao Tan, Xiaolong Bai, Zhenshui Li and Xiaoqiang Wang
Machines 2026, 14(1), 92; https://doi.org/10.3390/machines14010092 - 13 Jan 2026
Abstract
As a core component of the gearbox, bearings are crucial to the stability and reliability of the transmission system. However, dynamic variations in operating conditions and complex noise interference limit existing fault diagnosis methods in processing non-stationary signals and capturing complex features. To address these challenges, this paper proposes a bearing fault diagnosis method based on a multi-feature fusion dual-channel CNN-Transformer-CAM framework. The model cross-fuses two-dimensional feature images from the Gramian Angular Difference Field (GADF) and the Generalized S Transform (GST), preserving complete time–frequency domain information. First, a dual-channel parallel convolutional structure is employed to separately sample the GST maps and the GADF maps, enriching fault information from different dimensions and effectively enhancing the model’s feature extraction capability. Subsequently, a Transformer structure is introduced at the back end of the convolutional neural network to strengthen the representation and analysis of complex time–frequency features. Finally, a cross-attention mechanism is applied to dynamically adjust features from the two channels, achieving adaptive weighted fusion. Test results demonstrate that under conditions of noise interference, limited samples, and multiple operating states, the proposed method can effectively and accurately assess bearing fault conditions. Full article
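The GADF encoding used by one of the two channels has a compact standard definition: rescale the signal to [-1, 1], map samples to angles via arccos, and take pairwise sine differences. A minimal numpy implementation of that standard definition (not the paper's preprocessing pipeline):

```python
import numpy as np

def gadf(series):
    """Gramian Angular Difference Field of a 1-D signal.

    The series is rescaled to [-1, 1], mapped to angles by arccos,
    and GADF[i, j] = sin(phi_i - phi_j), turning a vibration signal
    into a 2-D image a convolutional channel can consume.
    """
    s = np.asarray(series, dtype=float)
    s = 2 * (s - s.min()) / (s.max() - s.min()) - 1   # rescale to [-1, 1]
    phi = np.arccos(np.clip(s, -1.0, 1.0))            # polar encoding
    return np.sin(phi[:, None] - phi[None, :])        # pairwise differences

signal = np.sin(np.linspace(0, 3, 16))
image = gadf(signal)
```

The resulting matrix is antisymmetric with a zero diagonal, and its off-diagonal structure encodes temporal correlations, which is why GADF images pair well with 2-D CNN feature extractors.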

16 pages, 8228 KB  
Article
A Detection Method for Seeding Temperature in Czochralski Silicon Crystal Growth Based on Multi-Sensor Data Fusion
by Lei Jiang, Tongda Chang and Ding Liu
Sensors 2026, 26(2), 516; https://doi.org/10.3390/s26020516 - 13 Jan 2026
Abstract
The Czochralski method is the dominant technique for producing power-electronics-grade silicon crystals. At the beginning of the seeding stage, an excessively high (or low) temperature at the solid–liquid interface can make the time required for the seed to reach the specified length too long (or too short). However, this time is strictly controlled in semiconductor crystal growth to ensure that the initial temperature is appropriate, because an inappropriate initial temperature can adversely affect crystal quality and production yield. Accurately evaluating whether the current temperature is appropriate for seeding is therefore essential. However, the temperature at the solid–liquid interface cannot be directly measured, and the current manual evaluation method mainly relies on a visual inspection of the meniscus. Previous methods for detecting this temperature relied on classifying image features and lacked a quantitative assessment of the temperature. To address this challenge, this study proposed using the duration of the seeding stage as the target variable for evaluating the temperature and developed an improved multimodal fusion regression network. Temperature signals collected from a central pyrometer and an auxiliary pyrometer were transformed into time–frequency representations via the wavelet transform. Features extracted from the time–frequency diagrams, together with meniscus features, were fused through a two-level mechanism with multimodal feature fusion (MFF) and channel attention (CA), followed by masking using spatial attention (SA). The fused features were then input into a random vector functional link network (RVFLN) to predict the seeding duration, thereby establishing an indirect relationship between the multi-sensor data and the seeding temperature and achieving a quantification of a temperature that cannot be directly measured.
Transfer comparison experiments conducted on our dataset verified the effectiveness of the feature extraction strategy and demonstrated the superior detection performance of the proposed model. Full article
(This article belongs to the Section Physical Sensors)

24 pages, 5639 KB  
Article
TransUV: A TransNeXt-Based Model with Multi-Scale and Attention Fusion for Fine-Grained Urban Village Extraction
by Xiaobao Lin, Yu Wang, Yaming Zhou, Guangjun Wang and Sai Chen
Remote Sens. 2026, 18(2), 223; https://doi.org/10.3390/rs18020223 - 9 Jan 2026
Abstract
Urban villages (UVs) are widespread in rapidly urbanizing regions, but their fine-grained delineation from high-resolution remote sensing imagery remains a challenge due to complex spatial textures and ambiguous boundaries. To address this issue, this paper proposes TransUV, a TransNeXt-based encoder–decoder segmentation framework tailored to UV extraction. At the encoder front end, a Multi-level Feature Enhancement Module (MFEM) injects boundary- and texture-aware inductive bias by combining Laplacian-of-Gaussian (LoG) filtering with Gaussian smoothing, which strengthens edge responses while suppressing noise. At the decoder stage, we design a lightweight SegUV decoder equipped with an Advanced Attention Fusion Module (AAFM) that adaptively fuses multi-scale features using complementary channel, spatial, and directional attention. Experiments on 0.5 m imagery from two Chinese cities demonstrate that TransUV achieves an mIoU of 86.67% and an overall accuracy of 92.98%, significantly outperforming other mainstream models. Full article
(This article belongs to the Section AI Remote Sensing)
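The MFEM front end's LoG filtering (Gaussian smoothing followed by a Laplacian, with the edge response injected back into the features) can be sketched with fixed 3×3 kernels. A numpy toy version; the kernel sizes, the `alpha` injection weight, and the additive injection rule are illustrative choices, not the paper's:

```python
import numpy as np

def log_edge_boost(img, alpha=0.5):
    """Laplacian-of-Gaussian style edge injection (sketch of the MFEM
    front-end idea: blur, take the Laplacian, add the response back).
    """
    # Fixed 3x3 kernels; the Gaussian sums to 1, the Laplacian to 0.
    g = np.array([[1, 2, 1], [2, 4, 2], [1, 2, 1]], dtype=float) / 16.0
    lap = np.array([[0, 1, 0], [1, -4, 1], [0, 1, 0]], dtype=float)

    def conv3(x, k):
        """3x3 correlation with reflective padding."""
        p = np.pad(x, 1, mode="reflect")
        out = np.zeros_like(x, dtype=float)
        for i in range(3):
            for j in range(3):
                out += k[i, j] * p[i:i + x.shape[0], j:j + x.shape[1]]
        return out

    return img + alpha * conv3(conv3(img, g), lap)
```

On flat regions the Laplacian response is zero, so the image passes through unchanged; near boundaries the response is large, which is the boundary-aware inductive bias the abstract describes.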

20 pages, 3945 KB  
Article
Dual-Modal Mixture-of-KAN Network for Lithium-Ion Battery State-of-Health Estimation Using Early Charging Data
by Yun Wang, Ziyang Zhang and Fan Zhang
Energies 2026, 19(2), 335; https://doi.org/10.3390/en19020335 - 9 Jan 2026
Abstract
Accurate estimation of the state of health (SOH) of lithium-ion batteries is crucial for the safe operation of electric vehicles and energy storage systems. However, most existing methods rely on complete charging curves or manual feature engineering, making them difficult to adapt to practical scenarios where only limited charging segments are available. To fully exploit degradation information from limited charging data, this paper proposes a dual-modal mixture of Kolmogorov–Arnold network (DM-MoKAN) for lithium-ion battery SOH estimation using only early-stage constant-current charging voltage data. The proposed method incorporates three synergistic modules: an image branch, a sequence branch, and a dual-modal fusion regression module. The image branch converts one-dimensional voltage sequences into two-dimensional Gramian Angular Difference Field (GADF) images and extracts spatial degradation features through a lightweight network integrating Ghost convolution and efficient channel attention (ECA). The sequence branch employs a patch-based Transformer encoder to directly model local patterns and long-range dependencies in the raw voltage sequence. The dual-modal fusion module concatenates features from both branches and feeds them into a MoKAN regression head composed of multiple KAN experts and a gating network for adaptive nonlinear mapping to SOH. Experimental results demonstrate that DM-MoKAN outperforms various baseline methods on both Oxford and NASA datasets, achieving average RMSE/MAE of 0.28%/0.19% and 0.89%/0.71%, respectively. Ablation experiments further verify the effective contributions of the dual-modal fusion strategy, ECA attention mechanism, and MoKAN regression head to estimation performance improvement. Full article
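Efficient channel attention (ECA), used in the image branch above, replaces a fully connected squeeze-excite layer with a short 1-D convolution across neighbouring channel descriptors. A numpy sketch in which a uniform kernel stands in for the learned convolution weights:

```python
import numpy as np

def eca(x, k=3):
    """Efficient Channel Attention sketch: global average pool per
    channel, a 1-D convolution across neighbouring channels, then a
    sigmoid gate reweighting the feature map.

    x : (C, H, W) feature map; k : odd 1-D kernel size.
    """
    desc = x.mean(axis=(1, 2))                 # (C,) channel descriptors
    kern = np.ones(k) / k                      # stand-in for learned weights
    padded = np.pad(desc, k // 2, mode="edge") # keep length C after "valid"
    mixed = np.convolve(padded, kern, mode="valid")
    gate = 1.0 / (1.0 + np.exp(-mixed))        # per-channel sigmoid gate
    return x * gate[:, None, None]
```

Because only a length-k kernel is learned, ECA adds a handful of parameters per block, which is consistent with the lightweight design goal stated in the abstract.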

22 pages, 3809 KB  
Article
Research on Remote Sensing Image Object Segmentation Using a Hybrid Multi-Attention Mechanism
by Lei Chen, Changliang Li, Yixuan Gao, Yujie Chang, Siming Jin, Zhipeng Wang, Xiaoping Ma and Limin Jia
Appl. Sci. 2026, 16(2), 695; https://doi.org/10.3390/app16020695 - 9 Jan 2026
Abstract
High-resolution remote sensing images are playing an increasingly important role in land cover mapping, urban planning, and environmental monitoring tasks. However, current segmentation approaches frequently encounter challenges such as loss of detail and blurred boundaries when processing high-resolution remote sensing imagery, owing to their complex backgrounds and dense semantic content. In response to these limitations, this study introduces HMA-UNet, a novel segmentation network built upon the UNet framework and enhanced through a hybrid attention strategy. The architecture’s innovation centers on a composite attention block, where a lightweight split fusion attention (LSFA) mechanism and a lightweight channel-spatial attention (LCSA) mechanism are synergistically integrated within a residual learning structure to replace the stacked convolutional structure in UNet, improving the utilization of important shallow features and suppressing interference from redundant information. Comprehensive experiments on the WHDLD dataset and the DeepGlobe road extraction dataset show that the proposed method achieves effective segmentation of remote sensing images. On the WHDLD dataset, the model attains a mean accuracy, IoU, precision, and recall of 72.40%, 60.71%, 75.46%, and 72.41%, respectively. Correspondingly, on the DeepGlobe road extraction dataset, it achieves a mean accuracy of 57.87%, an mIoU of 49.82%, a mean precision of 78.18%, and a mean recall of 57.87%. Full article
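The lightweight channel-attention idea behind blocks like LCSA can be sketched in a few lines: globally pool each channel to a descriptor, pass the descriptors through a small shared transform, and gate the feature map with a per-channel sigmoid weight. The NumPy sketch below is a generic illustration of that pattern (the averaging kernel stands in for learned weights; it is not the HMA-UNet implementation):

```python
import numpy as np

def channel_attention(feature_map, k=3):
    """Gate a (C, H, W) feature map with per-channel attention weights."""
    c = feature_map.shape[0]
    y = feature_map.mean(axis=(1, 2))            # global average pool -> (C,)
    # 1-D cross-channel interaction with kernel size k (edge-padded).
    pad = k // 2
    yp = np.pad(y, pad, mode="edge")
    kernel = np.ones(k) / k                      # placeholder for learned weights
    z = np.array([np.dot(yp[i:i + k], kernel) for i in range(c)])
    w = 1.0 / (1.0 + np.exp(-z))                 # sigmoid gate in (0, 1)
    return feature_map * w[:, None, None]        # rescale each channel
```

In a trained network the pooling/convolution weights are learned, so informative channels are amplified relative to redundant ones; the spatial-attention half of such a block applies the same gating idea across locations instead of channels.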
(This article belongs to the Section Computing and Artificial Intelligence)

19 pages, 5302 KB  
Article
LSSCC-Net: Integrating Spatial-Feature Aggregation and Adaptive Attention for Large-Scale Point Cloud Semantic Segmentation
by Wenbo Wang, Xianghong Hua, Cheng Li, Pengju Tian, Yapeng Wang and Lechao Liu
Symmetry 2026, 18(1), 124; https://doi.org/10.3390/sym18010124 - 8 Jan 2026
Abstract
Point cloud semantic segmentation is a key technology for applications such as autonomous driving, robotics, and virtual reality. Current approaches rely heavily on local relative coordinates and simplistic attention mechanisms to aggregate neighborhood information. This often leads to an ineffective joint representation of geometric perturbations and feature variations, coupled with a lack of adaptive selection for salient features during context fusion. To address these issues, we propose LSSCC-Net, a novel segmentation framework based on LACV-Net. First, the spatial-feature dynamic aggregation module is designed to fuse offset information by symmetric interaction between spatial positions and feature channels, thus supplementing local structural information. Second, a dual-dimensional attention mechanism (spatial and channel) is introduced to symmetrically deploy attention modules in both the encoder and decoder, prioritizing salient information extraction. Finally, Lovász-Softmax Loss is used as an auxiliary loss to optimize the training objective. The proposed method is evaluated on two public benchmark datasets, reaching an mIoU of 83.6% on Toronto3D and 65.2% on S3DIS. Compared with the baseline LACV-Net, LSSCC-Net shows notable improvements in challenging categories: the IoU for “road mark” and “fence” on Toronto3D increases by 3.6% and 8.1%, respectively. These results indicate that LSSCC-Net more accurately characterizes complex boundaries and fine-grained structures, enhancing segmentation capabilities for small-scale targets and category boundaries. Full article
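The Lovász-Softmax auxiliary loss mentioned above is a convex surrogate that directly optimizes the Jaccard (IoU) index: per class, prediction errors are sorted in descending order and weighted by the gradient of the Lovász extension of the Jaccard loss. A minimal NumPy sketch of the per-class computation (an illustrative reimplementation of the published formulation, not the LSSCC-Net code):

```python
import numpy as np

def lovasz_grad(gt_sorted):
    """Gradient of the Lovász extension of the Jaccard loss,
    given ground-truth indicators sorted by descending error."""
    gts = gt_sorted.sum()
    intersection = gts - np.cumsum(gt_sorted)
    union = gts + np.cumsum(1.0 - gt_sorted)
    jaccard = 1.0 - intersection / union
    jaccard[1:] = jaccard[1:] - jaccard[:-1]     # discrete derivative
    return jaccard

def lovasz_softmax_flat(probs, labels, cls):
    """Lovász-Softmax loss for one class over flattened predictions.
    probs: (P,) predicted probability of class `cls`; labels: (P,) int labels."""
    fg = (labels == cls).astype(float)           # foreground indicator
    errors = np.abs(fg - probs)                  # per-point prediction error
    order = np.argsort(-errors)                  # sort errors descending
    return float(np.dot(errors[order], lovasz_grad(fg[order])))
```

Averaging this quantity over the present classes gives the full loss; because the weighting tracks how each point changes the class IoU, rare classes and boundary points are not drowned out the way they are under plain cross-entropy, which is why it is commonly used as an auxiliary term.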