Search Results (64)

Search Parameters:
Keywords = multi-scale linear spatial attention

14 pages, 2530 KB  
Article
Arrester Fault Recognition Model Based on Thermal Imaging Images Using VMamba
by Lin Lin, Jiantao Li, Jianan Wang, Yong Luo and Yueyue Liu
Electronics 2025, 14(24), 4784; https://doi.org/10.3390/electronics14244784 - 5 Dec 2025
Viewed by 171
Abstract
The intelligent fault detection of power plant equipment in industrial settings often grapples with challenges such as insufficient real-time performance and interference from complex backgrounds. To address these issues, this paper proposes an image recognition and classification model based on the VMamba architecture. At the core of our feature extraction module, we have improved and optimized the two-dimensional state space (SS2D) algorithm to replace the traditional convolution operation. Rooted in State-Space Models (SSMs), the SS2D module possesses a global receptive field by design, enabling it to effectively capture long-range dependencies and establish comprehensive contextual relationships between local and global features. Crucially, unlike the self-attention mechanism in Vision Transformers (ViT) that suffers from quadratic computational complexity, VMamba achieves this global modeling with linear complexity, significantly enhancing computational efficiency. Furthermore, we employ an enhanced PAN-FPN multi-scale feature fusion strategy integrated with the Squeeze-and-Excitation (SE) attention mechanism. This combination optimizes the spatial distribution of feature representations through channel-wise attention weighting, facilitating the effective integration of cross-level spatial features and the suppression of background noise. This study thus presents a solution for industrial equipment fault diagnosis that achieves a superior balance between high accuracy and low latency. Full article
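The Squeeze-and-Excitation attention used here for channel-wise weighting is a standard building block; the sketch below shows its usual PyTorch form (the reduction ratio and tensor dimensions are illustrative, not taken from the paper).

```python
import torch
import torch.nn as nn

class SEBlock(nn.Module):
    """Squeeze-and-Excitation channel attention: global-average-pool ("squeeze"),
    then a small bottleneck MLP produces per-channel weights ("excitation")."""
    def __init__(self, channels: int, reduction: int = 16):
        super().__init__()
        self.fc = nn.Sequential(
            nn.Linear(channels, channels // reduction),
            nn.ReLU(inplace=True),
            nn.Linear(channels // reduction, channels),
            nn.Sigmoid(),
        )

    def forward(self, x: torch.Tensor) -> torch.Tensor:
        b, c, _, _ = x.shape
        w = self.fc(x.mean(dim=(2, 3)))        # (B, C) channel weights in [0, 1]
        return x * w.view(b, c, 1, 1)          # reweight feature maps channel-wise

# Example: reweighting one level of a fused multi-scale feature map.
feats = torch.randn(2, 64, 32, 32)
print(SEBlock(64)(feats).shape)                # torch.Size([2, 64, 32, 32])
```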

18 pages, 1835 KB  
Article
Towards Robust Medical Image Segmentation with Hybrid CNN–Linear Mamba
by Xiao Ma and Guangming Lu
Electronics 2025, 14(23), 4726; https://doi.org/10.3390/electronics14234726 - 30 Nov 2025
Viewed by 297
Abstract
Problem: Medical image segmentation faces critical challenges in balancing global context modeling and computational efficiency. While conventional neural networks struggle with long-range dependencies, Transformers incur quadratic complexity. Although Mamba-based architectures achieve linear complexity, they lack adaptive mechanisms for heterogeneous medical images and demonstrate insufficient local feature extraction capabilities. Method: We propose Linear Context-Aware Robust Mamba (LCAR–Mamba) to address these dual limitations through adaptive resource allocation and enhanced multi-scale extraction. LCAR–Mamba integrates two synergistic modules: the Context-Aware Linear Mamba Module (CALM) for adaptive global–local fusion, and the Multi-scale Partial Dilated Convolution Module (MSPD) for efficient multi-scale feature refinement. Core Innovations: The CALM module implements content-driven resource allocation through four-stage processing: (1) analyzing spatial complexity via gradient and activation statistics, (2) computing allocation weights to dynamically balance global and local processing branches, (3) parallel dual-path processing with linear attention and convolution, and (4) adaptive fusion guided by complexity weights. The MSPD module employs statistics-based channel selection and multi-scale partial dilated convolutions to capture features at multiple receptive scales while reducing computational cost. Key Results: On the ISIC2017 and ISIC2018 datasets, mIoU improvements of 0.81%/1.44% confirm effectiveness across 2D benchmarks. On the Synapse dataset, LCAR–Mamba achieves 85.56% DSC, outperforming the previous best Mamba baseline by 0.48% with 33% fewer parameters. Significance: LCAR–Mamba demonstrates that adaptive resource allocation and statistics-driven multi-scale extraction can address critical limitations in linear-complexity architectures, establishing a promising direction for efficient medical image segmentation. Full article
(This article belongs to the Special Issue Target Tracking and Recognition Techniques and Their Applications)
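As a rough illustration of the content-driven allocation idea behind CALM, the toy module below mixes a local convolutional branch and a global branch with a weight derived from a per-sample complexity score. Assumptions: a 1×1 convolution stands in for the linear-attention branch, and mean gradient magnitude stands in for the paper's gradient/activation statistics; this is not LCAR–Mamba's implementation.

```python
import torch
import torch.nn as nn

class AdaptiveDualPathFusion(nn.Module):
    """Toy content-driven allocation: a per-sample "complexity" score (mean spatial
    gradient magnitude) sets the mixing weight between a global branch and a local
    convolutional branch."""
    def __init__(self, channels: int):
        super().__init__()
        self.local = nn.Conv2d(channels, channels, 3, padding=1)   # local detail path
        self.global_ = nn.Conv2d(channels, channels, 1)            # stand-in for the linear-attention path
        self.gate = nn.Linear(1, 1)                                # maps complexity score -> allocation logit

    def forward(self, x):
        dx = x[..., :, 1:] - x[..., :, :-1]                        # horizontal gradients
        dy = x[..., 1:, :] - x[..., :-1, :]                        # vertical gradients
        score = (dx.abs().mean(dim=(1, 2, 3)) + dy.abs().mean(dim=(1, 2, 3))).unsqueeze(1)
        w = torch.sigmoid(self.gate(score)).view(-1, 1, 1, 1)      # allocation weight in (0, 1)
        return w * self.local(x) + (1 - w) * self.global_(x)       # complexity-guided fusion

x = torch.randn(2, 32, 64, 64)
print(AdaptiveDualPathFusion(32)(x).shape)                         # torch.Size([2, 32, 64, 64])
```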

15 pages, 6069 KB  
Article
DPM-UNet: A Mamba-Based Network with Dynamic Perception Feature Enhancement for Medical Image Segmentation
by Shangyu Xu, Xiaohang Liu, Hongsheng Lei and Bin Hui
Sensors 2025, 25(22), 7053; https://doi.org/10.3390/s25227053 - 19 Nov 2025
Viewed by 506
Abstract
In medical image segmentation, effective integration of global and local features is crucial. Current methods struggle to simultaneously model long-range dependencies and fine local details. Convolutional Neural Networks (CNNs) excel at extracting local features but are limited by their local receptive fields for capturing long-range dependencies. While global self-attention mechanisms (e.g., in Transformers) can capture long-range spatial relationships, their quadratic computational complexity incurs high costs for high-resolution medical images. To address these limitations, State Space Models (SSMs), which maintain linear complexity while effectively establishing long-range dependencies, have been introduced to visual tasks. Leveraging the advantages of SSMs, this paper proposes DPM-UNet. The network employs a Dual-path Residual Fusion Module (DRFM) at shallow layers to extract local detailed features and a DPMamba Module at deep layers to model global semantic information, achieving effective local–global feature fusion. A Multi-scale Aggregation Attention Network (MAAN) is further incorporated to enhance multi-scale representations. The proposed method collaboratively captures local details, long-range dependencies, and multi-scale information in medical images. Experiments on three public datasets demonstrate that DPM-UNet outperforms existing methods across multiple evaluation metrics. Full article
(This article belongs to the Section Biomedical Sensors)
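For readers unfamiliar with why SSMs scale linearly, the toy recurrence below processes a flattened feature sequence in a single pass, so cost grows with sequence length rather than its square. It is a scalar-per-channel sketch, not the selective scan used in Mamba-style models such as DPMamba.

```python
import torch

def ssm_scan(x, A, B, C):
    """Minimal discrete state-space recurrence: h_t = A*h_{t-1} + B*x_t, y_t = C*h_t.
    One pass over the sequence, so cost is linear in length, unlike the (N x N)
    score matrix of self-attention."""
    T, d = x.shape
    h = torch.zeros(d)
    ys = []
    for t in range(T):
        h = A * h + B * x[t]          # state update, O(d) per step
        ys.append(C * h)              # readout
    return torch.stack(ys)            # (T, d)

T, d = 4096, 32                       # e.g. a 64x64 feature map flattened to 4096 tokens
x = torch.randn(T, d)
A, B, C = torch.rand(d) * 0.9, torch.randn(d), torch.randn(d)   # per-channel parameters
print(ssm_scan(x, A, B, C).shape)     # torch.Size([4096, 32])
```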

25 pages, 6484 KB  
Article
FreqMamba: A Frequency-Aware Mamba Framework with Group-Separated Attention for Hyperspectral Image Classification
by Tong Zhou, Jianghe Zhai and Zhiwen Zhang
Remote Sens. 2025, 17(22), 3749; https://doi.org/10.3390/rs17223749 - 18 Nov 2025
Viewed by 575
Abstract
Hyperspectral imagery (HSI), characterized by the integration of both spatial and spectral information, is widely employed in various fields, such as environmental monitoring, geological exploration, precision agriculture, and medical imaging. Hyperspectral image classification (HSIC), as a key research direction, aims to establish a mapping relationship between pixels and land-cover categories. Nevertheless, several challenges persist, including difficulties in feature extraction, the trade-off between effective integration of local and global features, and spectral redundancy. We propose FreqMamba, a novel model that efficiently combines CNN, a custom attention mechanism, and the Mamba architecture. The proposed framework comprises three key components: (1) A novel multi-scale deformable convolution feature extraction module equipped with spectral attention, which processes spectral and spatial information through a dual-branch structure to enhance feature representation for irregular terrain contours; (2) a novel group-separated attention module that integrates group convolution with group-separated self-attention, effectively balancing local feature extraction and global contextual modeling; (3) a newly introduced bidirectional scanning Mamba branch that efficiently captures long-range dependencies with linear computational complexity. The proposed method achieves optimal performance on multiple benchmark datasets, including QUH-Tangdaowan, QUH-Qingyun, and QUH-Pingan, with the highest overall accuracy reaching 97.47%, average accuracy reaching 93.52%, and a Kappa coefficient of 96.22%. It significantly outperforms existing CNN, Transformer, and SSM-based methods, demonstrating its effectiveness, robustness, and superior generalization capability. Full article
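The bidirectional scanning idea can be sketched as follows: flatten the spatial grid into a sequence, run one scan forward and one backward, and merge. A GRU stands in for the Mamba/SSM block purely to show the scan pattern; it is not the paper's branch.

```python
import torch
import torch.nn as nn

class BidirectionalScan(nn.Module):
    """Bidirectional scanning over a flattened pixel/patch sequence; both directions
    share the same interface and their outputs are summed."""
    def __init__(self, dim: int):
        super().__init__()
        self.fwd = nn.GRU(dim, dim, batch_first=True)
        self.bwd = nn.GRU(dim, dim, batch_first=True)

    def forward(self, x):                      # x: (B, C, H, W)
        b, c, h, w = x.shape
        seq = x.flatten(2).transpose(1, 2)     # (B, H*W, C): row-major scan order
        out_f, _ = self.fwd(seq)               # forward scan
        out_b, _ = self.bwd(seq.flip(1))       # reverse scan
        out = out_f + out_b.flip(1)            # re-align and merge the two directions
        return out.transpose(1, 2).reshape(b, c, h, w)

x = torch.randn(2, 16, 8, 8)
print(BidirectionalScan(16)(x).shape)          # torch.Size([2, 16, 8, 8])
```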

19 pages, 3240 KB  
Article
AI-Based Downscaling of MODIS LST Using SRDA-Net Model for High-Resolution Data Generation
by Hongxia Ma, Kebiao Mao, Zijin Yuan, Longhao Xu, Jiancheng Shi, Zhonghua Guo and Zhihao Qin
Remote Sens. 2025, 17(21), 3510; https://doi.org/10.3390/rs17213510 - 22 Oct 2025
Viewed by 489
Abstract
Land surface temperature (LST) is a critical parameter in agricultural drought monitoring, crop growth analysis, and climate change research. However, the challenge of acquiring high-resolution LST data with both fine spatial and temporal scales remains a significant obstacle in remote sensing applications. Despite the high temporal resolution afforded by daily MODIS LST observations, the coarse (1 km) spatial scale of these data restricts their applicability for studies demanding finer spatial resolution. To address this challenge, a novel deep learning-based approach is proposed for LST downscaling: the spatial resolution downscaling attention network (SRDA-Net). The model is designed to refine MODIS LST from 1000 m to 250 m resolution, overcoming the shortcomings of traditional interpolation techniques in reconstructing spatial details, as well as reducing the reliance on linear models and multi-source high-temporal LST data typical of conventional fusion approaches. SRDA-Net captures the feature interaction between MODIS LST and auxiliary data through global resolution attention to address spatial heterogeneity. It further enhances the feature representation ability under heterogeneous surface conditions by optimizing multi-source features to handle heterogeneous data. Additionally, it strengthens the modeling of spatial dependency relationships through a multi-level feature refinement module. Moreover, this study constructs a composite loss function system that integrates physical mechanisms and data characteristics, ensuring the improvement of reconstruction details while maintaining numerical accuracy and model interpretability through a triple collaborative constraint mechanism. Experimental results show that the proposed model performs excellently in the simulation experiment (from 2000 m to 1000 m), with an MAE of 0.928 K and an R2 of 0.95. In farmland areas, the model performs particularly well (MAE = 0.615 K, R2 = 0.96, RMSE = 0.823 K), effectively supporting irrigation scheduling and crop health monitoring. It also maintains good vegetation heterogeneity expression ability in grassland areas, making it suitable for drought monitoring tasks. In the target downscaling experiment (from 1000 m to 500 m and 250 m), the model achieved an RMSE of 1.804 K, an MAE of 1.587 K, and an R2 of 0.915, confirming its stable generalization ability across multiple scales. This study supports agricultural drought warning and precise irrigation and provides data support for interdisciplinary applications such as climate change research and ecological monitoring, while offering a new approach to generating high spatio-temporal resolution LST. Full article
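A hedged sketch of what a composite, physically constrained downscaling loss can look like: a pixel-fidelity term, a gradient (detail) term, and a coarse-scale consistency term requiring the fine prediction to aggregate back to the observed coarse LST. The terms and weights are illustrative placeholders, not SRDA-Net's actual loss.

```python
import torch
import torch.nn.functional as F

def composite_downscaling_loss(pred_fine, target_fine, lst_coarse, scale=4,
                               w_pix=1.0, w_grad=0.1, w_phys=0.5):
    """Illustrative three-term loss for LST downscaling; weights are placeholders."""
    pix = F.l1_loss(pred_fine, target_fine)                          # pixel fidelity
    grad = F.l1_loss(pred_fine[..., :, 1:] - pred_fine[..., :, :-1],
                     target_fine[..., :, 1:] - target_fine[..., :, :-1])  # detail fidelity
    phys = F.l1_loss(F.avg_pool2d(pred_fine, scale), lst_coarse)     # coarse-scale consistency
    return w_pix * pix + w_grad * grad + w_phys * phys

pred = torch.rand(1, 1, 64, 64) * 30 + 280      # Kelvin-range toy values
target = torch.rand(1, 1, 64, 64) * 30 + 280
coarse = F.avg_pool2d(target, 4)
print(composite_downscaling_loss(pred, target, coarse).item())
```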

21 pages, 7208 KB  
Article
Optimization Algorithm for Detection of Impurities in Polypropylene Random Copolymer Raw Materials Based on YOLOv11
by Mingchen Dai and Xuedong Jing
Electronics 2025, 14(19), 3934; https://doi.org/10.3390/electronics14193934 - 3 Oct 2025
Viewed by 399
Abstract
Impurities in polypropylene random copolymer (PPR) raw materials can seriously affect the performance of the final product, and efficient and accurate impurity detection is crucial to ensure high production quality. In order to solve the problems of high small-target miss rates, weak anti-interference ability, and difficulty in balancing accuracy and speed in existing detection methods used in complex industrial scenarios, this paper proposes an enhanced machine vision detection algorithm based on YOLOv11. Firstly, the FasterLDConv module dynamically adjusts the position of sampling points through linear deformable convolution (LDConv), which improves the feature extraction ability for small-scale targets on complex backgrounds while remaining lightweight. Secondly, the IR-EMA attention mechanism combines an efficient reverse residual architecture with multi-scale attention; this combination enables the model to jointly capture feature channel dependencies and spatial relationships, thereby enhancing its sensitivity to weak impurity features. Thirdly, a DC-DyHead deformable dynamic detection head is constructed, and deformable convolutions are embedded into the spatial perceptual attention of DyHead to enhance its feature modelling ability for anomalies and occluded impurities. Finally, we introduce an enhanced InnerMPDIoU loss function to optimise the bounding box regression strategy. It addresses issues of the traditional CIoU loss, including excessive penalties on small targets and insufficient gradient guidance when boxes barely overlap. The results indicate that the average precision (mAP@0.5) of the improved algorithm on the self-made PPR impurity dataset reached 88.6%, which is 2.3% higher than that of the original YOLOv11n, while precision (P) and recall (R) increased by 2.4% and 2.8%, respectively. This study provides a reliable technical solution for the quality inspection of PPR raw materials and serves as a reference for algorithm optimisation in the field of industrial small-target detection. Full article
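To make the IoU-based regression idea concrete, the snippet below computes plain IoU and the IoU of center-preserving "inner" (shrunken) auxiliary boxes, which is the generic Inner-IoU notion; the paper's InnerMPDIoU additionally involves minimum-point-distance terms not shown here.

```python
import torch

def iou(box1, box2):
    """IoU of axis-aligned boxes given as (x1, y1, x2, y2)."""
    x1 = torch.max(box1[..., 0], box2[..., 0])
    y1 = torch.max(box1[..., 1], box2[..., 1])
    x2 = torch.min(box1[..., 2], box2[..., 2])
    y2 = torch.min(box1[..., 3], box2[..., 3])
    inter = (x2 - x1).clamp(min=0) * (y2 - y1).clamp(min=0)
    area1 = (box1[..., 2] - box1[..., 0]) * (box1[..., 3] - box1[..., 1])
    area2 = (box2[..., 2] - box2[..., 0]) * (box2[..., 3] - box2[..., 1])
    return inter / (area1 + area2 - inter + 1e-7)

def shrink(box, ratio=0.8):
    """Auxiliary 'inner' box: same center, sides scaled by `ratio` (generic Inner-IoU idea)."""
    cx, cy = (box[..., 0] + box[..., 2]) / 2, (box[..., 1] + box[..., 3]) / 2
    w, h = (box[..., 2] - box[..., 0]) * ratio, (box[..., 3] - box[..., 1]) * ratio
    return torch.stack([cx - w / 2, cy - h / 2, cx + w / 2, cy + h / 2], dim=-1)

pred = torch.tensor([10.0, 10.0, 50.0, 50.0])
gt = torch.tensor([12.0, 14.0, 48.0, 52.0])
print(iou(pred, gt), iou(shrink(pred), shrink(gt)))   # loss would be 1 - (a weighted IoU term)
```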

27 pages, 9667 KB  
Article
REU-YOLO: A Context-Aware UAV-Based Rice Ear Detection Model for Complex Field Scenes
by Dongquan Chen, Kang Xu, Wenbin Sun, Danyang Lv, Songmei Yang, Ranbing Yang and Jian Zhang
Agronomy 2025, 15(9), 2225; https://doi.org/10.3390/agronomy15092225 - 20 Sep 2025
Viewed by 705
Abstract
Accurate detection and counting of rice ears serve as a critical indicator for yield estimation, but the complex conditions of paddy fields limit the efficiency and precision of traditional sampling methods. We propose REU-YOLO, a model specifically designed for UAV low-altitude remote sensing of rice ears, to address issues such as high density, complex spatial distribution, and occlusion in field scenes. Initially, we combine the Additive Block containing Convolutional Additive Self-attention (CAS) and Convolutional Gated Linear Unit (CGLU) to propose a novel module called Additive-CGLU-C2F (AC-C2f) as a replacement for the original C2f in YOLOv8. This module captures contextual information between different regions of an image and improves the feature extraction ability of the model. We also introduce the DropBlock strategy to reduce overfitting and replace the original SPPF module with the SPPFCSPC-G module to enhance feature representation and improve the capacity of the model to extract features across varying scales. We further propose a feature fusion network called Multi-branch Bidirectional Feature Pyramid Network (MBiFPN), which introduces a small object detection head and adjusts the head to focus more on small and medium-sized rice ear targets. By using adaptive average pooling and bidirectional weighted feature fusion, shallow and deep features are dynamically fused to enhance the robustness of the model. Finally, the Inner-PIoU loss function is introduced to improve the adaptability of the model to rice ear morphology. On the self-developed UAVR dataset, REU-YOLO achieves a precision (P) of 90.76%, a recall (R) of 86.94%, an mAP0.5 of 93.51%, and an mAP0.5:0.95 of 78.45%, which are 4.22%, 3.76%, 4.85%, and 8.27% higher than the corresponding values obtained with YOLOv8s, respectively. Furthermore, three public datasets, DRPD, MrMT, and GWHD, were used to perform a comprehensive evaluation of REU-YOLO. The results show that REU-YOLO offers strong generalization capabilities and more stable detection performance. Full article
(This article belongs to the Section Precision and Digital Agriculture)
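The bidirectional weighted fusion step can be illustrated with a BiFPN-style fast normalized fusion: learnable non-negative weights decide how much each level contributes after resampling to a common resolution. This is a generic sketch, not MBiFPN itself.

```python
import torch
import torch.nn as nn
import torch.nn.functional as F

class WeightedFusion(nn.Module):
    """Fast normalized fusion: learnable non-negative per-input weights, normalized
    to sum to one, blend feature levels brought to a shared resolution."""
    def __init__(self, n_inputs: int, eps: float = 1e-4):
        super().__init__()
        self.w = nn.Parameter(torch.ones(n_inputs))
        self.eps = eps

    def forward(self, feats):
        target_hw = feats[-1].shape[-2:]                              # fuse at the coarsest level
        feats = [F.adaptive_avg_pool2d(f, target_hw) for f in feats]  # adaptive pooling to one size
        w = F.relu(self.w)
        w = w / (w.sum() + self.eps)
        return sum(wi * fi for wi, fi in zip(w, feats))

shallow = torch.randn(1, 64, 80, 80)
deep = torch.randn(1, 64, 40, 40)
print(WeightedFusion(2)([shallow, deep]).shape)                       # torch.Size([1, 64, 40, 40])
```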

25 pages, 7964 KB  
Article
DSCSRN: Physically Guided Symmetry-Aware Spatial-Spectral Collaborative Network for Single-Image Hyperspectral Super-Resolution
by Xueli Chang, Jintong Liu, Guotao Wen, Xiaoyu Huang and Meng Yan
Symmetry 2025, 17(9), 1520; https://doi.org/10.3390/sym17091520 - 12 Sep 2025
Viewed by 643
Abstract
Hyperspectral images (HSIs), with their rich spectral information, are widely used in remote sensing; yet the inherent trade-off between spectral and spatial resolution in imaging systems often limits spatial details. Single-image hyperspectral super-resolution (HSI-SR) seeks to recover high-resolution HSIs from a single low-resolution input, but the high dimensionality and spectral redundancy of HSIs make this task challenging. In HSIs, spectral signatures and spatial textures often exhibit intrinsic symmetries, and preserving these symmetries provides additional physical constraints that enhance reconstruction fidelity and robustness. To address these challenges, we propose the Dynamic Spectral Collaborative Super-Resolution Network (DSCSRN), an end-to-end framework that integrates physical modeling with deep learning and explicitly embeds spatial–spectral symmetry priors into the network architecture. DSCSRN processes low-resolution HSIs with a Cascaded Residual Spectral Decomposition Network (CRSDN) to compress redundant channels while preserving spatial structures, generating accurate abundance maps. These maps are refined by two Synergistic Progressive Feature Refinement Modules (SPFRMs), which progressively enhance spatial textures and spectral details via a multi-scale dual-domain collaborative attention mechanism. The Dynamic Endmember Adjustment Module (DEAM) then adaptively updates spectral endmembers according to scene context, overcoming the limitations of fixed-endmember assumptions. Grounded in the Linear Mixture Model (LMM), this unmixing–recovery–reconstruction pipeline restores subtle spectral variations alongside improved spatial resolution. Experiments on the Chikusei, Pavia Center, and CAVE datasets show that DSCSRN outperforms state-of-the-art methods in both perceptual quality and quantitative performance, achieving an average PSNR of 43.42 and a SAM of 1.75 (×4 scale) on Chikusei. The integration of symmetry principles offers a unifying perspective aligned with the intrinsic structure of HSIs, producing reconstructions that are both accurate and structurally consistent. Full article
(This article belongs to the Section Computer)
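The Linear Mixture Model underlying the unmixing–recovery–reconstruction pipeline is easy to state in code: each pixel spectrum is a non-negative, sum-to-one combination of endmember spectra. Dimensions below are toy values, not the paper's setup.

```python
import numpy as np

# Linear Mixture Model (LMM): pixel spectrum = abundance-weighted sum of endmembers.
bands, n_end, h, w = 100, 4, 8, 8
endmembers = np.random.rand(n_end, bands)            # (endmembers x spectral bands)
abundance = np.random.rand(n_end, h, w)
abundance /= abundance.sum(axis=0, keepdims=True)    # enforce the sum-to-one constraint

# Reconstruction: mix endmembers by per-pixel abundances -> hyperspectral cube (bands, H, W)
cube = np.einsum('kb,khw->bhw', endmembers, abundance)
print(cube.shape)                                    # (100, 8, 8)
```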

24 pages, 32280 KB  
Article
Spectral Channel Mixing Transformer with Spectral-Center Attention for Hyperspectral Image Classification
by Zhenming Sun, Hui Liu, Ning Chen, Haina Yang, Jia Li, Chang Liu and Xiaoping Pei
Remote Sens. 2025, 17(17), 3100; https://doi.org/10.3390/rs17173100 - 5 Sep 2025
Cited by 1 | Viewed by 1327
Abstract
In recent years, the research trend of HSI classification has focused on the innovative integration of deep learning and Transformer architecture to enhance classification performance through multi-scale feature extraction, attention mechanism optimization, and spectral–spatial collaborative modeling. However, the Transformer's high computational complexity and large parameter count create a scaling bottleneck in long-sequence tasks, requiring joint optimization of algorithm and hardware. To better handle this issue, our paper proposes a method which integrates RWKV linear attention with the Transformer through a novel TC-Former framework, combining TimeMixFormer and HyperMixFormer architectures. Specifically, TimeMixFormer reduces computational complexity through time-decay weights and a gating design, significantly improving the processing efficiency of long sequences. HyperMixFormer employs a gated WKV mechanism and dynamic channel weighting, combined with Mish activation and time-shift operations, to optimize computational overhead while achieving efficient cross-channel interaction, significantly enhancing the discriminative representation of spectral features. The pivotal characteristic of the proposed method lies in its innovative integration of linear attention mechanisms, which enhance HSI classification accuracy while achieving lower computational complexity. Evaluation experiments on three public hyperspectral datasets confirm that this framework outperforms the previous state-of-the-art algorithms in classification accuracy. Full article
(This article belongs to the Section Remote Sensing Image Processing)
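A naive, heavily simplified sketch of RWKV-style time mixing: past positions are weighted by an exponential time decay plus a key term, and the result is gated. The O(T^2) loop is for illustration only; the actual TimeMixFormer/HyperMixFormer blocks use recurrent linear-time forms and additional components (Mish, time shift) not shown.

```python
import torch

def toy_wkv(r, k, v, decay):
    """Didactic WKV-like mix: weights exp(-(t-i)*decay + k_i) over the past, output
    gated by sigmoid(r_t). Not the paper's implementation."""
    T, d = k.shape
    out = torch.zeros_like(v)
    for t in range(T):
        idx = torch.arange(t + 1)
        w = torch.exp(-(t - idx).float().unsqueeze(1) * decay + k[: t + 1])  # (t+1, d)
        out[t] = (w * v[: t + 1]).sum(0) / (w.sum(0) + 1e-8)                 # decay-weighted average
    return torch.sigmoid(r) * out                                            # receptance gate

T, d = 16, 8
r, k, v = torch.randn(T, d), torch.randn(T, d), torch.randn(T, d)
decay = torch.rand(d)                                   # per-channel decay rate
print(toy_wkv(r, k, v, decay).shape)                    # torch.Size([16, 8])
```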

23 pages, 1476 KB  
Article
Dynamically Optimized Object Detection Algorithms for Aviation Safety
by Yi Qu, Cheng Wang, Yilei Xiao, Haijuan Ju and Jing Wu
Electronics 2025, 14(17), 3536; https://doi.org/10.3390/electronics14173536 - 4 Sep 2025
Cited by 1 | Viewed by 772
Abstract
Infrared imaging technology demonstrates significant advantages in aviation safety monitoring due to its exceptional all-weather operational capability and anti-interference characteristics, particularly in scenarios requiring real-time detection of aerial objects such as airport airspace management. However, traditional infrared target detection algorithms face critical challenges in complex sky backgrounds, including low signal-to-noise ratio (SNR), small target dimensions, and strong background clutter, leading to insufficient detection accuracy and reliability. To address these issues, this paper proposes the AFK-YOLO model based on the YOLO11 framework: it integrates an ADown downsampling module, which utilizes a dual-branch strategy combining average pooling and max pooling to effectively minimize feature information loss during spatial resolution reduction; introduces the KernelWarehouse dynamic convolution approach, which adopts kernel partitioning and a contrastive attention-based cross-layer shared kernel repository to address the challenge of linear parameter growth in conventional dynamic convolution methods; and establishes a feature decoupling pyramid network (FDPN) that replaces static feature pyramids with a dynamic multi-scale fusion architecture, utilizing parallel multi-granularity convolutions and an EMA attention mechanism to achieve adaptive feature enhancement. Experiments demonstrate that the AFK-YOLO model achieves 78.6% mAP on a self-constructed aerial infrared dataset—a 2.4 percentage point improvement over the baseline YOLO11—while meeting real-time requirements for aviation safety monitoring (416.7 FPS), reducing parameters by 6.9%, and compressing weight size by 21.8%. The results demonstrate the effectiveness of dynamic optimization methods in improving the accuracy and robustness of infrared target detection under complex aerial environments, thereby providing reliable technical support for the prevention of mid-air collisions. Full article
(This article belongs to the Special Issue Computer Vision and AI Algorithms for Diverse Scenarios)
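The dual-branch downsampling can be sketched as below: one channel split goes through average pooling and a strided convolution, the other through max pooling and a 1×1 convolution, and the results are concatenated. This follows the general ADown pattern rather than the paper's exact module.

```python
import torch
import torch.nn as nn
import torch.nn.functional as F

class DualBranchDown(nn.Module):
    """Dual-branch downsampling in the spirit of ADown: the averaged branch keeps
    smoothed context, the max branch keeps salient peaks."""
    def __init__(self, c_in: int, c_out: int):
        super().__init__()
        self.conv_a = nn.Conv2d(c_in // 2, c_out // 2, 3, stride=2, padding=1)
        self.conv_b = nn.Conv2d(c_in - c_in // 2, c_out - c_out // 2, 1)

    def forward(self, x):
        x = F.avg_pool2d(x, 2, stride=1, padding=0)                   # light smoothing before the split
        a, b = x.chunk(2, dim=1)                                      # split channels across branches
        a = self.conv_a(a)                                            # smoothed branch -> strided conv
        b = self.conv_b(F.max_pool2d(b, 3, stride=2, padding=1))      # max branch keeps strong responses
        return torch.cat([a, b], dim=1)

x = torch.randn(1, 64, 32, 32)
print(DualBranchDown(64, 128)(x).shape)                               # torch.Size([1, 128, 16, 16])
```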

18 pages, 6467 KB  
Article
State-Space Model Meets Linear Attention: A Hybrid Architecture for Internal Wave Segmentation
by Zhijie An, Zhao Li, Saheya Barintag, Hongyu Zhao, Yanqing Yao, Licheng Jiao and Maoguo Gong
Remote Sens. 2025, 17(17), 2969; https://doi.org/10.3390/rs17172969 - 27 Aug 2025
Viewed by 3109
Abstract
Internal waves (IWs) play a crucial role in the transport of energy and matter within the ocean while also posing significant risks to marine engineering, navigation, and underwater communication systems. Consequently, effective segmentation methods are essential for mitigating their adverse impacts and minimizing associated hazards. A promising strategy involves applying remote sensing image segmentation techniques to accurately identify IWs, thereby enabling predictions of their propagation velocity and direction. However, current IWs segmentation models struggle to balance computational efficiency and segmentation accuracy, often resulting in either excessive computational costs or inadequate performance. Motivated by recent developments in the Mamba2 architecture, this paper introduces the state-space model meets linear attention (SMLA), a novel segmentation framework specifically designed for IWs. The proposed hybrid architecture effectively integrates three key components: a feature-aware serialization (FAS) block to efficiently convert spatial features into sequences; a state-space model with linear attention (SSM-LA) block that synergizes a state-space model with linear attention for comprehensive feature extraction; and a decoder driven by hierarchical fusion and upsampling, which performs channel alignment and scale unification across multi-level features to ensure high-fidelity spatial detail recovery. Experiments conducted on a dataset of 484 synthetic-aperture radar (SAR) images containing IWs from the South China Sea achieved a mean Intersection over Union (MIoU) of 74.3%, surpassing competing methods evaluated on the same dataset. These results demonstrate the superior effectiveness of SMLA in extracting features of IWs from SAR imagery. Full article
(This article belongs to the Special Issue Advancements of Vision-Language Models (VLMs) in Remote Sensing)
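The linear-attention half of such a hybrid relies on the standard kernel trick: with a positive feature map, the summary K^T V can be accumulated once, so cost grows linearly with sequence length. A generic sketch, not the paper's SSM-LA block:

```python
import torch
import torch.nn.functional as F

def softmax_attention(q, k, v):
    """Standard attention, shown only for contrast: the (N x N) score matrix makes
    cost quadratic in sequence length N."""
    scores = q @ k.transpose(-2, -1) / q.shape[-1] ** 0.5
    return scores.softmax(dim=-1) @ v

def linear_attention(q, k, v, eps=1e-6):
    """Kernelized linear attention (elu+1 feature map): associativity lets us form
    the (d x d) summary K^T V once, so cost grows linearly with N."""
    q, k = F.elu(q) + 1, F.elu(k) + 1
    kv = k.transpose(-2, -1) @ v                                   # (d, d) summary, independent of N
    z = q @ k.sum(dim=-2, keepdim=True).transpose(-2, -1) + eps    # per-token normalizer
    return (q @ kv) / z

q = k = v = torch.randn(1, 4096, 64)                               # 64x64 feature map flattened to N=4096 tokens
print(linear_attention(q, k, v).shape)                             # torch.Size([1, 4096, 64])
```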

25 pages, 7900 KB  
Article
Multi-Label Disease Detection in Chest X-Ray Imaging Using a Fine-Tuned ConvNeXtV2 with a Customized Classifier
by Kangzhe Xiong, Yuyun Tu, Xinping Rao, Xiang Zou and Yingkui Du
Informatics 2025, 12(3), 80; https://doi.org/10.3390/informatics12030080 - 14 Aug 2025
Viewed by 3668
Abstract
Deep-learning-based multi-label chest X-ray classification has achieved significant success, but existing models still have three main issues: fixed-scale convolutions fail to capture both large and small lesions, standard pooling pays no attention to important regions, and linear classification lacks the capacity to model complex dependencies between features. To circumvent these obstacles, we propose CONVFCMAE, a lightweight yet powerful framework built on a partially frozen backbone (77.08% of the initial layers are fixed) in order to preserve complex, multi-scale features while decreasing the number of trainable parameters. Our architecture adds (1) a learnable global pooling module with 1×1 convolutions that are dynamically weighted by spatial location, (2) a multi-head attention block dedicated to channel re-calibration, and (3) a two-layer MLP enhanced with ReLU, batch normalization, and dropout to strengthen the non-linearity of the feature space. To further reduce label noise and the class imbalance inherent to the NIH ChestXray14 dataset, we utilize a combined BCEWithLogits and Focal Loss as well as extensive data augmentation. On ChestXray14, the average ROC–AUC of CONVFCMAE is 0.852, which is 3.97 percent greater than the state of the art. Ablation experiments demonstrate the individual and collective effectiveness of each component. Grad-CAM visualizations localize pathological regions well, which increases the interpretability of the model. Overall, CONVFCMAE provides a practical, generalizable solution for feature extraction from medical images. Full article
(This article belongs to the Section Medical and Clinical Informatics)
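A minimal sketch of a combined BCE-with-logits plus focal term for multi-label classification, the kind of loss the abstract describes; alpha, gamma, and the mixing weight are illustrative defaults rather than the paper's settings.

```python
import torch
import torch.nn as nn

class BCEPlusFocal(nn.Module):
    """Plain BCE-with-logits plus a focal term that down-weights easy examples."""
    def __init__(self, alpha=0.25, gamma=2.0, focal_weight=0.5):
        super().__init__()
        self.bce = nn.BCEWithLogitsLoss(reduction="none")
        self.alpha, self.gamma, self.focal_weight = alpha, gamma, focal_weight

    def forward(self, logits, targets):
        bce = self.bce(logits, targets)                       # per-label BCE
        p_t = torch.exp(-bce)                                 # probability of the true label
        focal = self.alpha * (1 - p_t) ** self.gamma * bce    # focal re-weighting
        return bce.mean() + self.focal_weight * focal.mean()

logits = torch.randn(4, 14)                                   # 14 findings, as in ChestXray14
targets = torch.randint(0, 2, (4, 14)).float()
print(BCEPlusFocal()(logits, targets).item())
```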

20 pages, 3319 KB  
Article
Symmetric Versus Asymmetric Transformer Architectures for Spatio-Temporal Modeling in Effluent Wastewater Quality Prediction
by Tong Hu, Zikang Chen, Jun Song and Hongbin Liu
Symmetry 2025, 17(8), 1322; https://doi.org/10.3390/sym17081322 - 14 Aug 2025
Viewed by 627
Abstract
Accurate prediction of effluent quality indicators is essential for ensuring stable operation and regulatory compliance in wastewater treatment plants. However, the inherent spatial distribution and temporal fluctuations of wastewater processes present significant challenges for modeling. In this study, we propose a dynamic multi-scale spatio-temporal Transformer (DMST-Transformer) with a symmetric architecture to enhance prediction accuracy in complex wastewater systems. Unlike conventional asymmetric designs, the DMST-Transformer extracts spatial and temporal features in parallel using a spatial graph convolutional network and a multi-scale self-attention mechanism coupled with a dynamic self-tuning module. The model is evaluated on a full-process dataset collected from a municipal wastewater treatment plant, with biochemical oxygen demand selected as the target indicator. Experimental results on test data show that the DMST-Transformer achieves a coefficient of determination of 0.93, root mean square error of 1.40 mg/L, and mean absolute percentage error of 6.61%, outperforming classical models such as linear regression, partial least squares, and graph convolutional networks, as well as advanced deep learning baselines including Transformer and ST-Transformer. Ablation studies confirm the complementary effectiveness of the spatial and temporal modules, and computational time comparisons demonstrate the model’s suitability for real-time applications. These results validate the practical potential of the DMST-Transformer for robust effluent quality monitoring in wastewater treatment plants. Future research will focus on scaling the model to larger and more diverse datasets, extending it to predict additional water quality indicators, and deploying it in real-time environmental monitoring systems to support intelligent water resource management. Full article
(This article belongs to the Special Issue Advances in Machine Learning and Symmetry/Asymmetry)
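The three reported indicators are straightforward to compute; the helper below shows their definitions on toy BOD values (not the plant's data).

```python
import numpy as np

def regression_metrics(y_true, y_pred):
    """R^2, RMSE and MAPE, the three indicators reported for effluent BOD prediction."""
    resid = y_true - y_pred
    ss_res = np.sum(resid ** 2)
    ss_tot = np.sum((y_true - y_true.mean()) ** 2)
    r2 = 1 - ss_res / ss_tot
    rmse = np.sqrt(np.mean(resid ** 2))
    mape = np.mean(np.abs(resid / y_true)) * 100
    return r2, rmse, mape

y_true = np.array([18.2, 20.5, 22.1, 19.8, 21.0])   # toy BOD values in mg/L
y_pred = np.array([17.9, 21.0, 21.5, 20.2, 20.6])
print(regression_metrics(y_true, y_pred))
```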

12 pages, 493 KB  
Article
AFJ-PoseNet: Enhancing Simple Baselines with Attention-Guided Fusion and Joint-Aware Positional Encoding
by Wenhui Zhang, Yu Shi and Jiayi Lin
Electronics 2025, 14(15), 3150; https://doi.org/10.3390/electronics14153150 - 7 Aug 2025
Viewed by 544
Abstract
Simple Baseline has become a dominant benchmark in human pose estimation (HPE) due to its excellent performance and simple design. However, its “strong encoder + simple decoder” architectural paradigm suffers from two core limitations: (1) its non-branching, linear deconvolutional path prevents it from leveraging the rich, fine-grained features generated by the encoder at multiple scales and (2) the model lacks explicit prior knowledge of both the absolute positions and structural layout of human keypoints. To address these issues, this paper introduces AFJ-PoseNet, a new architecture that deeply enhances the Simple Baseline framework. First, we restructure Simple Baseline’s original linear decoder into a U-Net-like multi-scale fusion path, introducing intermediate features from the encoder via skip connections. For efficient fusion, we design a novel Attention Fusion Module (AFM), which dynamically gates the flow of incoming detailed features through a context-aware spatial attention mechanism. Second, we propose the Joint-Aware Positional Encoding (JAPE) module, which innovatively combines a fixed global coordinate system with learnable, joint-specific spatial priors. This design injects both absolute position awareness and statistical priors of the human body structure. Our ablation studies on the MPII dataset validate the effectiveness of each proposed enhancement, with our full model achieving a mean PCKh of 88.915, a 0.341 percentage point improvement over our re-implemented baseline. On the more challenging COCO val2017 dataset, our ResNet-50-based AFJ-PoseNet achieves an Average Precision (AP) of 72.6%. While this involves a slight trade-off in Average Recall for higher precision, this result represents a significant 2.2 percentage point improvement over our re-implemented baseline (70.4%) and also outperforms other strong, publicly available models like DARK (72.4%) and SimCC (72.1%) under comparable settings, demonstrating the superiority and competitiveness of our proposed enhancements. Full article
(This article belongs to the Section Computer Science & Engineering)
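A toy version of the joint-aware positional encoding idea: a fixed normalized coordinate grid supplies absolute position, and a learnable per-joint map adds a joint-specific spatial prior. The shapes and projection layer are assumptions for illustration, not the paper's JAPE module.

```python
import torch
import torch.nn as nn

class JointAwarePE(nn.Module):
    """Fixed (x, y) coordinate grid + learnable per-joint prior maps, projected to
    the feature dimension and added to the decoder features."""
    def __init__(self, n_joints: int, channels: int, h: int, w: int):
        super().__init__()
        ys, xs = torch.meshgrid(torch.linspace(-1, 1, h), torch.linspace(-1, 1, w), indexing="ij")
        self.register_buffer("grid", torch.stack([xs, ys]))            # (2, H, W), fixed global coordinates
        self.joint_prior = nn.Parameter(torch.zeros(n_joints, h, w))   # learnable joint-specific priors
        self.proj = nn.Conv2d(2 + n_joints, channels, 1)               # project positional info to feature dim

    def forward(self, feats):                                          # feats: (B, C, H, W)
        b = feats.shape[0]
        pos = torch.cat([self.grid, self.joint_prior]).unsqueeze(0).expand(b, -1, -1, -1)
        return feats + self.proj(pos)

feats = torch.randn(2, 256, 64, 48)                                    # typical heatmap-branch resolution
print(JointAwarePE(n_joints=17, channels=256, h=64, w=48)(feats).shape)
```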

21 pages, 5917 KB  
Article
VML-UNet: Fusing Vision Mamba and Lightweight Attention Mechanism for Skin Lesion Segmentation
by Tang Tang, Haihui Wang, Qiang Rao, Ke Zuo and Wen Gan
Electronics 2025, 14(14), 2866; https://doi.org/10.3390/electronics14142866 - 17 Jul 2025
Viewed by 2136
Abstract
Deep learning has advanced medical image segmentation, yet existing methods struggle with complex anatomical structures. Mainstream models, such as CNN, Transformer, and hybrid architectures, face challenges including insufficient information representation and redundant complexity, which limit their clinical deployment. Developing efficient and lightweight networks is crucial for accurate lesion localization and optimized clinical workflows. We propose the VML-UNet, a lightweight segmentation network with core innovations including the CPMamba module and the multi-scale local supervision module (MLSM). The CPMamba module integrates the visual state space (VSS) block and a channel prior attention mechanism to enable efficient modeling of spatial relationships with linear computational complexity through dynamic channel-space weight allocation, while preserving channel feature integrity. The MLSM enhances local feature perception and reduces the inference burden. Comparative experiments were conducted on three public datasets (ISIC2017, ISIC2018, and PH2), with ablation experiments performed on ISIC2017. VML-UNet has only 0.53 M parameters, 2.18 MB memory usage, and a computational cost of 1.24 GFLOPs, and it outperforms the comparison networks on these datasets, validating its effectiveness. This study provides a valuable reference for developing lightweight, high-performance skin lesion segmentation networks. Full article
(This article belongs to the Section Bioelectronics)
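The parameter and memory budget quoted above is the kind of figure produced by a helper like the one below (a toy stand-in network is used; this is not VML-UNet).

```python
import torch
import torch.nn as nn

def count_params_and_size(model: nn.Module):
    """Trainable parameter count and float32 weight size in MB, the kind of budget
    figures quoted for lightweight segmentation networks."""
    n_params = sum(p.numel() for p in model.parameters() if p.requires_grad)
    size_mb = n_params * 4 / (1024 ** 2)        # 4 bytes per float32 weight
    return n_params, size_mb

# Toy stand-in network just to exercise the helper.
net = nn.Sequential(nn.Conv2d(3, 16, 3, padding=1), nn.ReLU(), nn.Conv2d(16, 1, 1))
print(count_params_and_size(net))               # (465, ~0.0018 MB)
```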