Search Results (335)

Search Parameters:
Keywords = fine-grained feature representation

19 pages, 1623 KB  
Article
AFCLNet: An Attention and Feature-Consistency-Loss-Based Multi-Task Learning Network for Affective Matching Prediction in Music–Video Clips
by Zhibin Su, Jinyu Liu, Luyue Zhang, Yiming Feng and Hui Ren
Sensors 2026, 26(1), 123; https://doi.org/10.3390/s26010123 - 24 Dec 2025
Abstract
Emotion matching prediction between music and video segments is essential for intelligent mobile sensing systems, where multimodal affective cues collected from smart devices must be jointly analyzed for context-aware media understanding. However, traditional approaches relying on single-modality feature extraction struggle to capture complex cross-modal dependencies, resulting in a gap between low-level audiovisual signals and high-level affective semantics. To address these challenges, a dual-driven framework that integrates perceptual characteristics with objective feature representations is proposed for audiovisual affective matching prediction. The framework incorporates fine-grained affective states of audiovisual data to better characterize cross-modal correlations from an emotional distribution perspective. Moreover, a decoupled Deep Canonical Correlation Analysis approach is developed, incorporating discriminative sample-pairing criteria (matched/mismatched data discrimination) and separate modality-specific component extractors, which dynamically refine the feature projection space. To further enhance multimodal feature interaction, an Attention and Feature-Consistency-Loss-Based Multi-Task Learning Network is proposed. In addition, a feature-consistency loss function is introduced to impose joint constraints across dual semantic embeddings, ensuring both affective consistency and matching accuracy. Experiments on a self-collected benchmark dataset demonstrate that the proposed method achieves a mean absolute error of 0.109 in music–video matching score prediction, significantly outperforming existing approaches. Full article
(This article belongs to the Special Issue Recent Advances in Smart Mobile Sensing Technology)
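The abstract describes a feature-consistency loss imposed jointly with the matching-score objective. As a rough illustration of that idea only, the sketch below combines an MAE regression term with a consistency term between the two embeddings; the embedding shapes, weighting, and exact form are assumptions, not the AFCLNet loss.

```python
# Hypothetical sketch of a feature-consistency term combined with matching-score
# regression, loosely following the abstract; not the authors' code.
import torch
import torch.nn.functional as F

def affective_matching_loss(music_emb, video_emb, pred_score, true_score, lam=0.5):
    """music_emb, video_emb: (B, D) semantic embeddings from the two branches.
    pred_score, true_score: (B,) predicted / ground-truth matching scores in [0, 1]."""
    # Task loss: mean absolute error on the music-video matching score.
    mae = F.l1_loss(pred_score, true_score)
    # Feature-consistency term: pull the dual embeddings together in proportion
    # to how well the pair actually matches (assumed formulation).
    cos = F.cosine_similarity(music_emb, video_emb, dim=-1)        # (B,)
    consistency = torch.mean(true_score * (1.0 - cos))
    return mae + lam * consistency

# Toy usage with random tensors.
B, D = 8, 256
loss = affective_matching_loss(torch.randn(B, D), torch.randn(B, D),
                               torch.rand(B), torch.rand(B))
print(float(loss))
```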
14 pages, 4385 KB  
Article
MCARSMA: A Multi-Level Cross-Modal Attention Fusion Framework for Accurate RNA–Small Molecule Affinity Prediction
by Ye Li, Yongfeng Zhang, Lei Zhu, Menghua Wang, Rong Wang and Xiao Wang
Mathematics 2026, 14(1), 57; https://doi.org/10.3390/math14010057 - 24 Dec 2025
Abstract
RNA has emerged as a critical drug target, and accurate prediction of its binding affinity with small molecules is essential for the design and screening of RNA-targeted therapeutics. Although current deep learning methods have achieved progress in predicting RNA–small molecule interactions, existing models commonly suffer from reliance on single-modality features and insufficient representation of cross-level interactions. This paper proposes a multi-level cross-modal attention fusion framework, named MCARSMA, which integrates sequence, structural, and semantic information from both RNA and small molecules. The model employs a dual-path interaction mechanism to capture multi-scale relationships spanning from atom–nucleotide fine-grained interactions to global conformational features. The model architecture comprises (1) the feature extraction of RNA secondary structure and sequence using GAT and CNN; (2) small molecule representation that combines GCN and Transformer for joint graph and sequence embedding; (3) a dual-path fusion module for atom–nucleotide fine-grained interactions and structure-guided multi-level interactions; and (4) an adaptive feature weighting mechanism implemented via a gated network. The results demonstrate that on the R-SIM dataset, MCARSMA achieves RMSE = 0.883, PCC = 0.772, and SCC = 0.773, validating the effectiveness of the proposed multi-level cross-modal attention fusion framework. This study provides a highly interpretable deep learning solution with high predictive accuracy. Full article
(This article belongs to the Special Issue Machine Learning Algorithms and Their Applications in Bioinformatics)
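For readers curious how the "dual-path interaction" and "adaptive feature weighting implemented via a gated network" might look in code, here is a minimal, assumed sketch of atom tokens cross-attending to nucleotide tokens followed by a learned gate; it is not the MCARSMA implementation, and all dimensions are placeholders.

```python
# Illustrative cross-modal attention with gated weighting; dimensions and the
# gating form are assumptions, not the MCARSMA modules.
import torch
import torch.nn as nn

class CrossModalGatedFusion(nn.Module):
    def __init__(self, dim=128, heads=4):
        super().__init__()
        # Atom tokens (small molecule) attend to nucleotide tokens (RNA).
        self.cross_attn = nn.MultiheadAttention(dim, heads, batch_first=True)
        # Gate decides how much of the attended context to mix back in.
        self.gate = nn.Sequential(nn.Linear(2 * dim, dim), nn.Sigmoid())

    def forward(self, atom_tokens, nucleotide_tokens):
        ctx, _ = self.cross_attn(atom_tokens, nucleotide_tokens, nucleotide_tokens)
        g = self.gate(torch.cat([atom_tokens, ctx], dim=-1))
        return g * ctx + (1.0 - g) * atom_tokens

# Toy usage: 30 atoms, 60 nucleotides, batch of 2.
fusion = CrossModalGatedFusion()
out = fusion(torch.randn(2, 30, 128), torch.randn(2, 60, 128))
print(out.shape)  # torch.Size([2, 30, 128])
```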
27 pages, 25449 KB  
Article
Multi-Domain Feature Fusion Transformer with Cross-Domain Robustness for Facial Expression Recognition
by Katherine Lin Shu and Mu-Jiang-Shan Wang
Symmetry 2026, 18(1), 15; https://doi.org/10.3390/sym18010015 - 21 Dec 2025
Viewed by 68
Abstract
Facial expression recognition (FER) is a key task in affective computing and human–computer interaction, aiming to decode facial muscle movements into emotional categories. Although deep learning-based FER has achieved remarkable progress, robust recognition under uncontrolled conditions (e.g., illumination change, pose variation, occlusion, and cultural diversity) remains challenging. Traditional Convolutional Neural Networks (CNNs) are effective at local feature extraction but limited in modeling global dependencies, while Vision Transformers (ViT) provide global context modeling yet often neglect fine-grained texture and frequency cues that are critical for subtle expression discrimination. Moreover, existing approaches usually focus on single-domain representations and lack adaptive strategies to integrate heterogeneous cues across spatial, semantic, and spectral domains, leading to limited cross-domain generalization. To address these limitations, this study proposes a unified Multi-Domain Feature Enhancement and Fusion (MDFEFT) framework that combines a ViT-based global encoder with three complementary branches—channel, spatial, and frequency—for comprehensive feature learning. Taking into account the approximately bilateral symmetry of human faces and the asymmetric distortions introduced by pose, occlusion, and illumination, the proposed MDFEFT framework is designed to learn symmetry-aware and asymmetry-robust representations for facial expression recognition across diverse domains. An adaptive Cross-Domain Feature Enhancement and Fusion (CDFEF) module is further introduced to align and integrate heterogeneous features, achieving domain-consistent and illumination-robust expression understanding. The experimental results show that the proposed method consistently outperforms existing CNN-, Transformer-, and ensemble-based models. The proposed model achieves accuracies of 0.997, 0.796, and 0.776 on KDEF, FER2013, and RAF-DB, respectively. Compared with the strongest baselines, it further improves accuracy by 0.3%, 2.2%, and 1.9%, while also providing higher F1-scores and better robustness in cross-domain testing. These results confirm the effectiveness and strong generalization ability of the proposed framework for real-world facial expression recognition. Full article
(This article belongs to the Section Computer)
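A hedged sketch of the three complementary branches the abstract names (channel, spatial, frequency) applied to a grid of ViT features; the actual MDFEFT branch designs, fusion rule, and dimensions are not given here, so everything below is an assumption.

```python
# Rough channel / spatial / frequency branches over a feature grid; the real
# MDFEFT branches and CDFEF fusion are not reproduced.
import torch
import torch.nn as nn
import torch.nn.functional as F

class MultiDomainBranches(nn.Module):
    def __init__(self, c=192):
        super().__init__()
        self.channel = nn.Sequential(nn.AdaptiveAvgPool2d(1), nn.Conv2d(c, c, 1), nn.Sigmoid())
        self.spatial = nn.Sequential(nn.Conv2d(2, 1, 7, padding=3), nn.Sigmoid())
        self.freq = nn.Conv2d(c, c, 1)   # mixes FFT-magnitude features

    def forward(self, x):                # x: (B, C, H, W) tokens reshaped to a grid
        x_c = x * self.channel(x)        # channel-domain re-weighting
        pooled = torch.cat([x.mean(1, keepdim=True), x.amax(1, keepdim=True)], dim=1)
        x_s = x * self.spatial(pooled)   # spatial-domain re-weighting
        mag = torch.fft.rfft2(x, norm="ortho").abs()          # (B, C, H, W//2+1)
        x_f = self.freq(F.interpolate(mag, size=x.shape[-2:]))  # frequency-domain cue
        return x_c + x_s + x_f           # naive additive fusion (assumption)

feat = torch.randn(2, 192, 14, 14)
print(MultiDomainBranches()(feat).shape)  # torch.Size([2, 192, 14, 14])
```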
27 pages, 3305 KB  
Article
SatViT-Seg: A Transformer-Only Lightweight Semantic Segmentation Model for Real-Time Land Cover Mapping of High-Resolution Remote Sensing Imagery on Satellites
by Daoyu Shu, Zhan Zhang, Fang Wan, Wang Ru, Bingnan Yang, Yan Zhang, Jianzhong Lu and Xiaoling Chen
Remote Sens. 2026, 18(1), 1; https://doi.org/10.3390/rs18010001 - 19 Dec 2025
Viewed by 163
Abstract
The demand for real-time land cover mapping from high-resolution remote sensing (HR-RS) imagery motivates lightweight segmentation models running directly on satellites. By processing on-board and transmitting only fine-grained semantic products instead of massive raw imagery, these models provide timely support for disaster response, environmental monitoring, and precision agriculture. Many recent methods combine convolutional neural networks (CNNs) with Transformers to balance local and global feature modeling, with convolutions as explicit information aggregation modules. Such heterogeneous hybrids may be unnecessary for lightweight models if similar aggregation can be achieved homogeneously, and operator inconsistency complicates optimization and hinders deployment on resource-constrained satellites. Meanwhile, lightweight Transformer components in these architectures often adopt aggressive channel compression and shallow contextual interaction to meet compute budgets, impairing boundary delineation and recognition of small or rare classes. To address this, we propose SatViT-Seg, a lightweight semantic segmentation model with a pure Vision Transformer (ViT) backbone. Unlike CNN-Transformer hybrids, SatViT-Seg adopts a homogeneous two-module design: a Local-Global Aggregation and Distribution (LGAD) module that uses window self-attention for local modeling and dynamically pooled global tokens with linear attention for long-range interaction, and a Bi-dimensional Attentive Feed-Forward Network (FFN) that enhances representation learning by modulating channel and spatial attention. This unified design overcomes common lightweight ViT issues such as channel compression and weak spatial correlation modeling. SatViT-Seg is implemented and evaluated in LuoJiaNET and PyTorch; comparative experiments with existing methods are run in PyTorch with unified training and data preprocessing for fairness, while the LuoJiaNET implementation highlights deployment-oriented efficiency on a graph-compiled runtime. Compared with the strongest baseline, SatViT-Seg improves mIoU by up to 1.81% while maintaining the lowest FLOPs among all methods. These results indicate that homogeneous Transformers offer strong potential for resource-constrained, on-board real-time land cover mapping in satellite missions. Full article
(This article belongs to the Special Issue Geospatial Artificial Intelligence (GeoAI) in Remote Sensing)
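As one plausible reading of the "Bi-dimensional Attentive Feed-Forward Network", the sketch below modulates the hidden activations of a token FFN with a channel gate and a per-token (spatial) gate; the real SatViT-Seg block layout and dimensions are assumed, not reproduced.

```python
# Assumed bi-dimensional (channel + spatial) attentive FFN over ViT tokens.
import torch
import torch.nn as nn

class BiDimAttentiveFFN(nn.Module):
    def __init__(self, dim=96, expand=4):
        super().__init__()
        hidden = dim * expand
        self.fc1, self.fc2 = nn.Linear(dim, hidden), nn.Linear(hidden, dim)
        self.act = nn.GELU()
        self.chan_gate = nn.Sequential(nn.Linear(hidden, hidden), nn.Sigmoid())
        self.spat_gate = nn.Sequential(nn.Linear(hidden, 1), nn.Sigmoid())

    def forward(self, tokens):                                   # tokens: (B, N, dim)
        h = self.act(self.fc1(tokens))                           # (B, N, hidden)
        h = h * self.chan_gate(h.mean(dim=1, keepdim=True))      # channel modulation
        h = h * self.spat_gate(h)                                # per-token (spatial) modulation
        return self.fc2(h)

print(BiDimAttentiveFFN()(torch.randn(2, 196, 96)).shape)  # torch.Size([2, 196, 96])
```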
Show Figures

Figure 1

30 pages, 3641 KB  
Article
Modified EfficientNet-B0 Architecture Optimized with Quantum-Behaved Algorithm for Skin Cancer Lesion Assessment
by Abdul Rehman Altaf, Abdullah Altaf and Faizan Ur Rehman
Diagnostics 2025, 15(24), 3245; https://doi.org/10.3390/diagnostics15243245 - 18 Dec 2025
Viewed by 203
Abstract
Background/Objectives: Skin cancer is one of the most common diseases worldwide; early and accurate detection is associated with survival rates above 90%, whereas late diagnosis carries a mortality risk of almost 80%. Methods: A modified EfficientNet-B0 is developed based on mobile inverted bottleneck convolution with a squeeze-and-excitation approach. A 3 × 3 convolutional layer captures low-level visual features, while the core features are extracted by a sequence of Mobile Inverted Bottleneck Convolution blocks with both 3 × 3 and 5 × 5 kernels. These blocks balance fine-grained extraction with broader contextual representation and increase the network’s learning capacity while keeping the computational cost in check. The architecture hyperparameters and the feature vectors extracted from standard benchmark dermoscopic datasets (HAM10000, ISIC 2019, and MSLD v2.0) are optimized with the quantum-behaved particle swarm optimization algorithm (QBPSO). The merit function is formulated from the training loss (standard classification cross-entropy with label smoothing), the mean fitness value (mfval), average accuracy (mAcc), mean computational time (mCT), and other standard performance indicators. Results: Comprehensive scenario-based simulations of the proposed framework on publicly available datasets yielded an mAcc of 99.62% and 92.5%, an mfval of 2.912 × 10⁻¹⁰ and 1.7921 × 10⁻⁸, and an mCT of 501.431 s and 752.421 s for the HAM10000 and ISIC 2019 datasets, respectively. The results are compared with state-of-the-art pre-trained models such as EfficientNet-B4, RegNetY-320, ResNeXt-101, EfficientNetV2-M, VGG-16, and DeepLabV3, as well as reported techniques based on Mask R-CNN, Deep Belief Networks, ensemble CNNs, SCDNet, and FixMatch-LS, whose accuracies range from 85% to 94.8%. The reliability of the proposed architecture and the stability of QBPSO are examined through Monte Carlo simulation with 100 independent runs and the corresponding statistical analysis. Conclusions: The proposed framework reduces diagnostic errors and assists dermatologists in clinical decision-making for improved patient outcomes, despite challenges such as data imbalance and interpretability. Full article
(This article belongs to the Special Issue Medical Image Analysis and Machine Learning)
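The QBPSO optimizer named in the abstract follows the well-known quantum-behaved PSO update (a local attractor plus a contraction around the mean best position). Below is a generic, minimal QPSO loop on a toy objective; the paper's merit function, bounds, and schedules are not reproduced.

```python
# Generic quantum-behaved PSO (QPSO); objective, bounds, and the
# contraction schedule are placeholders.
import numpy as np

def qpso(objective, dim=2, n_particles=20, iters=100, lb=-5.0, ub=5.0, seed=0):
    rng = np.random.default_rng(seed)
    x = rng.uniform(lb, ub, (n_particles, dim))
    pbest, pbest_val = x.copy(), np.array([objective(p) for p in x])
    for t in range(iters):
        gbest = pbest[pbest_val.argmin()]
        mbest = pbest.mean(axis=0)                      # mean best position
        beta = 1.0 - 0.5 * t / iters                    # contraction-expansion coefficient
        phi = rng.uniform(size=(n_particles, dim))
        local = phi * pbest + (1 - phi) * gbest         # local attractor
        u = rng.uniform(1e-12, 1.0, (n_particles, dim))
        sign = np.where(rng.uniform(size=(n_particles, dim)) < 0.5, 1.0, -1.0)
        x = np.clip(local + sign * beta * np.abs(mbest - x) * np.log(1.0 / u), lb, ub)
        vals = np.array([objective(p) for p in x])
        improved = vals < pbest_val
        pbest[improved], pbest_val[improved] = x[improved], vals[improved]
    return pbest[pbest_val.argmin()], pbest_val.min()

best_x, best_f = qpso(lambda p: float(np.sum(p ** 2)))   # toy sphere objective
print(best_x, best_f)
```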
24 pages, 8304 KB  
Article
STAIR-DETR: A Synergistic Transformer Integrating Statistical Attention and Multi-Scale Dynamics for UAV Small Object Detection
by Linna Hu, Penghao Xue, Bin Guo, Yiwen Chen, Weixian Zha and Jiya Tian
Sensors 2025, 25(24), 7681; https://doi.org/10.3390/s25247681 - 18 Dec 2025
Viewed by 148
Abstract
Detecting small objects in unmanned aerial vehicle (UAV) imagery remains a challenging task due to the limited target scale, cluttered backgrounds, severe occlusion, and motion blur commonly observed in dynamic aerial environments. This study presents STAIR-DETR, a real-time synergistic detection framework derived from RT-DETR, featuring comprehensive enhancements in feature extraction, resolution transformation, and detection head design. A Statistical Feature Attention (SFA) module is incorporated into the neck to replace the original AIFI, enabling token-level statistical modeling that strengthens fine-grained feature representation while effectively suppressing background interference. The backbone is reinforced with a Diverse Semantic Enhancement Block (DSEB), which employs multi-branch pathways and dynamic convolution to enrich semantic expressiveness without sacrificing spatial precision. To mitigate information loss during scale transformation, an Adaptive Scale Transformation Operator (ASTO) is proposed by integrating Context-Guided Downsampling (CGD) and Dynamic Sampling (DySample), achieving context-aware compression and content-adaptive reconstruction across resolutions. In addition, a high-resolution P2 detection head is introduced to leverage shallow-layer features for accurate classification and localization of extremely small targets. Extensive experiments conducted on the VisDrone2019 dataset demonstrate that STAIR-DETR attains 41.7% mAP@50 and 23.4% mAP@50:95, outperforming contemporary state-of-the-art (SOTA) detectors while maintaining real-time inference efficiency. These results confirm the effectiveness and robustness of STAIR-DETR for precise small object detection in complex UAV-based imaging scenarios. Full article
(This article belongs to the Special Issue Dynamics and Control System Design for Robotics)
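The abstract does not define the Statistical Feature Attention module, so the following is only a speculative stand-in: per-token channel statistics (mean, standard deviation, maximum) drive a gate that damps background tokens.

```python
# Speculative token-level statistical gating; not the STAIR-DETR SFA module.
import torch
import torch.nn as nn

class StatisticalFeatureAttention(nn.Module):
    def __init__(self, dim=256):
        super().__init__()
        # Maps (mean, std, max) channel statistics of each token to a gate in (0, 1).
        self.score = nn.Sequential(nn.Linear(3, dim // 4), nn.ReLU(),
                                   nn.Linear(dim // 4, 1), nn.Sigmoid())

    def forward(self, tokens):                              # (B, N, dim)
        stats = torch.stack([tokens.mean(-1), tokens.std(-1), tokens.amax(-1)], dim=-1)
        return tokens * self.score(stats)                   # gated tokens, same shape

print(StatisticalFeatureAttention()(torch.randn(2, 400, 256)).shape)
```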
21 pages, 3813 KB  
Article
HMRM: A Hybrid Motion and Region-Fused Mamba Network for Micro-Expression Recognition
by Zhe Guo, Yi Liu, Rui Luo, Jiayi Liu and Lan Wei
Sensors 2025, 25(24), 7672; https://doi.org/10.3390/s25247672 - 18 Dec 2025
Viewed by 166
Abstract
Micro-expression recognition (MER), as an important branch of intelligent visual sensing, enables the analysis of subtle facial movements for applications in emotion understanding, human–computer interaction, and security monitoring. However, existing methods struggle to capture fine-grained spatiotemporal dynamics under limited data and computational resources, making them difficult to deploy in real-world sensing systems. To address this limitation, we propose HMRM, a hybrid motion and region-fused Mamba network designed for efficient and accurate MER. HMRM enhances motion representation through a hybrid feature augmentation module that integrates gated recurrent unit (GRU)-attention optical flow estimation with a regional MotionMix enhancement strategy to increase motion diversity. Furthermore, it employs a fine-grained Mamba encoder to achieve lightweight and effective long-range temporal modeling. Additionally, a region feature fusion strategy is introduced to strengthen the representation of localized expression dynamics. Experiments on multiple MER benchmark datasets demonstrate that HMRM achieves state-of-the-art performance with strong generalization and low computational cost, highlighting its potential for integration into compact, real-time visual sensing and emotion analysis systems. Full article
(This article belongs to the Special Issue Emotion Recognition and Cognitive Behavior Analysis Based on Sensors)
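One plausible form of the regional MotionMix augmentation is a CutMix-style swap of a rectangular region between optical-flow fields, sketched below; the region size, mixing rule, and label handling are assumptions rather than the HMRM recipe.

```python
# Assumed CutMix-style regional mixing on optical-flow maps.
import torch

def regional_motion_mix(flow_a, flow_b, region=0.4, generator=None):
    """flow_a, flow_b: (B, 2, H, W) optical-flow fields; returns a mixed batch."""
    B, _, H, W = flow_a.shape
    rh, rw = int(H * region), int(W * region)
    top = torch.randint(0, H - rh + 1, (1,), generator=generator).item()
    left = torch.randint(0, W - rw + 1, (1,), generator=generator).item()
    mixed = flow_a.clone()
    # Replace one rectangular region of sample A's motion with sample B's motion.
    mixed[:, :, top:top + rh, left:left + rw] = flow_b[:, :, top:top + rh, left:left + rw]
    return mixed

mixed = regional_motion_mix(torch.randn(4, 2, 128, 128), torch.randn(4, 2, 128, 128))
print(mixed.shape)  # torch.Size([4, 2, 128, 128])
```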
58 pages, 8484 KB  
Review
Recent Real-Time Aerial Object Detection Approaches, Performance, Optimization, and Efficient Design Trends for Onboard Performance: A Survey
by Nadin Habash, Ahmad Abu Alqumsan and Tao Zhou
Sensors 2025, 25(24), 7563; https://doi.org/10.3390/s25247563 - 12 Dec 2025
Viewed by 810
Abstract
The rising demand for real-time perception in aerial platforms has intensified the need for lightweight, hardware-efficient object detectors capable of reliable onboard operation. This survey provides a focused examination of real-time aerial object detection, emphasizing algorithms designed for edge devices and UAV onboard processors, where computation, memory, and power resources are severely constrained. We first review the major aerial and remote-sensing datasets and analyze the unique challenges they introduce, such as small objects, fine-grained variation, multiscale variation, and complex backgrounds, which directly shape detector design. Recent studies addressing these challenges are then grouped, covering advances in lightweight backbones, fine-grained feature representation, multi-scale fusion, and optimized Transformer modules adapted for embedded environments. The review further highlights hardware-aware optimization techniques, including quantization, pruning, and TensorRT acceleration, as well as emerging trends in automated NAS tailored to UAV constraints. We discuss the adaptation of large pretrained models, such as CLIP-based embeddings and compressed Transformers, to meet onboard real-time requirements. By unifying architectural strategies, model compression, and deployment-level optimization, this survey offers a comprehensive perspective on designing next-generation detectors that achieve both high accuracy and true real-time performance in aerial applications. Full article
(This article belongs to the Special Issue Image Processing and Analysis in Sensor-Based Object Detection)
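Two of the hardware-aware techniques the survey covers, magnitude pruning and dynamic INT8 quantization, are available directly in PyTorch; the throwaway model below only illustrates those API calls, and real onboard pipelines would add TensorRT export or NAS as discussed above.

```python
# Standard PyTorch utilities for pruning and dynamic quantization on a toy model.
import torch
import torch.nn as nn
import torch.nn.utils.prune as prune

model = nn.Sequential(nn.Linear(256, 128), nn.ReLU(), nn.Linear(128, 10))

# L1 magnitude pruning: zero out the 30% smallest weights of the first layer.
prune.l1_unstructured(model[0], name="weight", amount=0.3)
prune.remove(model[0], "weight")          # make the pruning permanent

# Dynamic quantization: Linear layers run with INT8 weights at inference time.
quantized = torch.quantization.quantize_dynamic(model, {nn.Linear}, dtype=torch.qint8)
print(quantized(torch.randn(1, 256)).shape)  # torch.Size([1, 10])
```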
17 pages, 3108 KB  
Article
A Cross-Scale Spatial–Semantic Feature Aggregation Network for Strip Steel Surface Defect Detection
by Chenglong Xu, Yange Sun, Linlin Huang and Huaping Guo
Materials 2025, 18(24), 5567; https://doi.org/10.3390/ma18245567 - 11 Dec 2025
Viewed by 251
Abstract
Strip steel surface defect detection remains a challenging task due to the diverse scales and uneven spatial distribution of defects, which often lead to incomplete feature representation and missed detections in sparsely distributed regions. To address these challenges, we propose a novel cross-scale spatial–semantic feature aggregation network (CSSFAN) that achieves fine-grained and semantically consistent feature fusion across multiple scales. Specifically, CSSFAN adopts a bottom-up feature aggregation strategy equipped with a series of cross-scale spatial–semantic aggregation modules (CSSAMs). Each CSSAM first establishes a mapping relationship between high-level feature points and low-level feature regions and then introduces a cross-scale attention mechanism that adaptively injects spatial details from low-level features into high-level semantic representations. This aggregation strategy bridges the gap between spatial precision and semantic abstraction, enabling the network to capture subtle and irregular defect patterns. Furthermore, we introduce an adaptive region proposal network (ARPN) to cope with the uneven spatial distribution of defects. ARPN dynamically adjusts the number of region proposals according to the local feature complexity, ensuring that regions with dense or subtle defects receive more proposal attention, while sparse or background regions are adaptively suppressed, thereby enhancing the model’s sensitivity to defect-prone areas. Extensive experiments on two strip steel surface defect datasets demonstrate that our method significantly improves detection performance, validating its effectiveness and robustness. Full article
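The mapping from high-level feature points to low-level regions can be read as a cross-attention in which coarse tokens query fine tokens; the sketch below shows that pattern under assumed channel sizes and is not the published CSSAM.

```python
# Assumed cross-scale attention: coarse (semantic) tokens query fine (detail) tokens.
import torch
import torch.nn as nn

class CrossScaleAggregation(nn.Module):
    def __init__(self, high_c=256, low_c=64, heads=4):
        super().__init__()
        self.proj_low = nn.Conv2d(low_c, high_c, 1)          # align channels
        self.attn = nn.MultiheadAttention(high_c, heads, batch_first=True)

    def forward(self, high, low):                             # high: (B,256,h,w), low: (B,64,H,W)
        B, C, h, w = high.shape
        q = high.flatten(2).transpose(1, 2)                   # (B, h*w, C)
        kv = self.proj_low(low).flatten(2).transpose(1, 2)    # (B, H*W, C)
        fused, _ = self.attn(q, kv, kv)                       # detail injected into semantics
        return (q + fused).transpose(1, 2).reshape(B, C, h, w)

out = CrossScaleAggregation()(torch.randn(2, 256, 10, 10), torch.randn(2, 64, 40, 40))
print(out.shape)  # torch.Size([2, 256, 10, 10])
```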
29 pages, 3472 KB  
Review
A Review of Cross-Modal Image–Text Retrieval in Remote Sensing
by Lingxin Xu, Luyao Wang, Jinzhi Zhang, Da Ha and Haisu Zhang
Remote Sens. 2025, 17(24), 3995; https://doi.org/10.3390/rs17243995 - 11 Dec 2025
Viewed by 516
Abstract
With the emergence of large-scale vision-language pre-training (VLP) models, remote sensing (RS) image–text retrieval is shifting from global representation learning to fine-grained semantic alignment. This review systematically examines two mainstream representation paradigms—real-valued embedding and deep hashing—and analyzes how the evolution of RS datasets influences model capability, including multi-scale robustness, small object discriminability, and temporal semantic understanding. We further dissect three core challenges specific to RS scenarios: multi-scale semantic modeling, small object feature preservation, and multi-temporal reasoning. Representative architectures and technical solutions are reviewed in depth, followed by a critical discussion of their limitations in terms of generalization, evaluation consistency, and reproducibility. We also highlight the growing role of VLP-based models and the dependence of their performance on large-scale, high-quality image–text corpora. Finally, we outline future research directions, including RS-oriented VLP adaptation and unified multi-granularity evaluation frameworks. These insights aim to provide a coherent reference for advancing practical deployment and promoting cross-domain applications of RS image–text retrieval. Full article
(This article belongs to the Section Remote Sensing Image Processing)
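The real-valued embedding paradigm discussed above typically reduces to a shared image–text space trained with a symmetric contrastive (InfoNCE) objective, as in CLIP-style models; a toy version of that loss follows, intended only to illustrate the idea.

```python
# Toy symmetric contrastive loss for image-text retrieval (CLIP-style).
import torch
import torch.nn.functional as F

def clip_style_loss(img_emb, txt_emb, temperature=0.07):
    """img_emb, txt_emb: (B, D) embeddings of matched image-caption pairs."""
    img_emb = F.normalize(img_emb, dim=-1)
    txt_emb = F.normalize(txt_emb, dim=-1)
    logits = img_emb @ txt_emb.t() / temperature          # (B, B) similarity matrix
    targets = torch.arange(img_emb.size(0))               # i-th image matches i-th caption
    return 0.5 * (F.cross_entropy(logits, targets) + F.cross_entropy(logits.t(), targets))

print(float(clip_style_loss(torch.randn(8, 512), torch.randn(8, 512))))
```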
28 pages, 3650 KB  
Article
Gastrointestinal Lesion Detection Using Ensemble Deep Learning Through Global Contextual Information
by Vikrant Aadiwal, Vishesh Tanwar, Bhisham Sharma and Dhirendra Prasad Yadav
Bioengineering 2025, 12(12), 1329; https://doi.org/10.3390/bioengineering12121329 - 5 Dec 2025
Viewed by 440
Abstract
Small bowel Crohn’s disease (SBCD) and other gastrointestinal lesions are difficult to detect because the associated mucosal abnormalities are often subtle and can closely resemble other disorders. Although the Kvasir and Esophageal Endoscopy datasets offer high-quality visual representations of various parts of the GI tract, their manual interpretation and analysis by clinicians remain labor-intensive, time-consuming, and prone to subjective variability. To address this, we propose a generalizable ensemble deep learning framework for gastrointestinal lesion detection, capable of identifying pathological patterns such as ulcers, polyps, and esophagitis that visually resemble SBCD-associated abnormalities. Classical convolutional neural networks (CNNs) extract shallow high-dimensional features and may therefore miss the edges and complex patterns of gastrointestinal lesions. To mitigate these limitations, this study introduces a deep learning ensemble framework that combines the strengths of EfficientNetB5, MobileNetV2, and multi-head self-attention (MHSA). EfficientNetB5 extracts detailed hierarchical features that help distinguish fine-grained mucosal structures, while MobileNetV2 enhances spatial representation with low computational overhead. The MHSA module further improves the global correlation of spatial features. We evaluated the model on two publicly available DBE datasets and compared the results with four state-of-the-art methods. Our model achieved classification accuracies of 99.25% and 98.86% on the Kvasir and Kaither datasets. Full article
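A hedged sketch of the described ensemble: feature maps from two backbones are concatenated and refined with multi-head self-attention before classification. torchvision's efficientnet_b5 and mobilenet_v2 (a recent torchvision is assumed) stand in for the exact backbones, and the fusion width, head count, and classifier are placeholders, not the paper's configuration.

```python
# Assumed ensemble of two CNN feature extractors with MHSA refinement.
import torch
import torch.nn as nn
from torchvision.models import efficientnet_b5, mobilenet_v2

class EnsembleMHSAClassifier(nn.Module):
    def __init__(self, n_classes=8, dim=512, heads=8):
        super().__init__()
        self.eff = efficientnet_b5(weights=None).features     # (B, 2048, H/32, W/32)
        self.mob = mobilenet_v2(weights=None).features        # (B, 1280, H/32, W/32)
        self.reduce = nn.Conv2d(2048 + 1280, dim, kernel_size=1)
        self.mhsa = nn.MultiheadAttention(dim, heads, batch_first=True)
        self.head = nn.Linear(dim, n_classes)

    def forward(self, x):
        f = torch.cat([self.eff(x), self.mob(x)], dim=1)       # channel-wise fusion
        t = self.reduce(f).flatten(2).transpose(1, 2)          # (B, N, dim) spatial tokens
        t, _ = self.mhsa(t, t, t)                              # global spatial correlation
        return self.head(t.mean(dim=1))                        # average-pool then classify

logits = EnsembleMHSAClassifier()(torch.randn(1, 3, 224, 224))
print(logits.shape)  # torch.Size([1, 8])
```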
21 pages, 8629 KB  
Article
Nondestructive Identification of Eggshell Cracks Using Hyperspectral Imaging Combined with Attention-Enhanced 3D-CNN
by Hao Li, Aoyun Zheng, Chaoxian Liu, Jun Huang, Yong Ma, Huanjun Hu and You Du
Foods 2025, 14(24), 4183; https://doi.org/10.3390/foods14244183 - 5 Dec 2025
Viewed by 294
Abstract
Eggshell cracks are a critical factor affecting egg quality and food safety, with traditional detection methods often struggling to detect fine cracks, especially under multi-colored shells and complex backgrounds. To address this issue, we propose a non-destructive detection approach based on an enhanced three-dimensional convolutional neural network (3D-CNN), named 3D-CrackNet, integrated with hyperspectral imaging (HSI) for high-precision identification and localization of eggshell cracks. Operating within the 1000–2500 nm spectral range, the proposed framework employs spectral preprocessing and optimal band selection to improve discriminative feature representation. A residual learning module is incorporated to mitigate gradient degradation during deep joint spectral-spatial feature extraction, while a parameter-free SimAM attention mechanism adaptively enhances crack-related regions and suppresses background interference. This architecture enables the network to effectively capture both fine-grained spatial textures and contiguous spectral patterns associated with cracks. Experiments on a self-constructed dataset of 400 egg samples show that 3D-CrackNet achieves an F1-score of 75.49% and an Intersection over Union (IoU) of 60.62%, significantly outperforming conventional 1D-CNN and 2D-CNN models. These findings validate that 3D-CrackNet offers a robust, non-destructive, and efficient solution for accurately detecting and localizing subtle eggshell cracks, demonstrating strong potential for intelligent online egg quality grading and micro-defect monitoring in industrial applications. Full article
(This article belongs to the Section Food Analytical Methods)
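SimAM is parameter-free and has a published closed-form energy; the sketch below applies that re-weighting to a 3-D (spectral-spatial) feature volume. Its exact placement inside 3D-CrackNet is paraphrased from the abstract, and the 3-D extension here is an assumption.

```python
# Parameter-free SimAM attention extended to 5-D tensors (B, C, D, H, W);
# the 3-D extension and lambda value are assumptions.
import torch
import torch.nn as nn

class SimAM3D(nn.Module):
    def __init__(self, e_lambda=1e-4):
        super().__init__()
        self.e_lambda = e_lambda

    def forward(self, x):                       # x: (B, C, D, H, W)
        n = x[0, 0].numel() - 1                 # number of neighbors per channel
        d = (x - x.mean(dim=(2, 3, 4), keepdim=True)).pow(2)
        v = d.sum(dim=(2, 3, 4), keepdim=True) / n
        energy_inv = d / (4 * (v + self.e_lambda)) + 0.5
        return x * torch.sigmoid(energy_inv)    # re-weight crack-related activations

print(SimAM3D()(torch.randn(1, 16, 8, 32, 32)).shape)  # torch.Size([1, 16, 8, 32, 32])
```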
20 pages, 3620 KB  
Article
EMS-UKAN: An Efficient KAN-Based Segmentation Network for Water Leakage Detection of Subway Tunnel Linings
by Meide He, Lei Tan, Xiaohui Yang, Fei Liu, Zhimin Zhao and Xiaochun Wu
Appl. Sci. 2025, 15(24), 12859; https://doi.org/10.3390/app152412859 - 5 Dec 2025
Viewed by 188
Abstract
Water leakage in subway tunnel linings poses significant risks to structural safety and long-term durability, making accurate and efficient leakage detection a critical task. Existing deep learning methods, such as UNet and its variants, often suffer from large parameter sizes and limited ability to capture multi-scale features, which restrict their applicability in real-world tunnel inspection. To address these issues, we propose an Efficient Multi-Scale U-shaped KAN-based Segmentation Network (EMS-UKAN) for detecting water leakage in subway tunnel linings. To reduce computational cost and enable edge-device deployment, the backbone replaces conventional convolutional layers with depthwise separable convolutions, and an Edge-Enhanced Depthwise Separable Convolution Module (EEDM) is incorporated in the decoder to strengthen boundary representation. The PKAN Block is introduced in the bottleneck to enhance nonlinear feature representation and improve the modeling of complex relationships among latent features. In addition, an Adaptive Multi-Scale Feature Extraction Block (AMS Block) is embedded within early skip connections to capture both fine-grained and large-scale leakage features. Extensive experiments on the newly collected Tunnel Water Leakage (TWL) dataset demonstrate that EMS-UKAN outperforms classical models, achieving competitive segmentation performance. In addition, it effectively reduces computational complexity, providing a practical solution for real-world tunnel inspection. Full article
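The depthwise separable convolution that replaces standard convolutions in the backbone is a well-known construction (per-channel spatial filtering followed by 1 × 1 channel mixing); the sketch below shows it with arbitrary channel sizes and a rough parameter comparison, without the EEDM, PKAN, or AMS blocks.

```python
# Standard depthwise separable convolution; channel sizes are arbitrary.
import torch
import torch.nn as nn

class DepthwiseSeparableConv(nn.Module):
    def __init__(self, in_c, out_c, k=3):
        super().__init__()
        self.depthwise = nn.Conv2d(in_c, in_c, k, padding=k // 2, groups=in_c)  # per-channel spatial filter
        self.pointwise = nn.Conv2d(in_c, out_c, 1)                              # cross-channel mixing
        self.bn_act = nn.Sequential(nn.BatchNorm2d(out_c), nn.ReLU(inplace=True))

    def forward(self, x):
        return self.bn_act(self.pointwise(self.depthwise(x)))

def n_params(m):
    return sum(p.numel() for p in m.parameters())

# Rough parameter comparison against a plain 3x3 convolution.
print(n_params(DepthwiseSeparableConv(64, 128)), "vs", n_params(nn.Conv2d(64, 128, 3, padding=1)))
```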
36 pages, 22245 KB  
Article
CMSNet: A SAM-Enhanced CNN–Mamba Framework for Damaged Building Change Detection in Remote Sensing Imagery
by Jianli Zhang, Liwei Tao, Wenbo Wei, Pengfei Ma and Mengdi Shi
Remote Sens. 2025, 17(23), 3913; https://doi.org/10.3390/rs17233913 - 3 Dec 2025
Viewed by 568
Abstract
In war and explosion scenarios, buildings often suffer varying degrees of damage characterized by complex, irregular, and fragmented spatial patterns, posing significant challenges for remote sensing–based change detection. Additionally, the scarcity of high-quality datasets limits the development and generalization of deep learning approaches. To overcome these issues, we propose CMSNet, an end-to-end framework that integrates the structural priors of the Segment Anything Model (SAM) with the efficient temporal modeling and fine-grained representation capabilities of CNN–Mamba. Specifically, CMSNet adopts CNN–Mamba as the backbone to extract multi-scale semantic features from bi-temporal images, while SAM-derived visual priors guide the network to focus on building boundaries and structural variations. A Pre-trained Visual Prior-Guided Feature Fusion Module (PVPF-FM) is introduced to align and fuse these priors with change features, enhancing robustness against local damage, non-rigid deformations, and complex background interference. Furthermore, we construct a new RWSBD (Real-world War Scene Building Damage) dataset based on Gaza war scenes, comprising 42,732 annotated building damage instances across diverse scales, offering a strong benchmark for real-world scenarios. Extensive experiments on RWSBD and three public datasets (CWBD, WHU-CD, and LEVIR-CD+) demonstrate that CMSNet consistently outperforms eight state-of-the-art methods in both quantitative metrics (F1, IoU, Precision, Recall) and qualitative evaluations, especially in fine-grained boundary preservation, small-scale change detection, and complex scene adaptability. Overall, this work introduces a novel detection framework that combines foundation model priors with efficient change modeling, along with a new large-scale war damage dataset, contributing valuable advances to both research and practical applications in remote sensing change detection. Additionally, the strong generalization ability and efficient architecture of CMSNet highlight its potential for scalable deployment and practical use in large-area post-disaster assessment. Full article
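The PVPF-FM is not specified in the abstract, so the following is only a guess at its general shape: SAM-derived prior features are aligned to the change-feature space, used as a spatial gate, and fused residually. All module names and dimensions below are hypothetical.

```python
# Hypothetical prior-guided fusion of SAM features with change features.
import torch
import torch.nn as nn
import torch.nn.functional as F

class PriorGuidedFusion(nn.Module):
    def __init__(self, prior_c=256, change_c=128):
        super().__init__()
        self.align = nn.Conv2d(prior_c, change_c, 1)                 # align SAM prior channels
        self.gate = nn.Sequential(nn.Conv2d(change_c, 1, 1), nn.Sigmoid())
        self.fuse = nn.Conv2d(2 * change_c, change_c, 3, padding=1)

    def forward(self, change_feat, sam_prior):
        prior = self.align(sam_prior)
        if prior.shape[-2:] != change_feat.shape[-2:]:               # match resolutions
            prior = F.interpolate(prior, size=change_feat.shape[-2:],
                                  mode="bilinear", align_corners=False)
        gated = change_feat * self.gate(prior)                       # emphasize building boundaries
        return change_feat + self.fuse(torch.cat([gated, prior], dim=1))

out = PriorGuidedFusion()(torch.randn(1, 128, 64, 64), torch.randn(1, 256, 32, 32))
print(out.shape)  # torch.Size([1, 128, 64, 64])
```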
19 pages, 2524 KB  
Article
Brain Tumour Classification Model Based on Spatial Block–Residual Block Collaborative Architecture with Strip Pooling Feature Fusion
by Meilan Tang, Xinlian Zhou and Zhiyong Li
J. Imaging 2025, 11(12), 427; https://doi.org/10.3390/jimaging11120427 - 29 Nov 2025
Viewed by 246
Abstract
Precise classification of brain tumors is crucial for early diagnosis and treatment, but obtaining tumor masks is extremely challenging, limiting the application of traditional methods. This paper proposes a brain tumor classification model based on whole-brain images, combining a spatial block–residual block collaborative architecture with strip pooling feature fusion to achieve multi-scale feature representation without requiring tumor masks. The model extracts fine-grained morphological features through three shallow VGG spatial blocks while capturing global contextual information between tumors and surrounding tissues via four deep ResNet residual blocks. Residual connections mitigate the vanishing-gradient problem. To effectively fuse multi-level features, strip pooling modules are introduced after the third spatial block and fourth residual block, enabling cross-layer feature integration and particularly improving the representation of irregular tumor regions. The fused features undergo cross-scale concatenation, integrating both spatial perception and semantic information, and are ultimately classified via an end-to-end Softmax classifier. Experimental results demonstrate that the model achieves an accuracy of 97.29% in brain tumor image classification tasks, significantly outperforming traditional convolutional neural networks. This validates its effectiveness in achieving high-precision, multi-scale feature learning and classification without brain tumor masks, holding potential clinical application value. Full article
(This article belongs to the Section Medical Imaging)
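Strip pooling in the sense of Hou et al. pools the feature map into 1 × W and H × 1 strips and re-expands them as a long-range gate; the sketch below shows that module only, and how the paper wires it between the VGG spatial blocks and ResNet residual blocks is summarized from the abstract.

```python
# Strip pooling module (horizontal and vertical strip context as a gate).
import torch
import torch.nn as nn
import torch.nn.functional as F

class StripPooling(nn.Module):
    def __init__(self, c):
        super().__init__()
        self.conv_h = nn.Conv2d(c, c, (3, 1), padding=(1, 0))
        self.conv_w = nn.Conv2d(c, c, (1, 3), padding=(0, 1))
        self.fuse = nn.Conv2d(c, c, 1)

    def forward(self, x):                                     # x: (B, C, H, W)
        h_strip = self.conv_h(F.adaptive_avg_pool2d(x, (x.size(2), 1)))   # (B, C, H, 1)
        w_strip = self.conv_w(F.adaptive_avg_pool2d(x, (1, x.size(3))))   # (B, C, 1, W)
        gate = torch.sigmoid(self.fuse(h_strip.expand_as(x) + w_strip.expand_as(x)))
        return x * gate                                       # long-range context as a gate

print(StripPooling(64)(torch.randn(2, 64, 28, 28)).shape)  # torch.Size([2, 64, 28, 28])
```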