Search Results (23,760)

Search Parameters:
Keywords = image dataset

30 pages, 1058 KB  
Review
Artificial Intelligence in Hepatocellular Carcinoma: Current Applications, Clinical Performance, and Barriers to Implementation
by Sri Harsha Boppana, Aditya Chandrashekar, Gautam Maddineni, Raja Chandra Chakinala, Ritwik Raj, Rohin B. Shivaprakash, Pradeep Yarra, Venkata C. K. Sunkesula and C. David Mintz
J. Clin. Med. 2026, 15(7), 2484; https://doi.org/10.3390/jcm15072484 - 24 Mar 2026
Abstract
Hepatocellular carcinoma (HCC) remains a major cause of cancer-related mortality worldwide, and its management is limited by heterogeneous risk profiles, suboptimal surveillance performance, diagnostic uncertainty in chronically diseased livers, and difficulty individualizing prognosis after treatment. The aim of this narrative review was to critically evaluate artificial intelligence (AI) applications across the HCC care continuum, with emphasis on their intended clinical role, reported performance, evidence maturity, and barriers to implementation. A major strength of this review is that it moves beyond a descriptive catalog of models by structuring the literature around clinically relevant decision points and by explicitly distinguishing emerging proof-of-concept tools from applications with stronger translational potential. Across risk stratification, surveillance, imaging-based diagnosis, pathology, treatment-response prediction, and prognostication, we found that AI consistently demonstrates promise, particularly for identifying patients at higher future HCC risk, improving lesion detection and characterization on ultrasound, CT, MRI, and contrast-enhanced ultrasound, assisting histopathologic classification, and predicting outcomes such as microvascular invasion, recurrence, survival, and response to locoregional therapies. However, we also found that the evidence base remains highly uneven: many diagnostic studies are retrospective and lesion-enriched rather than embedded in true surveillance populations, many prognostic models lack robust external validation and calibration assessment, and reference standards, imaging protocols, and dataset composition vary substantially across studies. These findings are clinically relevant because they highlight both where AI may offer near-term value and why most published systems are not yet ready for routine use. Overall, AI in HCC should be viewed as a rapidly evolving but still transitional field. Its future impact will depend not only on higher-performing algorithms but on clearly defined clinical use cases, multicenter and prospective validation, transparent reporting, workflow-aware evaluation, and implementation strategies that support safe, equitable, and scalable adoption.

22 pages, 4545 KB  
Article
An Interpretable Hybrid SFNet Deep Learning Framework for Multi-Site Bone Fracture Detection in Medical Imaging
by Wejdan S. Aljibreen, Da’ad Albhadel, Shuaa S. Alharbi, Naif S. Alshammari and Haifa F. Alhasson
Diagnostics 2026, 16(7), 966; https://doi.org/10.3390/diagnostics16070966 - 24 Mar 2026
Abstract
Background/Objectives: Accurate bone fracture detection is essential for orthopedic diagnosis and trauma management. Manual interpretation of X-ray or CT images can be time-consuming and may lead to inter-observer variability, particularly in subtle or multi-site fracture cases. This study proposes an interpretable Hybrid Selective Feature Network (Hybrid SFNet) to improve multi-site bone fracture detection performance and boundary localization. Methods: The proposed Hybrid SFNet extends the original SFNet architecture by incorporating multi-scale convolutional feature extraction and a semantic flow mechanism to enhance structural representation and fracture boundary delineation. Preprocessing techniques, including Canny edge detection, normalization, and data augmentation, were applied to improve feature quality. Model interpretability was addressed using Gradient-weighted Class Activation Mapping (Grad-CAM) to visualize regions contributing to predictions. The model was evaluated on publicly available multi-site fracture datasets using both standard and class-weighted loss configurations. Results: For binary fracture classification, the proposed model achieved 90% accuracy, 94% precision, 77% recall, and an F1-score of 85% for fractured cases. When class-weighted loss was applied, recall improved to 85%, reducing false negatives from 145 to 94 cases (approximately 35%). Under the weighted configuration, Cohen’s Kappa reached 0.79 and the Matthews Correlation Coefficient (MCC) reached 0.76. Conclusions: The proposed Hybrid SFNet provides an interpretable and effective framework for multi-site bone fracture detection. The integration of multi-scale feature extraction and semantic flow mechanisms enhances detection performance and boundary localization, while Grad-CAM supports clinical interpretability. These results indicate the model’s potential for supporting clinical decision-making in orthopedic imaging.
(This article belongs to the Section Machine Learning and Artificial Intelligence in Diagnostics)
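
The recall gain reported above comes from reweighting the training loss toward the minority (fractured) class. A minimal PyTorch sketch of that idea, with illustrative class counts and a generic inverse-frequency weighting, neither of which the abstract specifies:

```python
import torch
import torch.nn as nn
from sklearn.metrics import cohen_kappa_score, matthews_corrcoef

# Illustrative class counts (assumption: the paper's counts are not given here).
counts = torch.tensor([3000.0, 1000.0])            # [intact, fractured]
weights = counts.sum() / (len(counts) * counts)    # inverse-frequency weights

criterion = nn.CrossEntropyLoss(weight=weights)    # minority errors cost more

logits = torch.randn(8, 2)                         # stand-in model outputs
labels = torch.randint(0, 2, (8,))
loss = criterion(logits, labels)

preds = logits.argmax(dim=1)
print(loss.item(),
      cohen_kappa_score(labels.numpy(), preds.numpy()),
      matthews_corrcoef(labels.numpy(), preds.numpy()))
```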

20 pages, 3073 KB  
Article
YOLOv11-WFD: A Multimodal Grape Segmentation Framework with Wavelet Convolution, FasterNeXt, and Dynamic Upsampling for Intelligent Harvesting
by Pengyan Wang, Chengshuai Li and Linjing Wei
Agronomy 2026, 16(7), 679; https://doi.org/10.3390/agronomy16070679 - 24 Mar 2026
Abstract
Grapes are high-value crops, but expanding cultivation has made manual harvesting inefficient and costly due to labor shortages and weather constraints. Automated harvesting requires accurate and lightweight image segmentation to ensure reliable visual perception. Improving segmentation precision, robustness, and model compactness is thus critical for intelligent grape harvesting. To enhance segmentation robustness in complex orchard environments, this study introduces a multimodal fusion and multi-scale enhancement strategy and develops a lightweight instance segmentation network. Using a multimodal grape dataset containing RGB, near-infrared (NIR), and depth information, a multi-resolution training scheme based on an image-pyramid framework was constructed. Among the three YOLOv11-based fusion strategies, early fusion achieved the best performance. Accordingly, the lightweight model YOLOv11-WFD was designed by integrating FasterNeXt, DySample, and WaveletPool to strengthen feature extraction, adaptive sampling, and small-object perception. The model delivers high segmentation accuracy and strong deployment suitability for intelligent harvesting applications. Experimental results show that YOLOv11-WFD achieves a mAP@50:95 of 79.3% on the validation set with only 2.25 M parameters, demonstrating outstanding performance in both precision and compactness. Compared with YOLOv3-tiny, YOLOv5n, YOLOv8n, YOLOv10n, YOLOv11n, and YOLOv12n, YOLOv11-WFD improves mAP@50:95 by 25.4, 3.0, 2.7, 2.8, 2.0, and 3.1 percentage points, respectively, while reducing parameters by 80.4%, 7.8%, 23.5%, 10.7%, 20.8%, and 18.8%. Overall, YOLOv11-WFD achieves an excellent balance among accuracy, speed, and complexity, verifying the effectiveness of the multimodal fusion and lightweight integration strategy. It shows strong potential for practical applications and large-scale deployment in complex agricultural environments such as intelligent grape harvesting.
(This article belongs to the Section Precision and Digital Agriculture)
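
Early fusion here means combining the modalities before the backbone sees them. A minimal sketch under the assumption that fusion is channel-wise concatenation of RGB, NIR, and depth with a widened stem convolution; the abstract does not spell out the exact mechanism:

```python
import torch
import torch.nn as nn

# Early fusion: stack RGB (3 ch), NIR (1 ch), and depth (1 ch) along the
# channel axis, then widen the backbone's stem conv to 5 input channels.
rgb   = torch.randn(1, 3, 640, 640)
nir   = torch.randn(1, 1, 640, 640)
depth = torch.randn(1, 1, 640, 640)
x = torch.cat([rgb, nir, depth], dim=1)            # (1, 5, 640, 640)

stem = nn.Conv2d(5, 32, kernel_size=3, stride=2, padding=1)
print(stem(x).shape)                               # torch.Size([1, 32, 320, 320])
```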

16 pages, 53570 KB  
Article
A Multimodal In-Ear Audio and Physiological Dataset for Swallowing and Non-Verbal Event Classification
by Elyes Ben Cheikh, Yassine Mrabet, Catherine Laporte and Rachel E. Bouserhal
Sensors 2026, 26(7), 2019; https://doi.org/10.3390/s26072019 - 24 Mar 2026
Abstract
Swallowing is a critical marker of neurological and emotional health. The ability to monitor it continuously and non-invasively, especially through smart ear-worn devices, holds significant promise for clinical applications. Despite this potential, no public audio datasets currently support reliable swallowing sound detection. Existing datasets focus primarily on speech and breathing, offering limited coverage and lacking detailed annotations for swallowing events. To address this gap, we introduce an in-ear audio dataset specifically designed to capture a wide range of verbal and non-verbal sounds. It includes comprehensive labeling focused on swallowing. The dataset was collected from 34 healthy adults (14 females and 20 males) between the ages of 20 and 29. Each participant performed a series of predefined tasks involving both non-verbal and verbal events. Non-verbal tasks included swallowing, clicking, forceful blinking, touching the scalp, and physical movements such as squatting or walking in place. Verbal tasks consisted of speaking (e.g., describing an image). Recordings were conducted in both quiet and noisy environments to better reflect real-world conditions. Data were captured using a combination of in-/outer-ear microphones, a chest belt to record electrocardiogram (ECG), respiration and acceleration signals, and an ultrasound probe to track tongue movement, which served as a reference for swallowing annotation. All signals were precisely synchronized. To ensure high data quality, the recordings were reviewed using both algorithmic analysis and manual inspection. Swallowing events were identified based on ultrasound signals and validated by an expert to guarantee accurate labeling. As a proof of concept that in-ear audio supports swallow classification, we fine-tune a fully connected neural network on YAMNet embeddings plus zero-crossing rate (ZCR) features. Across the completed folds, the model reaches an F1 score of 0.875 ± 0.013.
(This article belongs to the Special Issue Sensors for Physiological Monitoring and Digital Health: 2nd Edition)
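
The ZCR features used alongside YAMNet embeddings are straightforward to compute. A minimal NumPy sketch, assuming 16 kHz mono audio (the rate YAMNet expects) and 25 ms frames with a 10 ms hop, neither of which the abstract specifies:

```python
import numpy as np

def zero_crossing_rate(frame: np.ndarray) -> float:
    """Fraction of adjacent sample pairs whose signs differ."""
    signs = np.sign(frame)
    signs[signs == 0] = 1                  # count zeros as positive
    return float(np.mean(signs[:-1] != signs[1:]))

sr = 16000                                 # assumed sample rate
audio = np.random.randn(sr)                # 1 s placeholder signal
win, hop = int(0.025 * sr), int(0.010 * sr)
zcr = [zero_crossing_rate(audio[i:i + win])
       for i in range(0, len(audio) - win + 1, hop)]
print(len(zcr), zcr[0])
```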

38 pages, 5379 KB  
Review
A Scoping Review of Automated Calving Front Detection in Satellite Images and Calving Front Position Datasets
by Wojciech Milczarek, Marek Sompolski, Michał Tympalski and Anna Kopeć
Remote Sens. 2026, 18(7), 969; https://doi.org/10.3390/rs18070969 - 24 Mar 2026
Abstract
Calving front position is a key indicator of glacier and ice-sheet dynamics and an important variable for assessing mass loss and sea-level rise. Rapid growth in satellite data availability and image analysis techniques has driven the development of numerous automated calving front detection algorithms; however, the methodological landscape remains fragmented. This scoping review aims to map the existing literature on automated calving front detection, characterize the types of algorithms and data sources used, and identify trends, gaps, and challenges in current approaches. A systematic search of major bibliographic databases and complementary sources was conducted to identify studies describing automated or semi-automated calving front detection from satellite imagery or derived datasets. Eligible studies included peer-reviewed articles and relevant grey literature using optical, synthetic aperture radar (SAR), or multi-sensor data. Data were charted using a predefined framework that captures the algorithmic approach, input data characteristics, spatial and temporal coverage, validation strategies, and reported performance metrics. The review identifies a wide range of methods, from early threshold- and edge-based techniques to recent machine learning and deep learning approaches, with a strong shift toward convolutional neural networks over the past few years. Despite methodological progress, validation practices and evaluation metrics remain heterogeneous, and standardized benchmark datasets are scarce. This scoping review provides a structured overview of the field and highlights priorities for future methodological development and benchmarking.
(This article belongs to the Special Issue AI, Large Language Models, and Remote Sensing for Disaster Monitoring)

2 pages, 135 KB  
Comment
Comment on Rastogi et al. Brain Tumor Detection and Prediction in MRI Images Utilizing a Fine-Tuned Transfer Learning Model Integrated Within Deep Learning Frameworks. Life 2025, 15, 327
by Emmanuel Pio Pastore
Life 2026, 16(4), 535; https://doi.org/10.3390/life16040535 - 24 Mar 2026
Abstract
Rastogi et al. evaluated fine-tuned transfer-learning architectures for brain tumor classification using MRI, reporting the best performance with Xception on a Kaggle-sourced dataset of tumor and non-tumor images [...]
27 pages, 10703 KB  
Article
WE-KAN: SAR Image Rotated Object Detection Method Based on Wavelet Domain Feature Enhancement and KAN Prediction Head
by Mingchun Li, Yang Liu, Qiang Wang and Dali Chen
Sensors 2026, 26(7), 2011; https://doi.org/10.3390/s26072011 - 24 Mar 2026
Abstract
Synthetic aperture radar (SAR) imagery plays a vital role in critical applications such as military reconnaissance and disaster monitoring. These applications require high detection accuracy. Therefore, rotated object detection has gained increasing attention. By predicting an object’s orientation angle, it offers advantages over horizontal bounding boxes, especially for elongated structures such as ships and bridges in SAR scenes. However, challenges such as speckle noise and complex backgrounds in SAR imagery still hinder high-precision detection. To address this, we propose WE-KAN, a novel rotated object detection framework based on wavelet features and Kolmogorov–Arnold network (KAN) prediction. First, we enhance the backbone by incorporating wavelet domain features from SAR grayscale images. The extracted wavelet domain features and image features are fused by a proposed attention module. Second, given the sensitivity of rotated detection to angle prediction, we design an angle predictor based on KAN. This architecture provides a powerful and dedicated solution for accurate angle regression. Finally, for precise rotated bounding box regression, we employ a joint loss function combining a rotated intersection-over-union (RIoU) loss with a Gaussian distance loss. These designs improve the model’s robustness to noise and its perception of fine object structures. When evaluated on the large-scale public RSAR dataset, our method achieves an AP50 of 70.1 and a mAP of 35.9 under the same training schedule and backbone network, significantly outperforming existing baselines. This demonstrates the effectiveness and robustness of our method for dense, small, and highly oriented objects in complex SAR scenes.
(This article belongs to the Section Sensing and Imaging)

25 pages, 8786 KB  
Article
YOLO11-MSCA: A Multi-Scale Channel Attention Model for Lumbar Vertebra Detection in X-Ray Images
by Hana Ben Fredj, Hatem Garrab and Chokri Souani
Electronics 2026, 15(7), 1341; https://doi.org/10.3390/electronics15071341 - 24 Mar 2026
Abstract
Automated identification of lumbar vertebrae plays a key role in modern spine analysis, offering valuable assistance for diagnostic assessment and preoperative decision-making. Despite recent progress in deep learning-based detection methods, accurately localizing vertebral structures remains challenging due to anatomical variability and heterogeneous image quality. To address the difficulty of capturing subtle vertebral structures, we introduce a Multi-Scale Channel Attention Block (MSCABlock) integrated into the YOLO11 backbone. Unlike conventional attention-based or multi-scale convolutional designs, MSCABlock jointly exploits channel-wise feature interaction and multi-scale receptive fields to enhance both local detail sensitivity and contextual representation, while preserving computational efficiency. The proposed approach is designed to improve detection performance without significantly increasing model complexity. Our model is trained and validated using only the AP-view images from the Burapha University Lumbar-Spine Dataset (BUU-LSPINE), which provides well-annotated lumbar spine X-ray images from 400 unique patients. The proposed approach operates in a fully end-to-end manner, allowing vertebrae to be identified directly from input images without relying on handcrafted feature engineering or complex preprocessing pipelines. Experimental evaluations show that the proposed model achieves strong detection performance, with mAP@0.5 and mAP@0.5–0.95 reaching 0.982 and 0.79, respectively, alongside a precision of 0.93 and a recall of 0.975. Compared with the YOLO11 baseline, ablation and efficiency analyses demonstrate that MSCABlock consistently improves detection performance. It introduces only marginal increases in model parameters and computational cost, thereby preserving a lightweight architecture and maintaining efficient inference. These results show that the optimized YOLO11-based system generalizes well across lumbar levels. It maintains reliable detection under challenging conditions, providing robust automated localization to support large-scale clinical spine analysis.
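
For orientation, a generic squeeze-and-excitation-style channel attention block is sketched below; MSCABlock extends this kind of channel reweighting with parallel multi-scale receptive fields, so this is a baseline illustration, not the authors' exact design:

```python
import torch
import torch.nn as nn

class ChannelAttention(nn.Module):
    """SE-style channel attention: global-average context is squeezed to
    channels // reduction and expanded back to a per-channel gate in (0, 1)
    that rescales the input feature map."""
    def __init__(self, channels: int, reduction: int = 16):
        super().__init__()
        self.gate = nn.Sequential(
            nn.AdaptiveAvgPool2d(1),
            nn.Conv2d(channels, channels // reduction, kernel_size=1),
            nn.ReLU(inplace=True),
            nn.Conv2d(channels // reduction, channels, kernel_size=1),
            nn.Sigmoid(),
        )

    def forward(self, x: torch.Tensor) -> torch.Tensor:
        return x * self.gate(x)

x = torch.randn(2, 64, 32, 32)
print(ChannelAttention(64)(x).shape)       # torch.Size([2, 64, 32, 32])
```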

18 pages, 6071 KB  
Article
DFENet: A Novel Dual-Path Feature Extraction Network for Semantic Segmentation of Remote Sensing Images
by Li Cao, Zishang Liu, Yan Wang and Run Gao
J. Imaging 2026, 12(3), 141; https://doi.org/10.3390/jimaging12030141 - 23 Mar 2026
Abstract
Semantic segmentation of remote sensing images (RSIs) is a fundamental task in geoscience research. However, designing efficient feature fusion modules remains challenging for existing dual-branch or multi-branch architectures. Furthermore, existing deep learning-based architectures predominantly concentrate on spatial feature modeling and context capturing while inherently neglecting the exploration and utilization of critical frequency-domain features, which are crucial for addressing issues of semantic confusion and blurred boundaries in complex remote sensing scenes. To address the challenges of feature fusion and the lack of frequency-domain information, we propose a novel dual-path feature extraction network (DFENet) in this paper. Specifically, a dual-path module (DPM) is developed in DFENet to extract global and local features. In the global path, after applying the channel splitting strategy, four feature extraction strategies are innovatively integrated to extract global features at different granularities. To supplement frequency-domain information, a frequency-domain feature extraction block (FFEB) dominated by the discrete wavelet transform (DWT) is designed to effectively capture both high- and low-frequency components. Experimental results show that our method outperforms existing state-of-the-art methods in terms of segmentation performance, achieving a mean intersection over union (mIoU) of 83.09% on the ISPRS Vaihingen dataset and 86.05% on the ISPRS Potsdam dataset.
(This article belongs to the Section Image and Video Processing)
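
The DWT at the heart of FFEB splits an image into one low-frequency approximation and three high-frequency detail bands. A minimal PyWavelets sketch with a Haar wavelet (the abstract does not name the wavelet actually used):

```python
import numpy as np
import pywt

# One-level 2D Haar DWT: LL is the low-frequency approximation; LH, HL,
# and HH carry horizontal, vertical, and diagonal high-frequency detail.
image = np.random.rand(256, 256).astype(np.float32)   # placeholder patch
LL, (LH, HL, HH) = pywt.dwt2(image, "haar")
print(LL.shape)                                       # (128, 128)

# The transform is invertible, so a frequency-domain block can process
# the sub-bands and still reconstruct a full-resolution feature map.
recon = pywt.idwt2((LL, (LH, HL, HH)), "haar")
assert np.allclose(recon, image, atol=1e-5)
```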

20 pages, 8955 KB  
Article
Language-Guided Contrastive Learning and Difference Enhancement for Semantic Change Detection in Remote Sensing Images
by Yongli Hu, Lintian Ren, Huajie Jiang, Kan Guo, Tengfei Liu, Junbin Gao, Yanfeng Sun and Baocai Yin
Remote Sens. 2026, 18(6), 964; https://doi.org/10.3390/rs18060964 - 23 Mar 2026
Abstract
Semantic change detection (SCD) in remote sensing images aims not only to localize changed regions but also to identify their specific “from–to” semantic transitions. This task remains challenging due to the inherent semantic ambiguity of spectral changes and the presence of pseudo-change noise. While recent vision–language models have shown promise in remote sensing, existing approaches like RemoteCLIP predominantly focus on static scene classification, lacking the ability to explicitly model dynamic temporal transitions. Other adaptations of foundation models (e.g., AdaptVFMs-RSCD) often rely on heavy backbones, incurring prohibitive computational costs. To address these limitations, this paper proposes LGDENet, a lightweight, end-to-end framework that unifies Language-Guided Temporal Contrastive Learning with a noise-robust difference enhancement mechanism. Specifically, we construct a temporal transition prompt learning strategy that aligns visual difference features with textual descriptions of dynamic processes, thereby resolving directional semantic ambiguities. Furthermore, we introduce a Difference Enhancement Module (DEM) that leverages the channel–spatial decoupling property of depthwise separable convolutions to adaptively isolate and suppress irrelevant variations (e.g., registration errors) before feature fusion. Experiments on the SECOND and Landsat-SCD datasets demonstrate that LGDENet achieves state-of-the-art performance, yielding a semantic F1 score (Fscd) of 87.90% and 88.71%, respectively. Moreover, with a modest parameter count of 33.45 M, it offers a superior trade-off between accuracy and efficiency compared to heavy foundation model-based approaches.

30 pages, 2355 KB  
Article
SGCAD: A SAR-Guided Confidence-Gated Distillation Framework of Optical and SAR Images for Water-Enhanced Land-Cover Semantic Segmentation
by Junjie Ma, Zhiyi Wang, Yanyi Yuan and Fengming Hu
Remote Sens. 2026, 18(6), 962; https://doi.org/10.3390/rs18060962 - 23 Mar 2026
Abstract
Multimodal fusion of synthetic aperture radar (SAR) and optical imagery is widely used in Earth observation for applications such as land-cover mapping, surface-water mapping (including post-event flood mapping under near-synchronous acquisitions), and land-use inventory. Optical images provide rich spectral and texture cues, whereas SAR offers all-weather structural information that is complementary but heterogeneous. In practice, this heterogeneity often introduces fusion conflicts in multi-class segmentation, causing critical categories such as water bodies to be under-optimized. To address this issue, this paper presents a SAR-guided class-aware knowledge distillation (SGCAD) method for multimodal semantic segmentation. First, a SAR-only HRNet is trained as a water-expert teacher to learn discriminative backscattering and boundary priors for water extraction. Second, a lightweight multimodal student model (LightMCANet) is optimized using a class-aware distillation strategy that transfers teacher knowledge only within high-confidence water regions, thereby suppressing noisy supervision and reducing interference with other classes. Third, a SAR edge guidance module (SEGM) is introduced in the decoder to enhance boundary continuity for slender structures such as water bodies and roads. Overall, SGCAD improves targeted category learning while maintaining stable performance across the remaining classes. Experiments on a self-built dataset from GF-1 optical and LuTan-1 SAR imagery demonstrate higher overall accuracy and more coherent water/road predictions than representative baselines. Future work will extend the proposed distillation scheme to additional categories and broader geographic scenes.
(This article belongs to the Section Remote Sensing Image Processing)
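
A minimal sketch of the confidence-gating idea: distillation supervision is applied only at pixels where the SAR teacher is confident about the water class. The threshold, temperature, and exact gating rule below are illustrative assumptions, not the paper's formulation:

```python
import torch
import torch.nn.functional as F

def gated_water_distill(student_logits, teacher_logits,
                        water_idx=1, tau=2.0, thresh=0.9):
    """KL distillation restricted to pixels where the SAR teacher assigns
    the water class a probability above `thresh` (all values assumed)."""
    t_prob = teacher_logits.softmax(dim=1)                    # (B, C, H, W)
    gate = (t_prob[:, water_idx] > thresh).float()            # (B, H, W)
    kl = F.kl_div((student_logits / tau).log_softmax(dim=1),
                  (teacher_logits / tau).softmax(dim=1),
                  reduction="none").sum(dim=1)                # per-pixel KL
    return (tau * tau) * (kl * gate).sum() / gate.sum().clamp(min=1.0)

s = torch.randn(2, 6, 64, 64)       # student logits, 6 land-cover classes
t = torch.randn(2, 6, 64, 64)       # SAR water-expert teacher logits
print(gated_water_distill(s, t).item())
```
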
28 pages, 2584 KB  
Article
Improving Cross-Domain Generalization in Brain MRIs via Feature Space Stability Regularization
by Shawon Chakrabarty Kakon, Harishik Dev Singh Jamwal and Saurabh Singh
Mathematics 2026, 14(6), 1082; https://doi.org/10.3390/math14061082 - 23 Mar 2026
Abstract
Deep learning models for brain tumor classification from magnetic resonance imaging (MRI) often achieve high in-dataset accuracy but exhibit substantial performance degradation when evaluated on unseen clinical data due to domain shift arising from variations in imaging protocols and intensity distributions. Existing approaches largely rely on architectural scaling or parameter-level regularization, which do not explicitly constrain the stability of learned feature representations. This manuscript proposes Feature Space Stability Regularization (FSSR), a lightweight and model-agnostic training framework that enforces consistency in latent feature representations under realistic, MRI-safe intensity perturbations. FSSR introduces an auxiliary feature space loss that minimizes the ℓ2 distance between normalized embeddings extracted from the input MRI images and their intensity-perturbed counterparts, alongside standard cross-entropy supervision. FSSR is evaluated across three convolutional backbones, ResNet-18, ResNet-34, and DenseNet-121, trained exclusively on the Kaggle Brain MRI dataset. Feature space analysis demonstrates that FSSR consistently reduces mean feature deviation and variance across architectures, indicating more stable internal representations. Generalization is assessed via zero-shot evaluation on the fully unseen BRISC-2025 dataset without retraining or fine-tuning. On the source domain, the best-performing configuration achieves 97.71% accuracy and 97.55% macro-F1. Under domain shift, FSSR improves external accuracy by up to 8.20 percentage points and the macro-F1 by up to 12.50 percentage points, with DenseNet-121 achieving 96.70% accuracy and 96.87% macro-F1 at a domain gap of only 0.94%. Confusion matrix analysis further reveals reduced class confusion and more stable recall across challenging tumor categories, demonstrating that feature-level stability is a key factor for robust brain MRI classification under domain shift.
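
The auxiliary objective is easy to state in code. A minimal PyTorch sketch, assuming the backbone exposes pre-classifier embeddings and using a weighting coefficient `lam` that the abstract does not specify:

```python
import torch
import torch.nn.functional as F

def fssr_loss(logits, feats, feats_perturbed, labels, lam=0.1):
    """Cross-entropy plus the squared L2 distance between normalized
    embeddings of an MRI batch and its intensity-perturbed twin.
    `lam` and the embedding hook are assumptions of this sketch."""
    z  = F.normalize(feats, dim=1)
    zp = F.normalize(feats_perturbed, dim=1)
    stability = ((z - zp) ** 2).sum(dim=1).mean()
    return F.cross_entropy(logits, labels) + lam * stability

logits = torch.randn(4, 4)                       # 4 tumor classes
feats, feats_p = torch.randn(4, 512), torch.randn(4, 512)
labels = torch.randint(0, 4, (4,))
print(fssr_loss(logits, feats, feats_p, labels).item())
```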

23 pages, 19305 KB  
Article
Debiased Multiplex Tokenization Using Mamba-Based Pointers for Efficient and Versatile Map-Free Visual Relocalization
by Wenshuai Wang, Hong Liu, Shengquan Li, Peifeng Jiang, Dandan Che and Runwei Ding
Mach. Learn. Knowl. Extr. 2026, 8(3), 83; https://doi.org/10.3390/make8030083 - 23 Mar 2026
Abstract
Visual localization plays a critical role in enabling mobile robots to estimate their position and orientation in GPS-denied environments. However, its efficiency, robustness, and generalization are fundamentally undermined by severe viewpoint changes and dramatic appearance variations, which present persistent challenges for image-based feature representation and pose estimation under real-world conditions. Recently, map-free visual relocalization (MFVR) has emerged as a promising paradigm for lightweight deployment and privacy isolation on edge devices, but learning compact, invariant image tokens without relying on structural 3D maps remains a core problem, particularly in highly dynamic or long-term scenarios. In this paper, we propose the Debiased Multiplex Tokenizer (termed DMT-Loc), a novel method for efficient and versatile MFVR, to address these issues. Specifically, DMT-Loc is built upon a pretrained vision Mamba encoder and integrates three key modules for relative pose regression: First, Multiplex Interactive Tokenization yields robust image tokens with non-local affinities and cross-domain descriptions. Second, Debiased Anchor Registration facilitates anchor token matching through proximity graph retrieval and autoregressive pointer attribution. Third, Geometry-Informed Pose Regression empowers multi-layer perceptrons with a symmetric swap gating mechanism operating inside each decoupled regression head to support accurate and flexible pose prediction in both pair-wise and multi-view modes. Extensive evaluations across seven public datasets demonstrate that DMT-Loc substantially outperforms existing baselines and ablation variants in diverse indoor and outdoor environments.

23 pages, 1109 KB  
Review
Strategies for Class-Imbalanced Learning in Multi-Sensor Medical Imaging
by Da Zhou, Song Gao and Xinrui Huang
Sensors 2026, 26(6), 1998; https://doi.org/10.3390/s26061998 - 23 Mar 2026
Abstract
This narrative critical review addresses class imbalance in medical imaging, which, particularly in multi-sensor and multi-modal environments, poses a critical challenge to developing reliable AI diagnostic systems. The integration of heterogeneous data from sources like CT, MRI, and PET presents a unique opportunity to address data scarcity for rare conditions through fusion techniques. This review provides a structured analysis of strategies to tackle class imbalance, categorizing them into data-centric (e.g., advanced resampling like SMOTE-ENC for mixed data types, GAN-based synthesis) and model-centric (e.g., loss function engineering, transfer learning, and ensemble methods) approaches. Crucially, we highlight how multi-sensor feature fusion and decision-level fusion paradigms can inherently enrich representations for minority classes, offering a powerful frontier beyond single-modality learning. We evaluate each method’s merits, clinical viability, and compliance considerations (e.g., FDA). Finally, we identify emerging trends where imbalance-aware learning synergizes with multi-sensor fusion frameworks, federated learning, and explainable AI, charting a roadmap toward robust, equitable, and clinically deployable diagnostic tools. Our quantitative synthesis shows that data-centric strategies can improve minority-class recall by 12–35% in datasets with imbalance ratios (majority:minority) ≥10:1, while model-centric strategies achieve an average AUC improvement of 0.08–0.21 in multi-sensor medical imaging tasks with sample sizes ranging from 50 to 50,000.
(This article belongs to the Special Issue Multi-sensor Fusion in Medical Imaging, Diagnosis and Therapy)
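
As a concrete instance of the data-centric strategies surveyed, the sketch below oversamples a 10:1-imbalanced toy feature set with plain SMOTE from imbalanced-learn; the review also covers variants such as SMOTE-ENC for mixed data types, and the random features here merely stand in for real imaging-derived ones:

```python
from collections import Counter
import numpy as np
from imblearn.over_sampling import SMOTE

rng = np.random.default_rng(0)
X = rng.normal(size=(1100, 32))            # stand-in imaging features
y = np.array([0] * 1000 + [1] * 100)       # 10:1 imbalance ratio

print(Counter(y))                          # Counter({0: 1000, 1: 100})
X_res, y_res = SMOTE(random_state=0).fit_resample(X, y)
print(Counter(y_res))                      # balanced: {0: 1000, 1: 1000}
```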

22 pages, 3088 KB  
Article
SLAR-Net: A Hierarchical Network with Spatial and Semantic Fusion for Fashion Attribute Recognition
by Yanxia Jin, Xiaozhu Zhang and Zhuangwei Zhang
Appl. Sci. 2026, 16(6), 3088; https://doi.org/10.3390/app16063088 - 23 Mar 2026
Abstract
With the rapid growth of fashion e-commerce, fashion attribute recognition has emerged as a critical research area in computer vision. Existing methods face two primary problems: (1) they build multi-task models, leading to complex network architectures; and (2) they overlook the semantic relationships and spatial positional dependencies between fashion attributes. To address these issues, this paper proposes SLAR-Net, a novel hierarchical multi-label classification network that effectively fuses spatial and semantic information for improved recognition performance. Specifically, SLAR-Net adopts a progressive, hierarchical architecture. Firstly, we introduce a lightweight backbone network enhanced with a custom-designed attention mechanism to extract low-level image features. Secondly, we innovatively construct an adjacency matrix to represent the relative spatial orientations of attributes, which is then employed by a graph convolutional network to model mid-level spatial positional features. Thirdly, we design a graph embedding matrix that captures attribute dependency relationships, leveraging a neural network to learn high-level semantic representations. Finally, we propose a custom multi-head attention mechanism to fuse spatial and semantic features, facilitating enhanced feature interaction and improving recognition performance. Experimental results on fashion attribute and benchmark datasets demonstrate that SLAR-Net outperforms state-of-the-art methods in recognition accuracy, validating the effectiveness of the proposed hierarchical architecture and fusion strategy.
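
The mid-level spatial modeling amounts to graph-convolution steps over the attribute adjacency matrix. A minimal sketch of a single symmetric-normalized GCN layer; the random adjacency below merely stands in for the paper's relative-spatial-orientation matrix:

```python
import torch

def gcn_layer(H, A, W):
    """One graph-convolution step: ReLU(D^-1/2 (A + I) D^-1/2 H W)."""
    A_hat = A + torch.eye(A.size(0))                  # add self-loops
    d_inv_sqrt = torch.diag(A_hat.sum(dim=1).pow(-0.5))
    return torch.relu(d_inv_sqrt @ A_hat @ d_inv_sqrt @ H @ W)

n_attr, f_in, f_out = 6, 16, 8
H = torch.randn(n_attr, f_in)                     # per-attribute features
A = (torch.rand(n_attr, n_attr) > 0.5).float()
A = ((A + A.t()) > 0).float()                     # symmetric adjacency
W = torch.randn(f_in, f_out)
print(gcn_layer(H, A, W).shape)                   # torch.Size([6, 8])
```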
