Search Results (26)

Search Parameters:
Keywords = multi-class grain size

51 pages, 29535 KB  
Article
Evaluating CLAP and MERT for Fine-Grained Cymbal Classification: A Multi-Stage Representation Analysis
by Michael Starakis, Maximos Kaliakatsos-Papakostas and Chrisoula Alexandraki
Electronics 2026, 15(8), 1723; https://doi.org/10.3390/electronics15081723 - 18 Apr 2026
Viewed by 180
Abstract
This study presents a representation-centric evaluation of audio foundation models for fine-grained musical instrument analysis, focusing on cymbal classification. A confound-aware comparison of CLAP and MERT embeddings is conducted to examine how each latent space supports recoverability of acoustically and semantically relevant information. To support this analysis, the study introduces a representation-centric, confound-aware multi-stage evaluation framework that separates exploratory geometry, leakage-safe probing, and supporting unsupervised clustering evidence. The methodology is applied to a challenging cymbal dataset characterized by hierarchical labels, class imbalance, and subtle acoustic variation. Results reveal a target-dependent profile of representational strengths rather than a single overall winner. CLAP exhibits stronger variance concentration and more label-consistent local neighborhood organization, and it outperforms MERT on fine-grained, strike-related targets. MERT, however, retains a small but consistent advantage on higher-level cymbal-type classification. Unsupervised analyses show that these advantages reflect local neighborhood structure, not strong global cluster formation, and confound diagnostics indicate that size-related information remains largely type-mediated. Overall, the findings underscore the importance of structured, multi-stage evaluation for disentangling embedding geometry, recoverability, and confound effects while demonstrating the complementary strengths of AFMs in complex audio classification settings. Full article

15 pages, 3291 KB  
Article
Automated Segmentation of Digital Artifacts in Intraoral Photostimulable Phosphor Radiographs
by Ceyda Gizem Topal, Osman Yalçın, Hatice Tetik, Murat Ünal, Necla Bandirmali Erturk and Cemile Özlem Üçok
Diagnostics 2026, 16(8), 1194; https://doi.org/10.3390/diagnostics16081194 - 16 Apr 2026
Viewed by 237
Abstract
Background/Objectives: Intraoral radiographs acquired using photostimulable phosphor (PSP) plates are inherently susceptible to a wide spectrum of artifacts that can compromise diagnostic reliability and lead to unnecessary repeat exposures. Although structured taxonomies describing these artifacts have been proposed, automated methods capable of detecting and localizing multiple artifact types at the pixel level remain limited, particularly under realistic multi-class conditions. In this study, we address the problem of fine-grained, multi-class PSP artifact segmentation by systematically evaluating a deep learning-based framework and establishing a realistic baseline for this inherently challenging task. Methods: A retrospective, multi-center dataset comprising 1497 intraoral PSP radiographs (bitewing and periapical) collected from three institutions was analyzed. Pixel-level annotations were generated by expert oral and maxillofacial radiologists according to a standardized taxonomy consisting of four major artifact groups and 29 artifact classes, together with a background class. A 2D nnU-Net v2 architecture was employed as a baseline segmentation model. Model development was performed using 5-fold cross-validation, and performance was evaluated on an independent test set using Dice coefficient, Intersection over Union (IoU), Precision, and Recall. Results: Across all classes, the model achieved a mean Dice score of 0.0894 ± 0.0084 in cross-validation and 0.0952 on the independent test set, reflecting the intrinsic complexity of the task. Class-wise analysis revealed substantial variability, with higher performance in larger and visually distinctive artifacts, whereas small-scale, low-contrast, and underrepresented classes exhibited markedly reduced performance. Notably, several artifact categories were absent from the training data, resulting in a zero-shot scenario that directly constrained model generalization. 
Furthermore, segmentation performance demonstrated a strong dependency on class frequency, measured in terms of pixel distribution, underscoring the impact of severe class imbalance. Group-based evaluation showed relatively higher performance for pre-exposure and exposure-related artifacts compared to post-exposure and scanner-related categories. Conclusions: These findings demonstrate that large-scale, multi-class pixel-level segmentation of PSP artifacts represents a fundamentally challenging problem shaped by the combined effects of class imbalance, small object size, heterogeneous artifact morphology, and incomplete training representation. While the proposed framework confirms the feasibility of automated artifact localization, its current performance suggests greater immediate value as a quality control or screening support tool rather than a fully autonomous diagnostic system. By providing a comprehensive baseline and systematic analysis, this study establishes a benchmark for future research and highlights the critical need for imbalance-aware learning strategies, hierarchical modeling, and data-centric approaches to advance this field. Full article
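The Dice coefficient and IoU reported above are computed per class from binary masks; a minimal NumPy sketch (the smoothing constant `eps` is an assumed convention, not taken from the paper):

```python
import numpy as np

def dice_iou(pred: np.ndarray, target: np.ndarray, eps: float = 1e-7):
    """Dice coefficient and IoU for a pair of binary segmentation masks."""
    pred = pred.astype(bool)
    target = target.astype(bool)
    inter = np.logical_and(pred, target).sum()
    union = np.logical_or(pred, target).sum()
    dice = (2 * inter + eps) / (pred.sum() + target.sum() + eps)
    iou = (inter + eps) / (union + eps)
    return dice, iou

# Toy masks: 3 predicted pixels, 3 target pixels, 2 overlapping
d, i = dice_iou(np.array([[1, 1, 1, 0]]), np.array([[0, 1, 1, 1]]))
```

For multi-class evaluation as in the paper, the same computation is repeated on the one-vs-rest mask of each artifact class and averaged.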

40 pages, 5095 KB  
Article
When Lie Groups Meet Hyperspectral Images: Equivariant Manifold Network for Few-Shot HSI Classification
by Haolong Ban, Junchao Feng, Zejin Liu, Yue Jiang, Zhenxing Wang, Jialiang Liu, Yaowen Hu and Yuanshan Lin
Sensors 2026, 26(7), 2117; https://doi.org/10.3390/s26072117 - 29 Mar 2026
Viewed by 443
Abstract
Hyperspectral imagery (HSI) offers rich spectral signatures and fine-grained spatial structures for remote sensing, but practical HSI classification is often constrained by scarce labels and complex geometric disturbances, including translation, rotation, scaling, and shear. Existing deep models are typically developed under Euclidean assumptions and rely on data-hungry training pipelines, which makes them brittle in the few-shot regime. To address this challenge, we propose EMNet, a Lie-group-based Equivariant Manifold Network for few-shot HSI classification that explicitly encodes geometric invariance and improves discriminative accuracy. EMNet couples an SE(2)-based Equivariance-Guided Module (EGM) to enforce equivariance to translations and rotations with an affine Lie-group-based Characteristic Filtering Convolution (CFC) that models scaling and shearing on the feature manifold while adaptively suppressing redundant responses. Extensive experiments on WHU-Hi-HongHu, Houston2013, and Indian Pines demonstrate state-of-the-art performance with competitive complexity, achieving OAs of 95.77% (50 samples/class), 97.37% (50 samples/class), and 96.09% (5% labeled samples), respectively, and yielding up to +3.34% OA, +6.01% AA, and +4.14% Kappa over the strong DGPF-RENet baseline. Under a stricter 25-samples-per-class protocol with 10 repeated random hold-out splits, EMNet consistently improves the mean accuracy while exhibiting lower variance, indicating better stability to sampling uncertainty. On the city-scale Xiongan New Area dataset with extreme long-tail imbalance (1580 × 3750 pixels, 256 bands, and 5.925 M labeled pixels), EMNet further boosts OA from 85.89% to 93.77% under the 1% labeled-sample protocol, highlighting robust generalization for large-area mapping. 
Beyond point estimates, we report mean ± SD/SE across repeated splits and provide rigorous statistical validation by computing Yule’s Q statistic for class-wise behavior similarity, performing the Friedman test with Nemenyi post hoc comparisons for multi-method ranking significance, and presenting 95% confidence intervals together with Cohen’s d effect sizes to quantify practical improvement. Full article
(This article belongs to the Special Issue Hyperspectral Sensing: Imaging and Applications)
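Of the statistics reported above, Cohen's d is the simplest to reproduce; a sketch of the pooled-standard-deviation form (the toy inputs below are illustrative, not the paper's per-split accuracies):

```python
import numpy as np

def cohens_d(a, b):
    """Cohen's d effect size with pooled standard deviation,
    e.g. for two methods' per-split accuracy scores."""
    a, b = np.asarray(a, float), np.asarray(b, float)
    na, nb = len(a), len(b)
    pooled = np.sqrt(((na - 1) * a.var(ddof=1) + (nb - 1) * b.var(ddof=1))
                     / (na + nb - 2))
    return float((a.mean() - b.mean()) / pooled)
```

Conventionally, |d| around 0.2 is read as a small effect, 0.5 medium, and 0.8 large, which is what makes it useful alongside the confidence intervals the paper reports.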

17 pages, 1082 KB  
Article
AACNN-ViT: Adaptive Attention-Augmented Convolutional and Vision Transformer Fusion for Lung Cancer Detection
by Mohammad Ishtiaque Rahman and Amrina Rahman
J. Imaging 2026, 12(2), 62; https://doi.org/10.3390/jimaging12020062 - 30 Jan 2026
Viewed by 642
Abstract
Lung cancer remains a leading cause of cancer-related mortality. Although reliable multiclass classification of lung lesions from CT imaging is essential for early diagnosis, it remains challenging due to subtle inter-class differences, limited sample sizes, and class imbalance. We propose an Adaptive Attention-Augmented Convolutional Neural Network with Vision Transformer (AACNN-ViT), a hybrid framework that integrates local convolutional representations with global transformer embeddings through an adaptive attention-based fusion module. The CNN branch captures fine-grained spatial patterns, the ViT branch encodes long-range contextual dependencies, and the adaptive fusion mechanism learns to weight cross-representation interactions to improve discriminability. To reduce the impact of imbalance, a hybrid objective that combines focal loss with categorical cross-entropy is incorporated during training. Experiments on the IQ-OTH/NCCD dataset (benign, malignant, and normal) show consistent performance progression in an ablation-style evaluation: CNN-only, ViT-only, CNN-ViT concatenation, and AACNN-ViT. The proposed AACNN-ViT achieved 96.97% accuracy on the validation set with macro-averaged precision/recall/F1 of 0.9588/0.9352/0.9458 and weighted F1 of 0.9693, substantially improving minority-class recognition (Benign recall 0.8333) compared with CNN-ViT (accuracy 89.09%, macro-F1 0.7680). One-vs.-rest ROC analysis further indicates strong separability across all classes (micro-average AUC 0.992). These results suggest that adaptive attention-based fusion offers a robust and clinically relevant approach for computer-aided lung cancer screening and decision support. Full article
(This article belongs to the Special Issue Progress and Challenges in Biomedical Image Analysis—2nd Edition)
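The hybrid objective described above, focal loss combined with categorical cross-entropy, can be sketched as follows; the mixing weight `alpha` and focusing parameter `gamma` are assumed defaults, since the abstract does not state the paper's values:

```python
import numpy as np

def hybrid_loss(probs, onehot, gamma=2.0, alpha=0.5):
    """alpha * focal loss + (1 - alpha) * categorical cross-entropy.
    probs: (N, C) predicted class probabilities; onehot: (N, C) targets."""
    eps = 1e-12
    p_t = (probs * onehot).sum(axis=1)   # probability of the true class
    ce = -np.log(p_t + eps)              # categorical cross-entropy
    focal = (1.0 - p_t) ** gamma * ce    # down-weights easy examples
    return float(np.mean(alpha * focal + (1 - alpha) * ce))
```

The `(1 - p_t) ** gamma` factor is what addresses imbalance: confident, easy (usually majority-class) samples contribute little, so gradient signal shifts toward hard minority-class samples.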

19 pages, 2136 KB  
Article
Transformer-Based Multi-Class Classification of Bangladeshi Rice Varieties Using Image Data
by Israt Tabassum and Vimala Nunavath
Appl. Sci. 2026, 16(3), 1279; https://doi.org/10.3390/app16031279 - 27 Jan 2026
Viewed by 431
Abstract
Rice (Oryza sativa L.) is a staple food for over half of the global population, with significant economic, agricultural, and cultural importance, particularly in Asia. Thousands of rice varieties exist worldwide, differing in size, shape, color, and texture, making accurate classification essential for quality control, breeding programs, and authenticity verification in trade and research. Traditional manual identification of rice varieties is time-consuming, error-prone, and heavily reliant on expert knowledge. Deep learning provides an efficient alternative by automatically extracting discriminative features from rice grain images for precise classification. While prior studies have primarily employed deep learning models such as CNN, VGG, InceptionV3, MobileNet, and DenseNet201, transformer-based models remain underexplored for rice variety classification. This study addresses this gap by applying two transformer-based deep learning models, the Swin Transformer and the Vision Transformer (ViT), to multi-class classification of rice varieties using the publicly available PRBD dataset from Bangladesh. Experimental results demonstrate that the ViT model achieved an accuracy of 99.86% with precision, recall, and F1-score all at 0.9986, while the Swin Transformer model obtained an accuracy of 99.44% with a precision of 0.9944, recall of 0.9944, and F1-score of 0.9943. These results highlight the effectiveness of transformer-based models for high-accuracy rice variety classification. Full article
(This article belongs to the Section Computing and Artificial Intelligence)

26 pages, 1838 KB  
Article
Artificial Intelligence in Honey Pollen Analysis: Accuracy and Limitations of Pollen Classification Compared with Palynological Expert Assessment
by Joanna Katarzyna Banach, Bartosz Lewandowski and Przemysław Rujna
Appl. Sci. 2025, 15(24), 13009; https://doi.org/10.3390/app152413009 - 10 Dec 2025
Viewed by 903
Abstract
Honey authenticity, including its botanical origin, is traditionally assessed by melissopalynology, a labour-intensive and expert-dependent method. This study reports the final validation of a deep learning model for pollen grain classification in honey, developed within the NUTRITECH.I-004A/22 project, by comparing its performance with that of an independent palynology expert. A dataset of 5194 pollen images was acquired from five unifloral honeys, rapeseed (Brassica napus), sunflower (Helianthus annuus), buckwheat (Fagopyrum esculentum), phacelia (Phacelia tanacetifolia) and linden (Tilia cordata), under a standardized microscopy protocol and manually annotated using an extended set of morphological descriptors (shape, size, apertures, exine ornamentation and wall thickness). The evaluation involved training and assessing a deep learning model based solely on the ResNet152 architecture with pretrained ImageNet weights. This model was enhanced by adding additional layers: a global average pooling layer, a dense hidden layer with ReLU activation, and a final softmax output layer for multi-class classification. Model performance was assessed using multiclass metrics and agreement with the expert, including Cohen’s kappa. The AI classifier achieved almost perfect agreement with the expert (κ ≈ 0.94), with the highest accuracy for pollen grains exhibiting spiny ornamentation and clearly thin or thick walls, and lower performance for reticulate exine and intermediate wall thickness. Misclassifications were associated with suboptimal image quality and intermediate confidence scores. Compared with traditional melissopalynological assessment (approx. 1–2 h of microscopic analysis per sample), the AI system reduced the effective classification time to less than 2 min per prepared sample under routine laboratory conditions, demonstrating a clear gain in analytical throughput. 
The results demonstrate that, under routine laboratory conditions, AI-based digital palynology can reliably support expert assessment, provided that imaging is standardized and prediction confidence is incorporated into decision rules for ambiguous cases. Full article
(This article belongs to the Section Food Science and Technology)
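Cohen's kappa, used above to quantify model–expert agreement, corrects the observed agreement for the agreement expected by chance from each rater's label frequencies:

```python
import numpy as np

def cohens_kappa(a, b, n_classes):
    """Cohen's kappa between two label sequences (e.g. model vs. expert)."""
    cm = np.zeros((n_classes, n_classes))
    for x, y in zip(a, b):
        cm[x, y] += 1
    n = cm.sum()
    po = np.trace(cm) / n                      # observed agreement
    pe = (cm.sum(0) * cm.sum(1)).sum() / n**2  # chance agreement
    return float((po - pe) / (1 - pe))
```

A value of 1 means perfect agreement and 0 means no agreement beyond chance; the κ ≈ 0.94 reported above falls in the conventional "almost perfect" band (above 0.8).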

26 pages, 3269 KB  
Article
DiagNeXt: A Two-Stage Attention-Guided ConvNeXt Framework for Kidney Pathology Segmentation and Classification
by Hilal Tekin, Şafak Kılıç and Yahya Doğan
J. Imaging 2025, 11(12), 433; https://doi.org/10.3390/jimaging11120433 - 4 Dec 2025
Cited by 2 | Viewed by 873
Abstract
Accurate segmentation and classification of kidney pathologies from medical images remain a major challenge in computer-aided diagnosis due to complex morphological variations, small lesion sizes, and severe class imbalance. This study introduces DiagNeXt, a novel two-stage deep learning framework designed to overcome these challenges through an integrated use of attention-enhanced ConvNeXt architectures for both segmentation and classification. In the first stage, DiagNeXt-Seg employs a U-Net-based design incorporating Enhanced Convolutional Blocks (ECBs) with spatial attention gates and Atrous Spatial Pyramid Pooling (ASPP) to achieve precise multi-class kidney segmentation. In the second stage, DiagNeXt-Cls utilizes the segmented regions of interest (ROIs) for pathology classification through a hierarchical multi-resolution strategy enhanced by Context-Aware Feature Fusion (CAFF) and Evidential Deep Learning (EDL) for uncertainty estimation. The main contributions of this work include: (1) enhanced ConvNeXt blocks with large-kernel depthwise convolutions optimized for 3D medical imaging, (2) a boundary-aware compound loss combining Dice, cross-entropy, focal, and distance transform terms to improve segmentation precision, (3) attention-guided skip connections preserving fine-grained spatial details, (4) hierarchical multi-scale feature modeling for robust pathology recognition, and (5) a confidence-modulated classification approach integrating segmentation quality metrics for reliable decision-making. Extensive experiments on a large kidney CT dataset comprising 3847 patients demonstrate that DiagNeXt achieves 98.9% classification accuracy, outperforming state-of-the-art approaches by 6.8%. The framework attains near-perfect AUC scores across all pathology classes (Normal: 1.000, Tumor: 1.000, Cyst: 0.999, Stone: 0.994) while offering clinically interpretable uncertainty maps and attention visualizations. 
The superior diagnostic accuracy, computational efficiency (6.2× faster inference), and interpretability of DiagNeXt make it a strong candidate for real-world integration into clinical kidney disease diagnosis and treatment planning systems. Full article
(This article belongs to the Topic Machine Learning and Deep Learning in Medical Imaging)

21 pages, 21928 KB  
Article
HieraEdgeNet: A Multi-Scale Edge-Enhanced Framework for Automated Pollen Recognition
by Yuchong Long, Wen Sun, Ningxiao Sun, Wenxiao Wang, Chao Li and Shan Yin
Agriculture 2025, 15(23), 2518; https://doi.org/10.3390/agriculture15232518 - 4 Dec 2025
Cited by 2 | Viewed by 781
Abstract
Automated pollen recognition is a foundational tool for diverse scientific domains, including paleoclimatology, biodiversity monitoring, and agricultural science. However, conventional methods create a critical data bottleneck, limiting the temporal and spatial resolution of ecological analysis. Existing deep learning models often fail to achieve the requisite localization accuracy for microscopic pollen grains, which are characterized by their minute size, indistinct edges, and complex backgrounds. To overcome this, we introduce HieraEdgeNet, a novel object detection framework. The core principle of our architecture is to explicitly extract and hierarchically fuse multi-scale edge information with deep semantic features. This synergistic approach, combined with a computationally efficient large-kernel operator for fine-grained feature refinement, significantly enhances the model’s ability to perceive and precisely delineate object boundaries. On a large-scale dataset comprising 44,471 annotated microscopic images containing 342,706 pollen grains from 120 classes, HieraEdgeNet achieves a mean Average Precision of 0.9501 (mAP@0.5) and 0.8444 (mAP@0.5:0.95), substantially outperforming state-of-the-art models such as YOLOv12n and the Transformer-based RT-DETR family in terms of the accuracy–efficiency trade-off. This work provides a powerful computational tool for generating the high-throughput, high-fidelity data essential for modern ecological research, including tracking phenological shifts, assessing plant biodiversity, and reconstructing paleoenvironments. At the same time, we acknowledge that the current two-dimensional design cannot directly exploit volumetric Z-stack microscopy and that strong domain shifts between training data and real-world deployments may still degrade performance, which we identify as key directions for future work. 
By also enabling applications in precision agriculture, HieraEdgeNet contributes broadly to advancing ecosystem monitoring and sustainable food security. Full article
(This article belongs to the Section Artificial Intelligence and Digital Agriculture)

28 pages, 2107 KB  
Article
A Scale-Adaptive and Frequency-Aware Attention Network for Precise Detection of Strawberry Diseases
by Kaijie Zhang, Yuchen Ye, Kaihao Chen, Zao Li and Hongxing Peng
Agronomy 2025, 15(8), 1969; https://doi.org/10.3390/agronomy15081969 - 15 Aug 2025
Viewed by 1376
Abstract
Accurate and automated detection of diseases is crucial for sustainable strawberry production. However, the challenges posed by the small size, mutual occlusion, and high intra-class variance of symptoms in complex agricultural environments make this difficult. Mainstream deep learning detectors often do not perform well under these demanding conditions. We propose a novel detection framework designed for superior accuracy and robustness to address this critical gap. Our framework introduces four key innovations: First, we propose a novel attention-driven detection head featuring our Parallel Pyramid Attention (PPA) module. Inspired by pyramid attention principles, our module’s unique parallel multi-branch architecture is designed to overcome the limitations of serial processing. It simultaneously integrates global, local, and serial features to generate a fine-grained attention map, significantly improving the model’s focus on targets of varying scales. Second, we enhance the core feature fusion blocks by integrating Monte Carlo Attention (MCAttn), effectively empowering the model to recognize targets across diverse scales. Third, to improve the feature representation capacity of the backbone without increasing the parametric overhead, we replace standard convolutions with Frequency-Dynamic Convolutions (FDConv). This approach constructs highly diverse kernels in the frequency domain. Finally, we employ the Scale-Decoupled Loss function to optimize training dynamics. By adaptively re-weighting the localization and scale losses based on target size, we stabilize the training process and improve the precision of bounding box regression for small objects. Extensive experiments on a challenging strawberry disease dataset demonstrate that our proposed model achieves a mean Average Precision (mAP) of 81.1%. This represents an improvement of 2.1% over the strong YOLOv12-n baseline, highlighting its practical value as an effective tool for intelligent disease protection. Full article

26 pages, 9083 KB  
Article
An Efficient Fine-Grained Recognition Method Enhanced by Res2Net Based on Dynamic Sparse Attention
by Qifeng Niu, Hui Wang and Feng Xu
Sensors 2025, 25(13), 4147; https://doi.org/10.3390/s25134147 - 3 Jul 2025
Cited by 1 | Viewed by 1306
Abstract
Fine-grained recognition tasks face significant challenges in differentiating subtle, class-specific details against cluttered backgrounds. This paper presents an efficient architecture built upon the Res2Net backbone, significantly enhanced by a dynamic Sparse Attention mechanism. The core approach leverages the inherent multi-scale representation power of Res2Net to capture discriminative patterns across different granularities. Crucially, the integrated Sparse Attention module operates dynamically, selectively amplifying the most informative features while attenuating irrelevant background noise and redundant details. This combined strategy substantially improves the model’s ability to focus on pivotal regions critical for accurate classification. Furthermore, strategic architectural optimizations are applied throughout to minimize computational complexity, resulting in a model that demands significantly fewer parameters and exhibits faster inference times. Extensive evaluations on benchmark datasets demonstrate the effectiveness of the proposed method. It achieves a modest but consistent accuracy gain over strong baselines (approximately 2%) while simultaneously reducing model size by around 30% and inference latency by about 20%, proving highly effective for practical fine-grained recognition applications requiring both high accuracy and operational efficiency. Full article

24 pages, 2802 KB  
Article
MSDCA: A Multi-Scale Dual-Branch Network with Enhanced Cross-Attention for Hyperspectral Image Classification
by Ning Jiang, Shengling Geng, Yuhui Zheng and Le Sun
Remote Sens. 2025, 17(13), 2198; https://doi.org/10.3390/rs17132198 - 26 Jun 2025
Cited by 2 | Viewed by 1437
Abstract
The high dimensionality of hyperspectral data, coupled with limited labeled samples and complex scene structures, makes spatial–spectral feature learning particularly challenging. To address these limitations, we propose a dual-branch deep learning framework named MSDCA, which performs spatial–spectral joint modeling under limited supervision. First, a multiscale 3D spatial–spectral feature extraction module (3D-SSF) employs parallel 3D convolutional branches with diverse kernel sizes and dilation rates, enabling hierarchical modeling of spatial–spectral representations from large-scale patches and effectively capturing both fine-grained textures and global context. Second, a multi-branch directional feature module (MBDFM) enhances the network’s sensitivity to directional patterns and long-range spatial relationships. It achieves this by applying axis-aware depthwise separable convolutions along both horizontal and vertical axes, thereby significantly improving the representation of spatial features. Finally, the enhanced cross-attention Transformer encoder (ECATE) integrates a dual-branch fusion strategy, where a cross-attention stream learns semantic dependencies across multi-scale tokens, and a residual path ensures the preservation of structural integrity. The fused features are further refined through lightweight channel and spatial attention modules. This adaptive alignment process enhances the discriminative power of heterogeneous spatial–spectral features. The experimental results on three widely used benchmark datasets demonstrate that the proposed method consistently outperforms state-of-the-art approaches in terms of classification accuracy and robustness. Notably, the framework is particularly effective for small-sample classes and complex boundary regions, while maintaining high computational efficiency. Full article

22 pages, 2006 KB  
Article
Modelling Trace Metals in River and Sediment Compartments to Assess Water Quality
by Aline Grard and Jean-François Deliège
Water 2025, 17(13), 1876; https://doi.org/10.3390/w17131876 - 24 Jun 2025
Viewed by 1530
Abstract
The present study focuses on the dynamics of trace metals (TM) in two European rivers, the Mosel and the Meuse. A deterministic description of hydro-sedimentary processes has been performed. The model used to describe pollutant transport and dilution at the watershed scale has been enhanced with the implementation of the MicMod sub-model. The objective of this study is to characterise the dynamics of TM in the water column and bed sediment. A multi-class grain size representation has been developed in MicMod. The dissolved and particulate TM phases have been calculated with specific partitioning coefficients associated with each suspended sediment (SS) class. The processes involved in TM fate have been calibrated in MicMod, including settling velocity, TM releases from the watershed (point and diffuse loads), etc. Following the calibration of the parameters involved in TM transport within the river ecosystem, the main goal is to describe TM dynamics using a pressure–impact relationship model. It was demonstrated that the description of at least one class of fine particles is necessary to obtain an adequate representation of TM concentrations. The focus of this study is low flow periods, which are characterised by the presence of fine particles. The objective is to gain a deeper understanding of the processes that control the transport of TM. This paper establishes consistent pressure–impact relationships between TM loads (urban, industrial, soils) from watersheds and concentrations in rivers. Full article
(This article belongs to the Section Water Quality and Contamination)

33 pages, 9537 KB  
Article
A Deep Learning-Based Solution to the Class Imbalance Problem in High-Resolution Land Cover Classification
by Pengdi Chen, Yong Liu, Yuanrui Ren, Baoan Zhang and Yuan Zhao
Remote Sens. 2025, 17(11), 1845; https://doi.org/10.3390/rs17111845 - 25 May 2025
Cited by 8 | Viewed by 7050
Abstract
Class imbalance (CI) poses a significant challenge in machine learning, characterized by a substantial disparity in sample sizes between majority and minority classes, leading to a pronounced “long-tail effect” in statistical distributions and subsequent inference processes. This issue is particularly acute in high-resolution land cover classification within arid regions, where CI tends to bias classification outcomes towards majority classes, often at the expense of minority classes. Recent advancements in deep learning have opened new avenues for tackling the CI problem in this context, focusing on three key aspects: the semantic segmentation model, loss function design, and dataset composition. To address this issue, we propose the high-resolution U-shaped mamba network (HRUMamba), which integrates multiple innovations to enhance segmentation performance under imbalanced conditions. Specifically, HRUMamba adopts a pre-trained HRNet as the encoder for capturing fine-grained local features and incorporates a modified scaled visual state space (SVSS) block in the decoder to model long-range dependencies effectively. An adaptive awareness fusion (AAF) module is embedded within the skip connections to enhance target saliency. Additionally, we introduce a synthetic loss function that combines cross-entropy loss, Dice loss, and auxiliary loss to improve optimization stability. To quantitatively assess multi-class imbalance, we introduce the coefficient of variation (CV) as a novel evaluation metric. Experimental results on the ISPRS Vaihingen and Minqin datasets demonstrate the robustness and effectiveness of HRUMamba in mitigating CI. The proposed model achieves the highest mF1 scores of 92.25% and 89.88%, along with the lowest CV values of 0.0445 and 0.0574, respectively, outperforming state-of-the-art methods. These innovations underscore the potential of HRUMamba in advancing high-resolution land cover classification in imbalanced datasets. Full article
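One plausible reading of the CV metric is the coefficient of variation of per-class F1 scores, where a lower value indicates more uniform performance across classes (a sketch under that assumption; the paper's exact definition may differ):

```python
import statistics

def coefficient_of_variation(per_class_f1):
    """CV = population standard deviation / mean of per-class F1 scores.
    Lower CV -> more balanced performance across classes."""
    mean = statistics.fmean(per_class_f1)
    std = statistics.pstdev(per_class_f1)
    return std / mean

# Hypothetical per-class F1 scores for two models
balanced = coefficient_of_variation([0.91, 0.92, 0.93])  # uniform classes
skewed = coefficient_of_variation([0.95, 0.90, 0.60])    # one weak minority class
```

Unlike a single mean F1, this score penalizes a model that trades minority-class accuracy for majority-class accuracy, which is exactly the failure mode class imbalance induces.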

23 pages, 9879 KB  
Article
Carbon-Halloysite Nanocomposites and Their Adsorption Characteristics for Pharmaceuticals—A Naproxen Case Study
by Piotr Słomkiewicz, Beata Szczepanik, Piotr Sakiewicz, Klaudiusz Gołombek and Krzysztof Piotrowski
Materials 2025, 18(11), 2433; https://doi.org/10.3390/ma18112433 - 22 May 2025
Cited by 1 | Viewed by 1250
Abstract
The synthesis of carbon-halloysite nanocomposites was carried out using aqueous sucrose solutions as a carbon precursor. Raw and calcined halloysite with different grain size classes were used as a carbon support. The influence of halloysite grain size and the calcination process on the carbon concentration in the composites and on their adsorption characteristics towards the separation of naproxen from aqueous solutions was identified experimentally. The kinetic behavior of the process (pseudo-second-order kinetic model) indicates a favorable increase in the number of active sites formed after the deposition of the carbon layer on the surface of halloysite particles. Validation of the Langmuir multi-center isotherm adsorption model indicates a separation mechanism associated with multiple active centers on the nanocomposite adsorbent surface and separation without dissociation of naproxen particles. Owing to its cheap, simple, and environmentally friendly production methodology and inexpensive raw materials, the obtained carbon-halloysite nanocomposite can be widely used for the effective and economical removal of naproxen from wastewater streams. The observed naproxen separation effects are significant. Full article
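The pseudo-second-order model referenced above, q_t = k2·qe²·t / (1 + k2·qe·t), is commonly fitted via its linearized form t/q_t = 1/(k2·qe²) + t/qe; a generic sketch on synthetic data (not the paper's measurements or fitted constants):

```python
def pso_qt(t, qe, k2):
    """Pseudo-second-order uptake at time t: q_t = k2*qe^2*t / (1 + k2*qe*t)."""
    return (k2 * qe**2 * t) / (1.0 + k2 * qe * t)

def fit_pso_linear(times, q):
    """Recover qe and k2 from the linearized form t/q = 1/(k2*qe^2) + t/qe
    by ordinary least squares on the points (t, t/q)."""
    ys = [t / qt for t, qt in zip(times, q)]
    n = len(times)
    mx = sum(times) / n
    my = sum(ys) / n
    slope = (sum((x - mx) * (y - my) for x, y in zip(times, ys))
             / sum((x - mx) ** 2 for x in times))
    intercept = my - slope * mx
    qe = 1.0 / slope                 # slope = 1/qe
    k2 = slope**2 / intercept        # intercept = 1/(k2*qe^2)
    return qe, k2

# Synthetic uptake curve from known (hypothetical) parameters,
# then recovered by the linear fit
times = [1.0, 2.0, 5.0, 10.0, 20.0, 60.0]
data = [pso_qt(t, qe=12.0, k2=0.05) for t in times]
qe_fit, k2_fit = fit_pso_linear(times, data)
```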

21 pages, 9477 KB  
Article
M2Former: Multiscale Patch Selection for Fine-Grained Visual Recognition
by Jiyong Moon and Seongsik Park
Appl. Sci. 2024, 14(19), 8710; https://doi.org/10.3390/app14198710 - 26 Sep 2024
Cited by 3 | Viewed by 3082
Abstract
Recently, Vision Transformers (ViTs) have been actively applied to fine-grained visual recognition (FGVR). ViT can effectively model the interdependencies between patch-divided object regions through an inherent self-attention mechanism. In addition, patch selection is used with ViT to remove redundant patch information and highlight the most discriminative object patches. However, existing ViT-based FGVR models are limited to single-scale processing, and their fixed receptive fields hinder representational richness and exacerbate vulnerability to scale variability. Therefore, we propose MultiScale Patch Selection (MSPS) to improve the multiscale capabilities of existing ViT-based models. Specifically, MSPS selects salient patches of different scales at different stages of a MultiScale Vision Transformer (MS-ViT). In addition, we introduce Class Token Transfer (CTT) and MultiScale Cross-Attention (MSCA) to model cross-scale interactions between selected multiscale patches and fully reflect them in model decisions. Compared with previous Single-Scale Patch Selection (SSPS), our proposed MSPS encourages richer object representations based on feature hierarchy and consistently improves performance from small-sized to large-sized objects. As a result, we propose M2Former, which outperforms CNN-/ViT-based models on several widely used FGVR benchmarks. Full article
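Patch selection of this kind can be illustrated as picking the top-k highest-scoring patches independently at each scale, with saliency scores standing in for the model's attention maps (a toy sketch; all names and values are hypothetical, and the actual MSPS/CTT/MSCA machinery is considerably more involved):

```python
def select_patches(scores, k):
    """Return indices of the k highest-scoring patches at one scale."""
    return sorted(range(len(scores)), key=lambda i: scores[i], reverse=True)[:k]

def multiscale_patch_selection(score_maps, ks):
    """Select top-k patches per scale; in the real model, the selected
    sets would then interact across scales (e.g. via cross-attention)."""
    return [select_patches(scores, k) for scores, k in zip(score_maps, ks)]

# Two hypothetical scales: coarse (4 patches) and fine (8 patches)
coarse = [0.1, 0.7, 0.2, 0.9]
fine = [0.05, 0.6, 0.1, 0.3, 0.8, 0.2, 0.4, 0.9]
selected = multiscale_patch_selection([coarse, fine], ks=[2, 3])
```

The contrast with single-scale patch selection is that each scale contributes its own salient set, so small objects surviving only at the fine scale are not discarded by a coarse-scale criterion.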