Search Results (450)

Search Parameters:
Keywords = annotation efficient deep learning

18 pages, 12952 KB  
Article
Synthetic Melanoma Image Generation and Evaluation Using Generative Adversarial Networks
by Pei-Yu Lin, Yidan Shen, Neville Mathew, Renjie Hu, Siyu Huang, Courtney M. Queen, Cameron E. West, Ana Ciurea and George Zouridakis
Bioengineering 2026, 13(2), 245; https://doi.org/10.3390/bioengineering13020245 - 20 Feb 2026
Abstract
Melanoma is the most lethal form of skin cancer, and early detection is critical for improving patient outcomes. Although dermoscopy combined with deep learning has advanced automated skin-lesion analysis, progress is hindered by limited access to large, well-annotated datasets and by severe class imbalance, where melanoma images are substantially underrepresented. To address these challenges, we present the first systematic benchmarking study comparing four GAN architectures—DCGAN, StyleGAN2, and two StyleGAN3 variants (T and R)—for high-resolution (512×512) melanoma-specific synthesis. We train and optimize all models on two expert-annotated benchmarks (ISIC 2018 and ISIC 2020) under unified preprocessing and hyperparameter exploration, with particular attention to R1 regularization tuning. Image quality is assessed through a multi-faceted protocol combining distribution-level metrics (FID), sample-level representativeness (FMD), qualitative dermoscopic inspection, downstream classification with a frozen EfficientNet-based melanoma detector, and independent evaluation by two board-certified dermatologists. StyleGAN2 achieves the best balance of quantitative performance and perceptual quality, attaining FID scores of 24.8 (ISIC 2018) and 7.96 (ISIC 2020) at γ=0.8. The frozen classifier recognizes 83% of StyleGAN2-generated images as melanoma, while dermatologists distinguish synthetic from real images at only 66.5% accuracy (chance = 50%), with low inter-rater agreement (κ=0.17). In a controlled augmentation experiment, adding synthetic melanoma images to address class imbalance improved melanoma detection AUC from 0.925 to 0.945 on a held-out real-image test set. These findings demonstrate that StyleGAN2-generated melanoma images preserve diagnostically relevant features and can provide a measurable benefit for mitigating class imbalance in melanoma-focused machine learning pipelines.
(This article belongs to the Special Issue AI and Data Science in Bioengineering: Innovations and Applications)
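As a quick illustration of the inter-rater statistics reported above, the sketch below computes per-rater accuracy and Cohen's kappa with scikit-learn; the rater label arrays are invented placeholders, not the study's data.

```python
# Hypothetical sketch of the real-vs-synthetic evaluation metrics: rater
# accuracy against ground truth and Cohen's kappa between two raters.
import numpy as np
from sklearn.metrics import accuracy_score, cohen_kappa_score

truth   = np.array([1, 0, 1, 1, 0, 0, 1, 0])   # 1 = synthetic, 0 = real (toy data)
rater_a = np.array([1, 0, 0, 1, 1, 0, 1, 0])   # dermatologist A's calls
rater_b = np.array([0, 0, 1, 1, 0, 0, 0, 1])   # dermatologist B's calls

print("A accuracy:", accuracy_score(truth, rater_a))
print("kappa(A, B):", cohen_kappa_score(rater_a, rater_b))  # agreement beyond chance
```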

21 pages, 3411 KB  
Article
Global Identification of Lunar Dark Mantle Deposits
by Xiaoyang Liu, Jianhui Wang, Denggao Qiu, Jianguo Yan, Jean-Pierre Barriot and Yang Luo
Sensors 2026, 26(4), 1318; https://doi.org/10.3390/s26041318 - 18 Feb 2026
Abstract
Lunar dark mantle deposits (DMDs), formed by explosive volcanic activity on the Moon, are typically composed of glass- and iron-rich pyroclastic materials, with slight variations in color, crystallinity, and TiO₂ concentration by region. This paper proposes a method for identifying DMDs using the YOLOv8 deep learning model, enhanced by the introduction of a multi-scale feature extraction (MSFE) module with an attention mechanism, which improves the model’s ability to detect targets at different scales. First, a DMD dataset was constructed using Lunar Reconnaissance Orbiter (LRO) data, with manual annotations of DMD regions and lunar image slicing to optimize computational efficiency. The YOLOv8 architecture, with the incorporated MSFE module, was then used to improve model accuracy in complex terrain. The experimental results showed that the improved DM-YOLO model achieved a precision (P) of 83.9%, a recall (R) of 83.2%, and a mean average precision (mAP@0.5) of 84.2%, representing increases of 15.2%, 14.4%, and 14.0%, respectively, over those obtained with the original YOLOv8 model. The predicted results were preliminarily verified using FeO abundance data and further confirmed by analysis of M3 spectral absorption features, showing strong consistency with known DMDs in terms of both chemical composition and mineralogical characteristics. Observations showed that DMDs were located primarily in the low- and mid-latitude regions of the Moon, with most deposits found in the lunar highlands. The findings suggest that the DM-YOLO model has significant potential for providing technical support for lunar exploration and resource development, particularly for identifying small-scale features that are difficult to annotate.
(This article belongs to the Section Remote Sensors)
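The image-slicing step mentioned in the abstract can be pictured with a short tiling routine like the one below. This is a minimal sketch, not the authors' code; the tile size and overlap values are assumptions.

```python
# Cut a large lunar mosaic into fixed-size tiles with overlap, so small DMDs
# near tile borders are not missed; edge tiles may be smaller than `tile`.
import numpy as np

def slice_image(image: np.ndarray, tile: int = 640, overlap: int = 64):
    """Yield (row, col, window) tiles covering the whole image."""
    step = tile - overlap
    h, w = image.shape[:2]
    for r in range(0, max(h - overlap, 1), step):
        for c in range(0, max(w - overlap, 1), step):
            yield r, c, image[r:r + tile, c:c + tile]

mosaic = np.zeros((2048, 4096), dtype=np.uint8)   # placeholder for LRO imagery
tiles = list(slice_image(mosaic))
print(len(tiles), "tiles")                        # 28 tiles for this geometry
```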

25 pages, 3611 KB  
Article
Automatic Estimation of Football Possession via Improved YOLOv8 Detection and DBSCAN-Based Team Classification
by Rong Guo, Yucheng Zeng, Rong Deng, Yawen Lei, Yonglin Che, Lin Yu, Jianpeng Zhang, Xiaobin Xu, Zhaoxiang Ma, Jiajin Zhang and Jianke Yang
Sensors 2026, 26(4), 1252; https://doi.org/10.3390/s26041252 - 14 Feb 2026
Abstract
Recent developments in computer vision have significantly enhanced the automation and objectivity of sports analytics. This paper proposes a novel deep learning-based framework for estimating football possession directly from broadcast video, eliminating the reliance on manual annotations or event-based data that are often labor-intensive, subjective, and temporally coarse. The framework incorporates two structurally improved object detection models: YOLOv8-P2S3A for football detection and YOLOv8-HWD3A for player detection. These models demonstrate superior accuracy compared to baseline detectors, achieving 79.4% and 71.1% validation average precision, respectively, while maintaining low computational latency. Team identification is accomplished through unsupervised DBSCAN clustering on jersey color features, enabling robust and label-free team assignment across diverse match scenarios. Object trajectories are maintained via the Norfair multi-object tracking algorithm, and a temporally aware refinement module ensures accurate estimation of ball possession durations. Extensive experiments were conducted on a dataset comprising 20 full-match video clips. The proposed system achieved a root mean square error (RMSE) of 4.87 in possession estimation, outperforming all evaluated baselines, including YOLOv10n (RMSE: 5.12) and YOLOv11 (RMSE: 5.17), with a substantial improvement over YOLOv6n (RMSE: 12.73). These results substantiate the effectiveness of the proposed framework in enhancing the precision, efficiency, and automation of football analytics, offering practical value for coaches, analysts, and sports scientists in professional settings.
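The label-free team assignment idea can be illustrated with DBSCAN on mean jersey colors, as in the sketch below; the color features are random placeholders, and the eps/min_samples values are assumptions, not the paper's settings.

```python
# Cluster per-player mean jersey colors with DBSCAN; no team labels needed.
import numpy as np
from sklearn.cluster import DBSCAN

rng = np.random.default_rng(0)
team_a = rng.normal([200, 30, 30], 10, size=(11, 3))   # reddish jerseys
team_b = rng.normal([30, 30, 200], 10, size=(11, 3))   # bluish jerseys
colors = np.vstack([team_a, team_b]) / 255.0           # normalized RGB features

labels = DBSCAN(eps=0.15, min_samples=3).fit_predict(colors)
print(labels)   # two clusters -> two teams; -1 would mark outliers (e.g., referee)
```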

37 pages, 2122 KB  
Article
US-ATHC: Unsupervised Multi-Class Glioma Segmentation via Adaptive Thresholding and Clustering
by Jihan Alameddine, Céline Thomarat, Xavier Le-Guillou, Rémy Guillevin, Christine Fernandez-Maloigne and Carole Guillevin
Biomedicines 2026, 14(2), 397; https://doi.org/10.3390/biomedicines14020397 - 9 Feb 2026
Abstract
Background/Objectives: Accurate segmentation of gliomas in 3D volumetric MRI is critical for diagnosis, treatment planning, and surgical navigation. However, the scarcity of expert annotations limits the applicability of supervised learning approaches, motivating the development of unsupervised methods. This study presents US-ATHC (Unsupervised Segmentation using Adaptive Thresholding and Hierarchical Clustering), a fully unsupervised two-step pipeline for both global tumor detection and multi-class subregion segmentation. Methods: In the first step, a global tumor mask is extracted by combining adaptive thresholding (Sauvola) with morphological processing on individual MRI slices. The resulting candidates are fused across axial, coronal, and sagittal views using a strict 3D consistency criterion. In the second step, the global mask is refined into a three-class segmentation (active tumor, edema, and necrosis) using optimized affinity propagation clustering. Results: The method was evaluated on the BraTS 2021 dataset, demonstrating accurate tumor and subregion segmentation that outperformed both classical clustering techniques and state-of-the-art deep learning models. External validation on the Gliobiopsy dataset from the University Hospital of Poitiers confirmed robustness and practical applicability in real-world clinical settings. Conclusions: US-ATHC establishes an unsupervised paradigm for glioma segmentation that balances accuracy with computational efficiency. Its annotation-independent nature makes it suitable for scenarios with scarce labeled data, supporting integration into clinical workflows and large-scale neuroimaging studies.
(This article belongs to the Special Issue Medical Imaging in Brain Tumor: Charting the Future)
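scikit-image ships the Sauvola adaptive threshold used in the pipeline's first step; the sketch below applies it to a synthetic slice. The synthetic data and the window_size/k parameters are illustrative assumptions, not the paper's values.

```python
# Extract tumor-candidate pixels from one 2D slice via Sauvola thresholding.
import numpy as np
from skimage.filters import threshold_sauvola

slice_2d = np.random.rand(128, 128)      # stand-in for one MRI slice
slice_2d[40:70, 50:90] += 1.0            # bright "lesion" region (toy data)

thresh = threshold_sauvola(slice_2d, window_size=25, k=0.2)  # local thresholds
mask = slice_2d > thresh                 # candidate tumor pixels
print(mask.sum(), "candidate pixels")
```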

26 pages, 44941 KB  
Article
Advanced Deep Learning Models for Classifying Dental Diseases from Panoramic Radiographs
by Deema M. Alnasser, Reema M. Alnasser, Wareef M. Alolayan, Shihanah S. Albadi, Haifa F. Alhasson, Amani A. Alkhamees and Shuaa S. Alharbi
Diagnostics 2026, 16(3), 503; https://doi.org/10.3390/diagnostics16030503 - 6 Feb 2026
Abstract
Background/Objectives: Dental diseases pose a major challenge for oral healthcare, and early diagnosis is essential to reduce the risk of complications. Panoramic radiographs provide a detailed perspective of dental structures that is suitable for automated diagnostic methods. This paper investigates the use of advanced deep learning (DL) models for the multiclass classification of diseases at the sub-diagnosis level using panoramic radiographs, addressing the inconsistencies and skewed classes in the dataset. Methods: To train and test the models, a rich dataset of 10,580 high-quality panoramic radiographs, initially annotated with 93 classes and subsequently consolidated into 35 classes, was used. We applied extensive preprocessing techniques such as class consolidation, correction of mislabeled entries, redundancy removal, and augmentation to reduce the class-imbalance ratio from 2560:1 to 61:1. Five modern convolutional neural network (CNN) architectures—InceptionV3, EfficientNetV2, DenseNet121, ResNet50, and VGG16—were assessed with respect to five metrics: accuracy, mean average precision (mAP), precision, recall, and F1-score. Results: InceptionV3 achieved the best performance with a 97.51% accuracy rate and a mAP of 96.61%, confirming its superior ability to diagnose a wide range of dental conditions. The EfficientNetV2 and DenseNet121 models achieved accuracies of 97.04% and 96.70%, respectively, indicating strong classification performance. ResNet50 and VGG16 also yielded competitive accuracy values comparable to these models. Conclusions: Overall, the results show that deep learning models are successful in dental disease classification, especially InceptionV3, the model with the highest accuracy. Further study of dataset expansion, ensemble learning strategies, and explainable artificial intelligence techniques will yield new insights and clinical applications. The findings provide a starting point for implementing automated diagnostic systems for dental diagnosis with greater efficiency, accuracy, and clinical utility in oral healthcare.
(This article belongs to the Special Issue Advances in Dental Diagnostics)
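The imbalance ratio the authors report reducing from 2560:1 to 61:1 is simply the largest class count divided by the smallest; the toy counts in the sketch below are invented placeholders.

```python
# Back-of-envelope class-imbalance ratio from a list of per-image labels.
from collections import Counter

labels = ["caries"] * 2560 + ["impacted"] * 400 + ["rare_lesion"] * 1  # toy counts
counts = Counter(labels)
ratio = max(counts.values()) / min(counts.values())
print(f"imbalance ratio = {ratio:.0f}:1")   # 2560:1 for these toy counts
```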

20 pages, 5171 KB  
Article
LGD-DeepLabV3+: An Enhanced Framework for Remote Sensing Semantic Segmentation via Multi-Level Feature Fusion and Global Modeling
by Xin Wang, Xu Liu, Adnan Mahmood, Yaxin Yang and Xipeng Li
Sensors 2026, 26(3), 1008; https://doi.org/10.3390/s26031008 - 3 Feb 2026
Abstract
Remote sensing semantic segmentation encounters several challenges, including scale variation, the coexistence of class similarity and intra-class diversity, difficulties in modeling long-range dependencies, and shadow occlusions. Slender structures and complex boundaries present particular segmentation difficulties, especially in high-resolution imagery acquired by satellite and aerial cameras, UAV-borne optical sensors, and other imaging payloads. These sensing systems deliver large-area coverage with fine ground sampling distance, which magnifies domain shifts between different sensors and acquisition conditions. This work builds upon DeepLabV3+ and proposes complementary improvements at three stages: input, context, and decoder fusion. First, to mitigate the interference of complex and heterogeneous data distributions on network optimization, a feature-mapping network is introduced to project raw images into a simpler distribution before they are fed into the segmentation backbone. This approach facilitates training and enhances feature separability. Second, although the Atrous Spatial Pyramid Pooling (ASPP) aggregates multi-scale context, it remains insufficient for modeling long-range dependencies. Therefore, a routing-style global modeling module is incorporated after ASPP to strengthen global relation modeling and ensure cross-region semantic consistency. Third, considering that the fusion between shallow details and deep semantics in the decoder is limited and prone to boundary blurring, a fusion module is designed to facilitate deep interaction and joint learning through cross-layer feature alignment and coupling. The proposed model improves the mean Intersection over Union (mIoU) by 8.83% on the LoveDA dataset and by 6.72% on the ISPRS Potsdam dataset compared to the baseline. Qualitative results further demonstrate clearer boundaries and more stable region annotations, while the proposed modules are plug-and-play and easy to integrate into camera-based remote sensing pipelines and other imaging-sensor systems, providing a practical accuracy–efficiency trade-off.
(This article belongs to the Section Smart Agriculture)
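For reference, the mIoU metric the paper reports improving by 8.83% on LoveDA can be computed as below; the prediction and label maps here are toy data.

```python
# Mean Intersection over Union over the classes present in either map.
import numpy as np

def mean_iou(pred: np.ndarray, gt: np.ndarray, num_classes: int) -> float:
    ious = []
    for c in range(num_classes):
        inter = np.logical_and(pred == c, gt == c).sum()
        union = np.logical_or(pred == c, gt == c).sum()
        if union > 0:                     # skip classes absent from both maps
            ious.append(inter / union)
    return float(np.mean(ious))

gt = np.random.randint(0, 3, (64, 64))    # toy 3-class label map
pred = gt.copy()
pred[:8] = 0                              # corrupt a few rows of the prediction
print(f"mIoU = {mean_iou(pred, gt, 3):.3f}")
```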

21 pages, 1315 KB  
Article
Ensemble Deep Learning Models for Multi-Class DNA Sequence Classification: A Comparative Study of CNN, BiLSTM, and GRU Architectures
by Elias Tabane, Ernest Mnkandla and Zenghui Wang
Appl. Sci. 2026, 16(3), 1545; https://doi.org/10.3390/app16031545 - 3 Feb 2026
Abstract
DNA sequence classification is a fundamental problem in bioinformatics, playing an indispensable role in gene annotation and disease prediction. While deep learning models such as CNNs, BiLSTM networks, and GRUs have each proven effective individually, each method excels at modeling a specific aspect of sequence data: local motifs, long-range dependencies, and efficient temporal modeling, respectively. Here, we present and evaluate an ensemble model that integrates CNN, BiLSTM, and GRU architectures via a majority-voting combination scheme so that their complementary strengths can be harnessed. We trained and evaluated each standalone model and the integrated model on a publicly available DNA dataset comprising 4380 sequences across five functional categories. The ensemble model achieved a classification accuracy of 90.6% with precision, recall, and F1 score all equal to 0.91, outperforming state-of-the-art techniques by large margins. Although previous studies have analyzed individual deep learning methods for DNA classification, none have attempted a systematic combination of CNN, BiLSTM, and GRU that exploits their complementary feature-extraction abilities. The present work combines these architectures through a majority-voting strategy and shows that the combination extracts local patterns and long-range dependencies better than any individual model. In particular, the proposed ensemble balanced the high recall of BiLSTM with the high precision of CNN, leading to more robust and reliable classification. Our results emphasize the promise of hybrid ensemble deep learning as a strong approach for complex genomic data analysis, opening the way toward more accurate and interpretable bioinformatics research.
(This article belongs to the Special Issue Advances in Deep Learning and Intelligent Computing)
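The majority-voting combination scheme reduces to a per-sample mode over the three models' class predictions, as in the sketch below; the prediction arrays are invented, and ties here fall to the lowest class index via np.bincount's argmax.

```python
# Majority vote over CNN, BiLSTM, and GRU class predictions (5 classes).
import numpy as np

cnn    = np.array([0, 2, 1, 4, 3])
bilstm = np.array([0, 2, 2, 4, 1])
gru    = np.array([1, 2, 1, 3, 1])

stacked = np.stack([cnn, bilstm, gru])               # shape (models, samples)
ensemble = np.array([np.bincount(col, minlength=5).argmax()
                     for col in stacked.T])          # vote per sample
print(ensemble)                                      # [0 2 1 4 1]
```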

24 pages, 2143 KB  
Article
Intelligent Detection and 3D Localization of Bolt Loosening in Steel Structures Using Improved YOLOv9 and Multi-View Fusion
by Fangyuan Cui, Xiaolong Chen and Lie Liang
Buildings 2026, 16(3), 619; https://doi.org/10.3390/buildings16030619 - 2 Feb 2026
Abstract
Structural health monitoring of steel buildings requires accurate detection and localization of bolt loosening, a critical yet challenging task due to complex joint geometries and varying environmental conditions. We propose an intelligent framework that integrates an improved YOLOv9 model with multi-view image fusion to address this problem. The method constructs a comprehensive dataset with multi-angle images under diverse lighting, occlusion, and loosening conditions, annotated with multi-task labels for precise training. The YOLOv9 backbone is enhanced with attention mechanisms to focus on key bolt features, while an angle-aware detection head regresses both bounding boxes and rotation angles, enabling loosening state determination through a threshold-based criterion. Furthermore, the framework unifies camera coordinate systems and employs epipolar geometry to fuse 2D detections from multiple views, reconstructing 3D bolt positions and orientations for precise localization. The proposed method achieves robust performance in detecting loosening angles and spatially localizing bolts, offering a practical solution for real-world structural inspections. Its significance lies in the integration of advanced deep learning with multi-view geometry, providing a scalable and automated approach to enhance safety and maintenance efficiency in steel structures.
(This article belongs to the Section Building Structures)
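A threshold-based loosening criterion of the kind described above can be sketched as a comparison between a bolt's regressed rotation angle and its reference angle; the 10-degree threshold and the angle values below are assumptions for illustration.

```python
# Flag a bolt as loosened when its rotation deviates beyond a threshold,
# handling the 360-degree wrap-around (e.g., 359 deg vs 1 deg is a 2 deg change).
def is_loosened(angle_now: float, angle_ref: float, thresh_deg: float = 10.0) -> bool:
    delta = abs(angle_now - angle_ref) % 360.0
    delta = min(delta, 360.0 - delta)
    return delta > thresh_deg

print(is_loosened(47.0, 30.0))   # True  (17 deg of rotation)
print(is_loosened(358.0, 3.0))   # False (5 deg after wrap-around)
```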

31 pages, 4397 KB  
Article
Transformer-Based Foundation Learning for Robust and Data-Efficient Skin Disease Imaging
by Inzamam Mashood Nasir, Hend Alshaya, Sara Tehsin and Wided Bouchelligua
Diagnostics 2026, 16(3), 440; https://doi.org/10.3390/diagnostics16030440 - 1 Feb 2026
Abstract
Background/Objectives: Accurate and reliable automated dermoscopic lesion classification remains challenging due to pronounced dataset bias, limited expert-annotated data, and the poor cross-dataset generalization of conventional supervised deep learning models. In clinical dermatology, these limitations restrict the deployment of data-driven diagnostic systems across diverse acquisition settings and patient populations. Methods: Motivated by these challenges, this study proposes a transformer-based, dermatology-specific foundation model that learns transferable visual representations from large collections of unlabeled dermoscopic images via self-supervised pretraining. It integrates large-scale dermatology-oriented self-supervised learning with a hierarchical vision transformer backbone, enabling effective capture of both fine-grained lesion textures and global morphological patterns. The evaluation is conducted across three publicly available dermoscopic datasets—ISIC 2018, HAM10000, and PH2—under in-dataset, cross-dataset, limited-label, ablation, and computational-efficiency settings. Results: The proposed approach achieves in-dataset classification accuracies of 94.87%, 97.32%, and 98.17% on ISIC 2018, HAM10000, and PH2, respectively, outperforming strong transformer and hybrid baselines. Cross-dataset transfer experiments show consistent performance gains of 3.5–5.8% over supervised counterparts, indicating improved robustness to domain shift. Furthermore, when fine-tuned with only 10% of the labeled training data, the model achieves performance comparable to fully supervised baselines, highlighting strong data efficiency. Conclusions: These results demonstrate that dermatology-specific foundation learning offers a principled and practical solution for robust dermoscopic lesion classification under realistic clinical constraints.
(This article belongs to the Special Issue Advanced Imaging in the Diagnosis and Management of Skin Diseases)
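The limited-label protocol can be pictured as fine-tuning on a stratified 10% subset of the training labels; the sketch below shows only the subsetting step, with a placeholder label array (the class count loosely mirrors HAM10000's seven classes).

```python
# Draw a stratified 10% labeled subset for data-efficiency experiments.
import numpy as np
from sklearn.model_selection import train_test_split

y = np.random.randint(0, 7, size=10015)          # toy labels, 7 classes
idx = np.arange(len(y))
subset_idx, _ = train_test_split(idx, train_size=0.10, stratify=y, random_state=0)
print(len(subset_idx), "labeled images used for fine-tuning")
```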

39 pages, 5498 KB  
Article
A Review of Key Technologies and Recent Advances in Intelligent Fruit-Picking Robots
by Tao Lin, Fuchun Sun, Xiaoxiao Li, Xi Guo, Jing Ying, Haorong Wu and Hanshen Li
Horticulturae 2026, 12(2), 158; https://doi.org/10.3390/horticulturae12020158 - 30 Jan 2026
Abstract
Intelligent fruit-picking robots have emerged as a promising solution to labor shortages and the increasing costs of manual harvesting. This review provides a systematic and critical overview of recent advances in three core domains: (i) vision-based fruit and peduncle detection, (ii) motion planning and obstacle-aware navigation, and (iii) robotic manipulation technologies for diverse fruit types. We summarize the evolution of deep learning-based perception models, highlighting improvements in occlusion robustness, 3D localization accuracy, and real-time performance. Various planning frameworks—from classical search algorithms to optimization-driven and swarm-intelligent methods—are compared in terms of efficiency and adaptability in unstructured orchard environments. Developments in multi-DOF manipulators, soft and adaptive grippers, and end-effector control strategies are also examined. Despite these advances, critical challenges remain, including heavy dependence on large annotated datasets; sensitivity to illumination and foliage occlusion; limited generalization across fruit varieties; and the difficulty of integrating perception, planning, and manipulation into reliable field-ready systems. Finally, this review outlines emerging research trends such as lightweight multimodal networks, deformable-object manipulation, embodied intelligence, and system-level optimization, offering a forward-looking perspective for autonomous harvesting technologies.

18 pages, 4545 KB  
Article
3D Medical Image Segmentation with 3D Modelling
by Mária Ždímalová, Kristína Boratková, Viliam Sitár, Ľudovít Sebö, Viera Lehotská and Michal Trnka
Bioengineering 2026, 13(2), 160; https://doi.org/10.3390/bioengineering13020160 - 29 Jan 2026
Abstract
Background/Objectives: The segmentation of three-dimensional radiological images constitutes a fundamental task in medical image processing for isolating tumors from complex datasets in computed tomography or magnetic resonance imaging. This enables precise visualization, volumetry, and treatment monitoring, which are critical for oncology diagnostics and planning. Volumetric analysis surpasses standard criteria by detecting subtle tumor changes, thereby aiding adaptive therapies. The objective of this study was to develop an enhanced, interactive Graphcut algorithm for 3D DICOM segmentation, specifically designed to improve boundary accuracy and 3D modeling of breast and brain tumors in datasets with heterogeneous tissue intensities. Methods: The standard Graphcut algorithm was augmented with a clustering mechanism (utilizing k = 2–5 clusters) to refine boundary detection in tissues with varying intensities. DICOM datasets were processed into 3D volumes using pixel spacing and slice thickness metadata. User-defined seeds were utilized for tumor and background initialization, constrained by bounding boxes. The method was implemented in Python 3.13 using the PyMaxflow library for graph optimization and pydicom for data transformation. Results: The proposed segmentation method outperformed standard thresholding and region growing techniques, demonstrating reduced noise sensitivity and improved boundary definition. An average Dice Similarity Coefficient (DSC) of 0.92 ± 0.07 was achieved for brain tumors and 0.90 ± 0.05 for breast tumors. These results were found to be comparable to state-of-the-art deep learning benchmarks (typically ranging from 0.84 to 0.95), achieved without the need for extensive pre-training. Boundary edge errors were reduced by a mean of 7.5% through the integration of clustering. Therapeutic changes were quantified accurately (e.g., a reduction from 22,106 mm³ to 14,270 mm³ post-treatment) with an average processing time of 12–15 s per stack. Conclusions: An efficient, precise 3D tumor segmentation tool suitable for diagnostics and planning is presented. This approach is demonstrated to be a robust, data-efficient alternative to deep learning, particularly advantageous in clinical settings where the large annotated datasets required for training neural networks are unavailable.
(This article belongs to the Section Biosignal Processing)
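For reference, the Dice Similarity Coefficient used to score the segmentations (e.g., 0.92 for brain tumors) is computed as below; the two masks are toy volumes.

```python
# DSC = 2|A ∩ B| / (|A| + |B|) over binary segmentation masks.
import numpy as np

def dice(a: np.ndarray, b: np.ndarray) -> float:
    a, b = a.astype(bool), b.astype(bool)
    denom = a.sum() + b.sum()
    return 2.0 * np.logical_and(a, b).sum() / denom if denom else 1.0

gt = np.zeros((32, 32, 32), dtype=bool)
gt[8:24, 8:24, 8:24] = True                  # toy ground-truth tumor
pred = np.roll(gt, 2, axis=0)                # slightly shifted prediction
print(f"DSC = {dice(pred, gt):.3f}")         # 0.875 for this shift
```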

19 pages, 3006 KB  
Article
From Quality Grading to Defect Recognition: A Dual-Pipeline Deep Learning Approach for Automated Mango Assessment
by Shinfeng Lin and Hongting Chiu
Electronics 2026, 15(3), 549; https://doi.org/10.3390/electronics15030549 - 27 Jan 2026
Abstract
Mango is a high-value agricultural commodity, and accurate and efficient appearance quality grading and defect inspection are critical for export-oriented markets. This study proposes a dual-pipeline deep learning framework for automated mango assessment, in which surface defect classification and quality grading are jointly implemented within a unified inspection system. For defect assessment, the task is formulated as a multi-label classification problem involving five surface defect categories, eliminating the need for costly bounding box annotations required by conventional object detection models. To address the severe class imbalance commonly encountered in agricultural datasets, a copy–paste-based image synthesis strategy is employed to augment scarce defect samples. For quality grading, mangoes are categorized into three quality levels. Unlike conventional CNN-based approaches relying solely on spatial-domain information, the proposed framework integrates decision-level fusion of spatial-domain and frequency-domain representations to enhance grading stability. In addition, image preprocessing is investigated, showing that adaptive contrast enhancement effectively emphasizes surface textures critical for quality discrimination. Experimental evaluations demonstrate that the proposed framework achieves superior performance in both defect classification and quality grading compared with existing detection-based approaches. The proposed classification-oriented system provides an efficient and practical integrated solution for automated mango assessment. Full article
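The copy–paste synthesis strategy for scarce defect classes can be pictured as cropping a defect patch from one image and pasting it at a random position on a clean fruit image; the arrays and patch coordinates below are invented placeholders, not the paper's pipeline.

```python
# Toy copy-paste augmentation: transplant a defect patch onto a clean image.
import numpy as np

rng = np.random.default_rng(1)
donor = rng.integers(0, 255, (256, 256, 3), dtype=np.uint8)   # image with a defect
clean = rng.integers(0, 255, (256, 256, 3), dtype=np.uint8)   # defect-free mango

patch = donor[100:140, 100:140]               # assumed defect bounding box
r, c = rng.integers(0, 256 - 40, size=2)      # random paste location
augmented = clean.copy()
augmented[r:r + 40, c:c + 40] = patch         # the new multi-label sample
print("pasted defect at", (int(r), int(c)))
```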

40 pages, 2048 KB  
Review
A Comparative Study of Emotion Recognition Systems: From Classical Approaches to Multimodal Large Language Models
by Mirela-Magdalena Grosu (Marinescu), Octaviana Datcu, Ruxandra Tapu and Bogdan Mocanu
Appl. Sci. 2026, 16(3), 1289; https://doi.org/10.3390/app16031289 - 27 Jan 2026
Abstract
Emotion recognition in video (ERV) aims to infer human affect from visual, audio, and contextual signals and is increasingly important for interactive and intelligent systems. Over the past decade, ERV has evolved from handcrafted features and task-specific deep learning models toward transformer-based vision–language models and multimodal large language models (MLLMs). This review surveys this evolution, with an emphasis on engineering considerations relevant to real-world deployment. We analyze multimodal fusion strategies, dataset characteristics, and evaluation protocols, highlighting limitations in robustness, bias, and annotation quality under unconstrained conditions. Emerging MLLM-based approaches are examined in terms of performance, reasoning capability, computational cost, and interaction potential. By comparing task-specific models with foundation model approaches, we clarify their respective strengths for resource-constrained versus context-aware applications. Finally, we outline practical research directions toward building robust, efficient, and deployable ERV systems for applied scenarios such as assistive technologies and human–AI interaction.
(This article belongs to the Section Computing and Artificial Intelligence)

23 pages, 2628 KB  
Article
Scattering-Based Self-Supervised Learning for Label-Efficient Cardiac Image Segmentation
by Serdar Alasu and Muhammed Fatih Talu
Electronics 2026, 15(3), 506; https://doi.org/10.3390/electronics15030506 - 24 Jan 2026
Abstract
Deep learning models based on supervised learning rely heavily on large annotated datasets; in medical image segmentation in particular, the requirement for pixel-level annotations makes the labeling process labor-intensive, time-consuming, and expensive. To overcome these limitations, self-supervised learning (SSL) has emerged as a promising alternative that learns generalizable representations from unlabeled data; however, existing SSL frameworks often employ highly parameterized encoders that are computationally expensive and may lack robustness in label-scarce settings. In this work, we propose a scattering-based SSL framework that integrates Wavelet Scattering Networks (WSNs) and Parametric Scattering Networks (PSNs) into a Bootstrap Your Own Latent (BYOL) pretraining pipeline. By replacing the initial stages of the BYOL encoder with fixed or learnable scattering-based front-ends, the proposed method reduces the number of learnable parameters while embedding translation-invariant and small deformation-stable representations into the SSL pipeline. The pretrained encoders are transferred to a U-Net and fine-tuned for cardiac image segmentation on two datasets with different imaging modalities, namely, cardiac cine MRI (ACDC) and cardiac CT (CHD), under varying amounts of labeled data. Experimental results show that scattering-based SSL pretraining consistently improves segmentation performance over random initialization and ImageNet pretraining in low-label regimes, with particularly pronounced gains when only a few labeled patients are available. Notably, the PSN variant achieves improvements of 4.66% and 2.11% in average Dice score over standard BYOL with only 5 and 10 labeled patients, respectively, on the ACDC dataset. These results demonstrate that integrating mathematically grounded scattering representations into SSL pipelines provides a robust and data-efficient initialization strategy for cardiac image segmentation, particularly under limited annotation and domain shift.
(This article belongs to the Section Artificial Intelligence)
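A fixed wavelet-scattering front-end of the kind swapped into the BYOL encoder can be sketched with the kymatio library, assuming its torch frontend; the input shape, J, and the flattening step are illustrative assumptions, not the paper's configuration.

```python
# Scattering coefficients replace the first learnable encoder stages: the
# Scattering2D transform has no trainable parameters and is deformation-stable.
import torch
from kymatio.torch import Scattering2D

scatter = Scattering2D(J=2, shape=(32, 32))      # fixed front-end, no weights
x = torch.randn(4, 1, 32, 32)                    # toy batch of grayscale crops
feats = scatter(x)                               # (4, 1, 81, 8, 8) with default L=8
print(feats.shape)
proj_in = feats.flatten(1)                       # flatten to feed a projection head
```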

16 pages, 1428 KB  
Article
StrDiSeg: Adapter-Enhanced DINOv3 for Automated Ischemic Stroke Lesion Segmentation
by Qiong Chen, Donghao Zhang, Yimin Chen, Siyuan Zhang, Yue Sun, Fabiano Reis, Li M. Li, Li Yuan, Huijuan Jin and Wu Qiu
Bioengineering 2026, 13(2), 133; https://doi.org/10.3390/bioengineering13020133 - 23 Jan 2026
Abstract
Deep vision foundation models such as DINOv3 offer strong visual representation capacity, but their direct deployment in medical image segmentation remains difficult due to the limited availability of annotated clinical data and the computational cost of full fine-tuning. This study proposes an adaptation framework called StrDiSeg that integrates lightweight bottleneck adapters between selected transformer layers of DINOv3, enabling task-specific learning while preserving pretrained knowledge. An attention-enhanced U-Net decoder with multi-scale feature fusion further refines the representations. Experiments were performed on two publicly available ischemic stroke lesion segmentation datasets—AISD (non-contrast CT) and ISLES22 (DWI). The proposed method achieved Dice scores of 0.516 on AISD and 0.824 on ISLES22, outperforming baseline models and demonstrating strong robustness across different clinical imaging modalities. These results indicate that adapter-based fine-tuning provides a practical and computationally efficient strategy for leveraging large pretrained vision models in medical image segmentation.
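A bottleneck adapter of the kind inserted between frozen transformer layers typically down-projects, applies a nonlinearity, up-projects, and adds a residual connection; the PyTorch sketch below is a generic illustration, with hidden sizes and the reduction factor as assumptions rather than the paper's values.

```python
# Generic bottleneck adapter: only these small layers are trained, while the
# surrounding pretrained transformer blocks stay frozen.
import torch
import torch.nn as nn

class BottleneckAdapter(nn.Module):
    def __init__(self, dim: int, reduction: int = 8):
        super().__init__()
        self.down = nn.Linear(dim, dim // reduction)
        self.act = nn.GELU()
        self.up = nn.Linear(dim // reduction, dim)
        nn.init.zeros_(self.up.weight)            # start as an identity mapping
        nn.init.zeros_(self.up.bias)

    def forward(self, x: torch.Tensor) -> torch.Tensor:
        return x + self.up(self.act(self.down(x)))   # residual keeps pretrained path

tokens = torch.randn(2, 197, 768)                 # (batch, tokens, embed dim)
print(BottleneckAdapter(768)(tokens).shape)       # torch.Size([2, 197, 768])
```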
