Search Results (1,924)

Search Parameters:
Keywords = label image accuracy

23 pages, 10648 KiB  
Article
Meta-Learning-Integrated Neural Architecture Search for Few-Shot Hyperspectral Image Classification
by Aili Wang, Kang Zhang, Haibin Wu, Haisong Chen and Minhui Wang
Electronics 2025, 14(15), 2952; https://doi.org/10.3390/electronics14152952 - 24 Jul 2025
Abstract
To address the scarcity of labeled samples in practical classification scenarios, and the overfitting and weak generalization that Few-Shot Learning (FSL) suffers in hyperspectral image classification (HSIC), this paper designs and implements a neural architecture search (NAS) method for few-shot HSI classification that incorporates meta-learning. First, a multi-source domain learning framework is constructed that integrates heterogeneous natural images and homogeneous remote sensing images to broaden the information available for few-shot learning, allowing the final network to improve its generalization under limited labeled samples by learning similarities between data sources. Second, constructing precise and robust search spaces and deploying different units at different locations improves the classification accuracy and transfer robustness of the resulting network. The method exploits the spatial texture and rich category information of multi-source data and transfers the learned meta-knowledge to the optimal architecture for HSIC through this search-space design, enabling classification with limited samples. Experiments show overall accuracies (OA) of 98.57%, 78.39%, and 98.74% on the Pavia Center, Indian Pines, and WHU-Hi-LongKou datasets, respectively, demonstrating that the learned meta-knowledge transfers effectively to the optimal architecture and supports few-shot classification.

17 pages, 2307 KiB  
Article
DeepBiteNet: A Lightweight Ensemble Framework for Multiclass Bug Bite Classification Using Image-Based Deep Learning
by Doston Khasanov, Halimjon Khujamatov, Muksimova Shakhnoza, Mirjamol Abdullaev, Temur Toshtemirov, Shahzoda Anarova, Cheolwon Lee and Heung-Seok Jeon
Diagnostics 2025, 15(15), 1841; https://doi.org/10.3390/diagnostics15151841 - 22 Jul 2025
Viewed by 27
Abstract
Background/Objectives: Accurate identification of insect bites from skin images is challenging because of the fine gradations among bite types, variability in human skin response, and inconsistencies in image quality. Methods: We introduce DeepBiteNet, an ensemble-based deep learning model designed for robust multiclass classification of insect bites from RGB images. The model aggregates three semantically diverse convolutional neural networks—DenseNet121, EfficientNet-B0, and MobileNetV3-Small—through a stacked meta-classifier that fuses their predictions into a single, discriminatively strong output, balancing heterogeneous feature representation against individual model biases. DeepBiteNet was trained and evaluated on a hand-collected set of 1932 labeled images covering eight classes, including common bites such as mosquito, flea, and tick bites as well as unaffected skin. A domain-specific augmentation pipeline introduced realistic variability in lighting, occlusion, and skin tone, improving generalizability. Results: DeepBiteNet achieved a training accuracy of 89.7%, a validation accuracy of 85.1%, and a test accuracy of 84.6%, and surpassed fifteen benchmark CNN architectures on precision (0.880), recall (0.870), and F1-score (0.875). Optimized for mobile deployment with quantization and TensorFlow Lite, it enables rapid on-device inference without reliance on cloud-based processing. Conclusions: This work shows how carefully designed ensemble learning, combined with realistic data augmentation, can improve the reliability and usability of automatic insect bite diagnosis. DeepBiteNet forms a promising foundation for integration with mobile health (mHealth) solutions and may support early diagnosis and triage in dermatologically underserved regions.
(This article belongs to the Special Issue Artificial Intelligence in Biomedical Diagnostics and Analysis 2024)
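
As a rough illustration of the stacking idea described in this abstract, the sketch below fuses the class-probability outputs of three backbone CNNs with a logistic-regression meta-classifier. It is a minimal, self-contained example: the backbone outputs and labels are synthetic placeholders, and scikit-learn's LogisticRegression stands in for whatever meta-classifier the authors actually used.

```python
# Minimal stacking sketch: combine softmax outputs of three CNN backbones
# with a logistic-regression meta-classifier (scikit-learn).
# The per-model probabilities and labels below are placeholders standing in
# for the real backbone outputs.
import numpy as np
from sklearn.linear_model import LogisticRegression
from sklearn.model_selection import train_test_split

n_samples, n_classes = 1932, 8                 # sizes quoted in the abstract
rng = np.random.default_rng(0)
probs_densenet = rng.dirichlet(np.ones(n_classes), n_samples)
probs_efficientnet = rng.dirichlet(np.ones(n_classes), n_samples)
probs_mobilenet = rng.dirichlet(np.ones(n_classes), n_samples)
y = rng.integers(0, n_classes, n_samples)      # placeholder labels

# Stack the per-model class probabilities into one meta-feature vector.
meta_features = np.hstack([probs_densenet, probs_efficientnet, probs_mobilenet])
X_tr, X_te, y_tr, y_te = train_test_split(meta_features, y, test_size=0.2, random_state=0)

meta_clf = LogisticRegression(max_iter=1000)   # the stacked meta-classifier
meta_clf.fit(X_tr, y_tr)
print("meta-classifier accuracy:", meta_clf.score(X_te, y_te))
```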

21 pages, 16254 KiB  
Article
Prediction of Winter Wheat Yield and Interpretable Accuracy Under Different Water and Nitrogen Treatments Based on CNNResNet-50
by Donglin Wang, Yuhan Cheng, Longfei Shi, Huiqing Yin, Guangguang Yang, Shaobo Liu, Qinge Dong and Jiankun Ge
Agronomy 2025, 15(7), 1755; https://doi.org/10.3390/agronomy15071755 - 21 Jul 2025
Viewed by 227
Abstract
Winter wheat yield prediction is critical for optimizing field management plans and guiding agricultural production. To address the limitations of conventional manual yield estimation, including low efficiency and poor interpretability, this study proposes an intelligent yield estimation method based on a convolutional neural network (CNN). A two-factor (fertilization × irrigation) controlled field experiment was designed to validate the applicability and effectiveness of the method. The experiment comprised two irrigation treatments, sufficient irrigation (C) at 750 m3 ha−1 and deficit irrigation (M) at 450 m3 ha−1, along with five fertilization treatments (at a rate of 180 kg N ha−1): (1) organic fertilizer alone, (2) a 7:3 organic–inorganic fertilizer blend, (3) a 3:7 organic–inorganic fertilizer blend, (4) inorganic fertilizer alone, and (5) a no-fertilizer control. A DJI M300 RTK unmanned aerial vehicle (UAV) equipped with a multispectral sensor acquired high-resolution imagery of winter wheat across the critical phenological stages from heading to maturity. The multispectral imagery was annotated with the Labelme tool to build a dataset of over 2000 labeled images, which was used to train an enhanced CNN model based on the ResNet-50 architecture that automatically generates panicle density maps and counts panicles, thereby predicting yield. Field results showed significant yield variation among fertilization treatments under sufficient irrigation, with the 3:7 organic–inorganic blend achieving the highest actual yield (9363.38 ± 468.17 kg ha−1), significantly outperforming the other treatments (p < 0.05) and confirming the synergistic effects of optimized nitrogen and water management. The enhanced CNN model achieved an average accuracy of 89.0–92.1%, a 3.0% improvement over YOLOv8. Model accuracy correlated significantly with yield level (p < 0.05), suggesting that panicle morphology is more distinct in high-yield plots and easier for the model to identify. Yield predictions agreed closely with measured values, with mean relative errors below 10%; performance was particularly strong for organic fertilizer with full irrigation (5.5% error) and the 7:3 organic–inorganic blend with sufficient irrigation (8.0% error), indicating that the CNN is well suited to these management regimes. These findings provide a robust technical foundation for precision farming in winter wheat production, and future work will integrate the approach into smart agricultural management systems for real-time, data-driven decision making at the farm scale.
(This article belongs to the Section Precision and Digital Agriculture)

15 pages, 677 KiB  
Article
Zero-Shot Learning for Sustainable Municipal Waste Classification
by Dishant Mewada, Eoin Martino Grua, Ciaran Eising, Patrick Denny, Pepijn Van de Ven and Anthony Scanlan
Recycling 2025, 10(4), 144; https://doi.org/10.3390/recycling10040144 - 21 Jul 2025
Viewed by 143
Abstract
Automated waste classification is an essential step toward efficient recycling and waste management. Traditional deep learning models, such as convolutional neural networks, rely on extensive labeled datasets to achieve high accuracy, but the annotation process is labor-intensive and time-consuming, limiting scalability in real-world applications. Zero-shot learning is a machine learning paradigm that enables a model to recognize and classify objects it has never seen during training by leveraging semantic relationships and external knowledge sources. In this study, we investigate zero-shot learning for waste classification using two vision-language models, OWL-ViT and OpenCLIP, which classify waste from textual prompts without direct exposure to labeled examples. We apply this approach to the TrashNet dataset, which consists of images of municipal solid waste organized into six categories: cardboard, glass, metal, paper, plastic, and trash. Our experiments yield an average classification accuracy of 76.30% with the OpenCLIP ViT-L/14-336 model, demonstrating the feasibility of zero-shot waste classification while highlighting challenges in prompt sensitivity and class imbalance. Despite lower accuracy than CNN- and ViT-based classifiers, zero-shot learning offers scalability and adaptability by enabling the classification of novel waste categories without retraining. This study underscores the potential of zero-shot learning in automated recycling systems, paving the way for more efficient, scalable, and annotation-free waste classification methodologies.
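
A minimal sketch of prompt-based zero-shot classification with OpenCLIP, in the spirit of the approach above. The prompt wording, image path, and pretrained-weights tag are illustrative assumptions, not the authors' exact configuration.

```python
# Zero-shot waste classification sketch with OpenCLIP (pip install open_clip_torch).
# Prompt wording and image path are illustrative, not the paper's exact setup.
import torch
import open_clip
from PIL import Image

model, _, preprocess = open_clip.create_model_and_transforms(
    "ViT-L-14-336", pretrained="openai")
tokenizer = open_clip.get_tokenizer("ViT-L-14-336")

classes = ["cardboard", "glass", "metal", "paper", "plastic", "trash"]
prompts = tokenizer([f"a photo of {c} waste" for c in classes])
image = preprocess(Image.open("waste_item.jpg")).unsqueeze(0)  # hypothetical file

with torch.no_grad():
    image_feat = model.encode_image(image)
    text_feat = model.encode_text(prompts)
    image_feat /= image_feat.norm(dim=-1, keepdim=True)
    text_feat /= text_feat.norm(dim=-1, keepdim=True)
    probs = (100.0 * image_feat @ text_feat.T).softmax(dim=-1)

print(dict(zip(classes, probs.squeeze(0).tolist())))
```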

20 pages, 688 KiB  
Article
Multi-Modal AI for Multi-Label Retinal Disease Prediction Using OCT and Fundus Images: A Hybrid Approach
by Amina Zedadra, Mahmoud Yassine Salah-Salah, Ouarda Zedadra and Antonio Guerrieri
Sensors 2025, 25(14), 4492; https://doi.org/10.3390/s25144492 - 19 Jul 2025
Viewed by 268
Abstract
Ocular diseases can significantly affect vision and overall quality of life, and their diagnosis is often time-consuming and dependent on expert interpretation. While previous computer-aided diagnostic systems have focused primarily on medical imaging, this paper proposes VisionTrack, a multi-modal AI system for predicting multiple retinal diseases, including Diabetic Retinopathy (DR), Age-related Macular Degeneration (AMD), Diabetic Macular Edema (DME), drusen, Central Serous Retinopathy (CSR), and Macular Hole (MH), as well as normal cases. The framework integrates a Convolutional Neural Network (CNN) for image-based feature extraction, a Graph Neural Network (GNN) to model relationships among clinical risk factors, and a Large Language Model (LLM) to process patient medical reports. By leveraging diverse data sources, VisionTrack improves prediction accuracy and offers a more comprehensive assessment of retinal health. Experiments were conducted on two publicly available datasets, RetinalOCT and RFMID, which provide different retinal imaging modalities: OCT and fundus images, respectively. The system demonstrated strong multi-label prediction performance: on RetinalOCT it achieved an accuracy of 0.980, F1-score of 0.979, recall of 0.978, and precision of 0.979, and on RFMID it reached an accuracy of 0.989, F1-score of 0.881, recall of 0.866, and precision of 0.897. These results confirm the robustness, reliability, and generalization capability of the proposed approach across imaging modalities, highlighting its potential for early detection, risk assessment, and personalized ophthalmic care.
(This article belongs to the Section Sensing and Imaging)

26 pages, 5414 KiB  
Article
Profile-Based Building Detection Using Convolutional Neural Network and High-Resolution Digital Surface Models
by Behaeen Farajelahi and Hossein Arefi
Remote Sens. 2025, 17(14), 2496; https://doi.org/10.3390/rs17142496 - 17 Jul 2025
Viewed by 315
Abstract
This research presents a novel method for detecting building roof types with deep learning models applied to height profiles derived from high-resolution digital surface models. While deep learning has proven effective in digit, handwriting, and time series classification, this study focuses on the emerging task of height-profile-based roof type classification. We propose an approach to automatically generate, classify, and detect building roof types using height profiles derived from normalized digital surface models, and present three methods for detecting seven roof types from the two height profiles of a building's cross-sections. The first two methods detect the roof type from two-dimensional (2D) representations of the height profiles: two binary images or a two-band spectral image. The third, vector-based method detects the roof type from the two one-dimensional (1D) height profiles represented as 1D vectors. We trained various one- and two-dimensional convolutional neural networks on these 1D and 2D height profiles. The DenseNet201 network detected the roof type directly from two height profiles stored as a two-band spectral image with an average accuracy of 97%, even in the presence of consecutive chimneys, dormers, and noise. The strengths of this approach include a large, detailed, and storage-efficient labeled height-profile dataset, a robust classification method using both 1D and 2D height profiles, and an automated workflow that enhances building roof type detection.
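
To illustrate the vector-based variant, the sketch below runs a small two-channel 1D CNN over a pair of cross-section height profiles. Profile length, layer widths, and the synthetic input are illustrative assumptions rather than the paper's actual architecture.

```python
# Sketch of the vector-based idea: a two-channel 1D CNN that classifies a
# building's roof type from two cross-section height profiles (PyTorch).
# Profile length and layer widths are illustrative assumptions.
import torch
import torch.nn as nn

class ProfileCNN(nn.Module):
    def __init__(self, n_classes: int = 7, profile_len: int = 128):
        super().__init__()
        self.features = nn.Sequential(
            nn.Conv1d(2, 32, kernel_size=5, padding=2), nn.ReLU(), nn.MaxPool1d(2),
            nn.Conv1d(32, 64, kernel_size=5, padding=2), nn.ReLU(), nn.MaxPool1d(2),
            nn.AdaptiveAvgPool1d(1),
        )
        self.classifier = nn.Linear(64, n_classes)

    def forward(self, x):              # x: (batch, 2, profile_len)
        return self.classifier(self.features(x).flatten(1))

# Two height profiles per building, stacked as channels; random data for the demo.
profiles = torch.randn(4, 2, 128)
logits = ProfileCNN()(profiles)
print(logits.shape)                    # torch.Size([4, 7])
```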

30 pages, 12494 KiB  
Article
Satellite-Based Approach for Crop Type Mapping and Assessment of Irrigation Performance in the Nile Delta
by Samar Saleh, Saher Ayyad and Lars Ribbe
Earth 2025, 6(3), 80; https://doi.org/10.3390/earth6030080 - 16 Jul 2025
Viewed by 279
Abstract
Water scarcity, exacerbated by climate change, population growth, and competing sectoral demands, poses a major threat to agricultural sustainability, particularly in irrigated regions such as the Nile Delta in Egypt. Addressing this challenge requires approaches that can evaluate irrigation performance despite limited ground data; traditional assessment methods are often costly, labor-intensive, and reliant on field data, limiting their scalability in data-scarce regions. This paper presents a comprehensive, scalable framework that uses publicly accessible satellite data to map crop types and then assess irrigation performance without ground truthing. The framework has two parts. First, crop mapping was conducted seasonally between 2015 and 2020 for the four primary crops in the Nile Delta (rice, maize, wheat, and clover); the WaPOR v2 Land Cover Classification layer was used in place of ground-truth data to label Landsat-8 images for training a random forest algorithm. The resulting 30 m crop maps had moderate to high accuracy, with overall accuracy ranging from 0.77 to 0.80 in summer and from 0.87 to 0.95 in winter, and the estimated crop areas aligned well with national agricultural statistics. Second, based on the mapped crops, three irrigation performance indicators—adequacy, reliability, and equity—were calculated and compared with established standards. The results show good equity, with values consistently below 10%, and a relatively reliable water supply (reliability indicator of 0.02–0.08). Average summer adequacy ranged from 0.4 to 0.63, indicating insufficient supply, whereas winter values of 1.3 to 1.7 reflected a surplus. All indicators improved toward the north of the delta, while areas in the delta's new lands consistently displayed unfavorable conditions. The approach identifies regions where agricultural performance falls short of its potential, offering insight into where and how irrigation systems can be strategically improved to enhance overall performance sustainably.
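
A minimal sketch of the per-pixel classification step described above: Landsat-8 band values labeled from a land-cover product are fed to a random forest. Array shapes, band count, and class codes are placeholders, and synthetic data stand in for the sampled imagery.

```python
# Per-pixel crop-type classification sketch with a random forest (scikit-learn).
# Assumes Landsat-8 band reflectances and land-cover-derived labels have already
# been sampled into arrays; shapes and class codes are illustrative.
import numpy as np
from sklearn.ensemble import RandomForestClassifier
from sklearn.metrics import accuracy_score
from sklearn.model_selection import train_test_split

rng = np.random.default_rng(42)
n_pixels, n_bands = 10000, 7                   # e.g. Landsat-8 reflective bands
X = rng.random((n_pixels, n_bands))            # placeholder reflectances
y = rng.integers(0, 4, n_pixels)               # 0=rice, 1=maize, 2=wheat, 3=clover

X_tr, X_te, y_tr, y_te = train_test_split(X, y, test_size=0.3, random_state=42)
rf = RandomForestClassifier(n_estimators=200, random_state=42)
rf.fit(X_tr, y_tr)
print("overall accuracy:", accuracy_score(y_te, rf.predict(X_te)))
```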

29 pages, 8563 KiB  
Article
A Bridge Crack Segmentation Algorithm Based on Fuzzy C-Means Clustering and Feature Fusion
by Yadong Yao, Yurui Zhang, Zai Liu and Heming Yuan
Sensors 2025, 25(14), 4399; https://doi.org/10.3390/s25144399 - 14 Jul 2025
Viewed by 265
Abstract
In response to the limitations of traditional image processing algorithms in bridge crack detection, such as high noise sensitivity and threshold dependency, and the extensive labeled-data requirements of deep learning methods, this study proposes a crack segmentation algorithm based on fuzzy C-means (FCM) clustering and multi-feature fusion. A three-dimensional feature space is constructed from B-channel pixels and clustered with c = 3, a choice justified by the distinct distribution patterns of the three regions in the image, enabling effective preliminary segmentation. To improve accuracy, connected-domain labeling combined with a circularity threshold is introduced to distinguish linear cracks from granular noise, and a 5 × 5 neighborhood search strategy based on crack pixel amplitude restores the continuity of fragmented cracks. Experiments on the Concrete Crack and SDNET2018 datasets show that the algorithm achieves an accuracy of 0.885 and a recall of 0.891, outperforming DeepLabv3+ by 4.2%. With a processing time of only 0.8 s per image, the algorithm balances high accuracy with real-time efficiency, addressing missed fine cracks and misjudged broken cracks in noisy environments by integrating geometric features and pixel distribution characteristics. This study provides an efficient unsupervised solution for bridge damage detection.
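
The connected-domain/circularity step can be illustrated with a short scikit-image sketch: components with low circularity (elongated, crack-like shapes) are kept, while compact blobs are discarded. The binary mask and the 0.3 threshold are illustrative assumptions, not the paper's tuned values.

```python
# Sketch of the connected-domain + circularity step: keep elongated (crack-like)
# components and drop compact (noise-like) ones, given a binary mask from the
# preliminary clustering. The 0.3 threshold is an illustrative assumption.
import numpy as np
from skimage import measure

def filter_granular_noise(mask: np.ndarray, circ_thresh: float = 0.3) -> np.ndarray:
    labels = measure.label(mask, connectivity=2)
    keep = np.zeros_like(mask, dtype=bool)
    for region in measure.regionprops(labels):
        if region.perimeter == 0:
            continue
        # Circularity = 4*pi*area / perimeter^2: ~1 for disks, small for lines.
        circularity = 4.0 * np.pi * region.area / (region.perimeter ** 2)
        if circularity < circ_thresh:          # elongated -> likely a crack
            keep[labels == region.label] = True
    return keep

binary = np.zeros((64, 64), dtype=bool)
binary[30, 5:60] = True                        # a thin "crack"
binary[10:14, 10:14] = True                    # a granular blob
print(filter_granular_noise(binary).sum())     # only the crack pixels survive
```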

17 pages, 464 KiB  
Article
Detection of Major Depressive Disorder from Functional Magnetic Resonance Imaging Using Regional Homogeneity and Feature/Sample Selective Evolving Voting Ensemble Approaches
by Bindiya A. R., B. S. Mahanand, Vasily Sachnev and DIRECT Consortium
J. Imaging 2025, 11(7), 238; https://doi.org/10.3390/jimaging11070238 - 14 Jul 2025
Viewed by 250
Abstract
Major depressive disorder is a mental illness characterized by persistent sadness or loss of interest that affects a person's daily life, and early detection is crucial for timely and effective treatment. Neuroimaging modalities such as functional magnetic resonance imaging can identify changes in brain regions related to the disorder. In this study, regional homogeneity images, a derivative of functional magnetic resonance imaging, are used to detect major depressive disorder with the proposed feature/sample selective evolving voting ensemble approach. A total of 2380 subjects, consisting of 1104 healthy controls and 1276 patients with major depressive disorder from the REST-meta-MDD consortium, are studied. Regional homogeneity features from 90 regions are extracted using the automated anatomical labeling template and fed to the proposed ensemble for classification. The approach achieves an accuracy of 91.93%, and the discriminative features obtained from the classifier are used to identify brain regions that may be implicated in major depressive disorder. Nine regions are identified: the left superior temporal gyrus, left postcentral gyrus, left anterior cingulate gyrus, right inferior parietal lobule, right superior medial frontal gyrus, left lingual gyrus, right putamen, left fusiform gyrus, and left middle temporal gyrus. These results indicate that these brain regions play a critical role in detecting major depressive disorder.
(This article belongs to the Section Medical Imaging)

18 pages, 4631 KiB  
Article
Semantic Segmentation of Rice Fields in Sub-Meter Satellite Imagery Using an HRNet-CA-Enhanced DeepLabV3+ Framework
by Yifan Shao, Pan Pan, Hongxin Zhao, Jiale Li, Guoping Yu, Guomin Zhou and Jianhua Zhang
Remote Sens. 2025, 17(14), 2404; https://doi.org/10.3390/rs17142404 - 11 Jul 2025
Viewed by 346
Abstract
Accurate monitoring of rice-planting areas underpins food security and evidence-based farm management. Recent work has advanced along three complementary lines—multi-source data fusion (to mitigate cloud and spectral confusion), temporal feature extraction (to exploit phenology), and deep-network architecture optimization—yet even the best fusion- and time-series-based approaches still struggle to preserve fine spatial detail in sub-meter scenes. Targeting this gap, we propose an HRNet-CA-enhanced DeepLabV3+ that retains the original model's strengths while resolving its two key weaknesses: (i) detail loss caused by repeated down-sampling and feature-pyramid compression and (ii) boundary blurring due to insufficient multi-scale information fusion. The Xception backbone is replaced with a High-Resolution Network (HRNet) to maintain full-resolution feature streams through multi-resolution parallel convolutions and cross-scale interactions, and a coordinate attention (CA) block is embedded in the decoder to strengthen spatially explicit context and sharpen class boundaries. A rice dataset of 23,295 images (11,295 rice and 12,000 non-rice) was built through preprocessing and manual labeling, and the proposed model was benchmarked against classical segmentation networks. Our approach raises boundary segmentation accuracy to 92.28% mIoU and texture-level discrimination to a 95.93% F1-score without extra inference latency. Although this study focuses on architecture optimization, the HRNet-CA backbone is readily compatible with future multi-source fusion and time-series modules, offering a unified path toward operational paddy mapping in fragmented sub-meter landscapes.
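
A minimal PyTorch sketch of a coordinate attention block of the kind embedded in the decoder: features are pooled along height and width separately, mixed, and turned into direction-aware gates. Channel count and reduction ratio are illustrative assumptions, not the paper's exact configuration.

```python
# Coordinate attention sketch: pool along H and W separately, mix the pooled
# descriptors, then re-weight the feature map with direction-aware gates.
# Reduction ratio and channel count are illustrative assumptions.
import torch
import torch.nn as nn

class CoordinateAttention(nn.Module):
    def __init__(self, channels: int, reduction: int = 16):
        super().__init__()
        mid = max(8, channels // reduction)
        self.conv1 = nn.Conv2d(channels, mid, kernel_size=1)
        self.bn = nn.BatchNorm2d(mid)
        self.act = nn.ReLU(inplace=True)
        self.conv_h = nn.Conv2d(mid, channels, kernel_size=1)
        self.conv_w = nn.Conv2d(mid, channels, kernel_size=1)

    def forward(self, x):
        b, c, h, w = x.shape
        pooled_h = x.mean(dim=3, keepdim=True)                        # (b, c, h, 1)
        pooled_w = x.mean(dim=2, keepdim=True).permute(0, 1, 3, 2)    # (b, c, w, 1)
        y = self.act(self.bn(self.conv1(torch.cat([pooled_h, pooled_w], dim=2))))
        y_h, y_w = torch.split(y, [h, w], dim=2)
        attn_h = torch.sigmoid(self.conv_h(y_h))                      # (b, c, h, 1)
        attn_w = torch.sigmoid(self.conv_w(y_w.permute(0, 1, 3, 2)))  # (b, c, 1, w)
        return x * attn_h * attn_w

feat = torch.randn(2, 64, 32, 32)
print(CoordinateAttention(64)(feat).shape)    # torch.Size([2, 64, 32, 32])
```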

18 pages, 1667 KiB  
Article
Multi-Task Deep Learning for Simultaneous Classification and Segmentation of Cancer Pathologies in Diverse Medical Imaging Modalities
by Maryem Rhanoui, Khaoula Alaoui Belghiti and Mounia Mikram
Onco 2025, 5(3), 34; https://doi.org/10.3390/onco5030034 - 11 Jul 2025
Viewed by 253
Abstract
Background: Clinical imaging is an important part of health care, providing physicians with great assistance in patient treatment; segmentation and grading of tumors can help doctors assess the severity of a cancer at an early stage and increase the chances of cure. Although deep learning for cancer diagnosis has achieved clinically acceptable accuracy, challenges remain, especially where labeled data are insufficient and expensive computational resources are required. Objective: This paper presents a lightweight classification and segmentation deep learning model to assist in identifying cancerous tumors with high accuracy despite the scarcity of medical data. Methods: We propose a multi-task architecture for classification and segmentation of cancerous tumors in the brain, skin, prostate, and lungs. The model is based on the UNet architecture with different pre-trained deep learning models (VGG16 and MobileNetV2) as the backbone. The multi-task model is validated on relatively small datasets (slightly exceeding 1200 images) that are diverse in modality (MRI, X-ray, dermoscopic, and digital histopathology), number of classes, and the shapes and sizes of the cancer pathologies, using accuracy and the Dice coefficient as metrics. Results: Experiments show that the multi-task approach improves learning efficiency and prediction accuracy for both segmentation and classification, compared with training the individual models separately. The multi-task architecture reached classification accuracies of 86%, 90%, 88%, and 87% for skin lesion, brain tumor, prostate cancer, and pneumothorax, respectively. For the segmentation tasks, it achieved precisions of 95% and 98% for skin lesion and brain tumor segmentation, and 99% for both prostate cancer and pneumothorax, demonstrating that the multi-task solution is more efficient than single-task networks.
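
A minimal sketch of the shared-encoder multi-task idea: one MobileNetV2 backbone feeds both a deliberately simplified segmentation decoder and a classification head. The head designs and sizes are illustrative assumptions, not the paper's exact model.

```python
# Sketch of the multi-task idea: a shared MobileNetV2 encoder feeding both a
# (crude) segmentation decoder and a classification head. Head designs and
# sizes are illustrative assumptions.
import torch
import torch.nn as nn
from torchvision.models import mobilenet_v2

class MultiTaskNet(nn.Module):
    def __init__(self, n_classes: int = 2):
        super().__init__()
        self.encoder = mobilenet_v2(weights=None).features      # shared backbone
        self.cls_head = nn.Sequential(
            nn.AdaptiveAvgPool2d(1), nn.Flatten(), nn.Linear(1280, n_classes))
        self.seg_head = nn.Sequential(                           # simplified decoder
            nn.Conv2d(1280, 64, kernel_size=1), nn.ReLU(),
            nn.Upsample(scale_factor=32, mode="bilinear", align_corners=False),
            nn.Conv2d(64, 1, kernel_size=1))                     # binary mask logits

    def forward(self, x):
        feats = self.encoder(x)
        return self.cls_head(feats), self.seg_head(feats)

model = MultiTaskNet()
images = torch.randn(2, 3, 224, 224)
cls_logits, seg_logits = model(images)
print(cls_logits.shape, seg_logits.shape)    # (2, 2) and (2, 1, 224, 224)
# A joint loss would combine classification cross-entropy with segmentation
# BCE/Dice, weighted as needed.
```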

26 pages, 718 KiB  
Review
Advancements in Semi-Supervised Deep Learning for Brain Tumor Segmentation in MRI: A Literature Review
by Chengcheng Jin, Theam Foo Ng and Haidi Ibrahim
AI 2025, 6(7), 153; https://doi.org/10.3390/ai6070153 - 11 Jul 2025
Viewed by 436
Abstract
For automatic tumor segmentation in magnetic resonance imaging (MRI), deep learning offers powerful technical support with significant results. However, the success of supervised learning depends strongly on the quantity and accuracy of labeled training data, which are challenging to acquire in MRI. Semi-supervised learning approaches have arisen to tackle this difficulty, yielding comparable brain tumor segmentation outcomes with fewer labeled samples. This literature review explores key semi-supervised learning techniques for medical image segmentation, including pseudo-labeling, consistency regularization, generative adversarial networks, contrastive learning, and holistic methods, and specifically examines their application to brain tumor MRI segmentation. Our findings suggest that semi-supervised learning can outperform traditional supervised methods by providing more effective guidance, thereby enhancing the potential for clinical computer-aided diagnosis. The review serves as a comprehensive introduction to semi-supervised learning in tumor MRI segmentation, including glioma segmentation, offering valuable insights and a comparative analysis of current methods for researchers in the field.
(This article belongs to the Section Medical & Healthcare AI)
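
As a concrete illustration of one technique surveyed here, the sketch below shows a bare-bones pseudo-labeling loop for segmentation: confident predictions on unlabeled slices are reused as training targets. The stand-in model, confidence threshold, and loss weighting are illustrative assumptions.

```python
# Pseudo-labeling sketch for semi-supervised segmentation: confident predictions
# on unlabeled slices are treated as targets alongside the labeled data.
# The model, threshold, and data are placeholders for illustration.
import torch
import torch.nn as nn
import torch.nn.functional as F

model = nn.Conv2d(1, 2, kernel_size=3, padding=1)    # stand-in segmentation net
optimizer = torch.optim.Adam(model.parameters(), lr=1e-3)

labeled_x = torch.randn(4, 1, 64, 64)
labeled_y = torch.randint(0, 2, (4, 64, 64))
unlabeled_x = torch.randn(4, 1, 64, 64)
conf_threshold = 0.9                                   # assumed confidence cut-off

for step in range(10):
    # Supervised term on the labeled slices.
    sup_loss = F.cross_entropy(model(labeled_x), labeled_y)

    # Pseudo-label term: keep only pixels the current model is confident about.
    with torch.no_grad():
        probs = F.softmax(model(unlabeled_x), dim=1)
        conf, pseudo_y = probs.max(dim=1)
    mask = conf > conf_threshold
    unsup_loss = F.cross_entropy(model(unlabeled_x), pseudo_y, reduction="none")
    unsup_loss = (unsup_loss * mask).sum() / mask.sum().clamp(min=1)

    loss = sup_loss + 0.5 * unsup_loss                 # fixed weighting for the sketch
    optimizer.zero_grad()
    loss.backward()
    optimizer.step()
```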

25 pages, 1669 KiB  
Article
Zero-Shot Infrared Domain Adaptation for Pedestrian Re-Identification via Deep Learning
by Xu Zhang, Yinghui Liu, Liangchen Guo and Huadong Sun
Electronics 2025, 14(14), 2784; https://doi.org/10.3390/electronics14142784 - 10 Jul 2025
Viewed by 193
Abstract
In computer vision, detectors trained under optimal lighting conditions perform significantly worse when applied to infrared domains, owing to the scarcity of labeled infrared target-domain data and the inherent degradation of infrared image quality; progress in cross-domain pedestrian re-identification is likewise hindered by the lack of labeled infrared images. To address the degradation of pedestrian recognition in infrared environments, we propose a zero-shot infrared domain adaptation framework. An advanced reflectance representation learning module and an exchange–re-decomposition–coherence process are employed to learn illumination invariance and to enhance the model's effectiveness, respectively. In addition, the CLIP (Contrastive Language–Image Pretraining) image encoder and DINO (Distillation with No Labels) are fused for feature extraction, improving performance under infrared conditions and enhancing generalization. To further improve performance, we introduce a Non-Local Attention (NLA) module, an Instance-based Weighted Part Attention (IWPA) module, and a Multi-head Self-Attention module. The NLA module captures global feature dependencies, particularly long-range relationships, mitigating blurred or missing image information in feature-degradation scenarios; the IWPA module focuses on localized regions to improve accuracy in complex backgrounds and unevenly lit scenes; and the Multi-head Self-Attention module captures long-range dependencies between cross-modal features, further strengthening environmental understanding and scene modeling. The key contribution of this work lies in combining and applying existing techniques to a new domain to overcome the challenges of vision in infrared environments. Experiments on the SYSU-MM01 dataset achieve Rank-1 accuracy and mean Average Precision (mAP) of 37.97% and 37.25% in the single-shot setting, and 34.96% and 34.14% in the multi-shot setting.
(This article belongs to the Special Issue Deep Learning in Image Processing and Computer Vision)

14 pages, 1975 KiB  
Article
A Study on the Diagnosis of Lumbar Spinal Malposition in Chuna Manual Therapy Using X-Ray Images Based on Digital Markers
by Min-Su Ju, Tae-Yong Park, Minho Choi, Younseok Ko, Young Cheol Na, Yeong Ha Jeong, Jun-Su Jang and Jin-Hyun Lee
Diagnostics 2025, 15(14), 1748; https://doi.org/10.3390/diagnostics15141748 - 10 Jul 2025
Viewed by 319
Abstract
Background/Objectives: This study aimed to evaluate digital markers and establish quantitative diagnostic criteria for spinal malpositions in Chuna manual therapy using lumbar X-rays. Methods: A total of 2000 X-ray images were collected from adult patients at the International St. Mary's Hospital of Catholic Kwandong University. Five Chuna manual medicine experts annotated anatomical landmarks using a digital marker labeling program and diagnosed three types of spinal malposition: flexion/extension, lateral bending, and rotation. Diagnostic accuracy was evaluated using weighted F1 (F1_W) scores, and the optimal threshold values for each malposition type were determined by maximizing F1_W. Results: Diagnostic performance was high, with average maximum F1_W scores of 0.76 for flexion/extension, 0.85 for lateral bending, and 0.71 for rotation, and threshold angles for each type of spinal malposition in Chuna manual diagnosis were determined from this analysis. Conclusions: This study demonstrates the diagnostic validity of digital marker-based X-ray analysis in Chuna manual therapy and is the first to propose quantitative diagnostic thresholds for spinal malpositions. These findings may serve as a foundation for clinical application in spinal assessment and treatment planning, with further validation studies warranted.
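
A minimal sketch of how a diagnostic angle threshold can be chosen by sweeping candidate cut-offs and maximizing the weighted F1 score, as described above. The angles, labels, and sweep range are synthetic placeholders, not the study's data.

```python
# Sketch of selecting a diagnostic angle threshold by maximizing the weighted
# F1 (F1_W) score over candidate cut-offs. Angles and labels here are synthetic.
import numpy as np
from sklearn.metrics import f1_score

rng = np.random.default_rng(1)
angles = rng.normal(5.0, 3.0, 500)                  # e.g. measured lateral-bending angles
labels = (angles + rng.normal(0, 1.5, 500)) > 6.0   # stand-in expert diagnoses

best_thr, best_f1w = None, -1.0
for thr in np.arange(0.0, 12.0, 0.1):
    pred = angles > thr
    f1w = f1_score(labels, pred, average="weighted")
    if f1w > best_f1w:
        best_thr, best_f1w = thr, f1w

print(f"optimal threshold = {best_thr:.1f} deg, F1_W = {best_f1w:.3f}")
```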

23 pages, 3645 KiB  
Article
Color-Guided Mixture-of-Experts Conditional GAN for Realistic Biomedical Image Synthesis in Data-Scarce Diagnostics
by Patrycja Kwiek, Filip Ciepiela and Małgorzata Jakubowska
Electronics 2025, 14(14), 2773; https://doi.org/10.3390/electronics14142773 - 10 Jul 2025
Viewed by 182
Abstract
Background: The limited availability of high-quality labeled biomedical image datasets poses a significant challenge for training deep learning models in medical diagnostics. This study proposes an image generation framework combining conditional generative adversarial networks (cGANs) with a Mixture-of-Experts (MoE) architecture and color-histogram-aware loss functions to enhance the quality of synthetic blood cell images. Methods: RGB microscopic images from the BloodMNIST dataset (eight blood cell types, resolution 3 × 128 × 128) were preprocessed with k-means clustering to extract dominant colors and with UMAP to visualize class similarity, and Spearman correlation-based distance matrices were used to evaluate the discriminative power of each RGB channel. A MoE–cGAN architecture was developed with residual blocks and LeakyReLU activations; expert generators were conditioned on cell type, and the generator loss was augmented with a Wasserstein distance-based term comparing red and green channel histograms, the channels found most relevant for class separation. Results: The red and green channels contributed most to class discrimination, while the blue channel had minimal impact. The model achieved 0.97 classification accuracy on generated images (ResNet50), with 0.96 precision, 0.97 recall, and a 0.96 F1-score; the best Fréchet Inception Distance (FID) was 52.1, and misclassifications occurred mainly among visually similar cell types. Conclusions: Integrating histogram alignment into MoE–cGAN training significantly improves the realism and class-specific variability of synthetic images, supporting robust model development under data scarcity in hematological imaging.
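
A minimal sketch of a color-histogram alignment term of the kind described: differentiable soft histograms of the red and green channels of real and generated batches are compared with a 1-D Wasserstein distance. Bin count, bandwidth, and the loss weighting are illustrative assumptions, not the authors' exact formulation.

```python
# Sketch of a histogram-alignment loss term: soft (differentiable) histograms of
# the red and green channels of real vs. generated images, compared with a 1-D
# Wasserstein distance on their CDFs. Bin count and bandwidth are assumptions.
import torch

def soft_histogram(channel: torch.Tensor, bins: int = 64, bandwidth: float = 0.02):
    # channel: (B, H, W) with values in [0, 1]
    centers = torch.linspace(0.0, 1.0, bins, device=channel.device)
    diffs = channel.reshape(channel.shape[0], -1, 1) - centers       # (B, HW, bins)
    weights = torch.exp(-0.5 * (diffs / bandwidth) ** 2)             # Gaussian kernel
    hist = weights.sum(dim=1)
    return hist / hist.sum(dim=1, keepdim=True)                      # per-image normalization

def wasserstein_1d(p: torch.Tensor, q: torch.Tensor) -> torch.Tensor:
    # 1-D Wasserstein distance on a shared grid, up to the bin-width factor:
    # mean absolute difference of the cumulative distributions.
    return (p.cumsum(dim=1) - q.cumsum(dim=1)).abs().mean(dim=1).mean()

def histogram_loss(real: torch.Tensor, fake: torch.Tensor) -> torch.Tensor:
    # real, fake: (B, 3, H, W) RGB in [0, 1]; compare red (0) and green (1) channels.
    loss = 0.0
    for ch in (0, 1):
        loss = loss + wasserstein_1d(soft_histogram(real[:, ch]), soft_histogram(fake[:, ch]))
    return loss

real = torch.rand(4, 3, 128, 128)
fake = torch.rand(4, 3, 128, 128, requires_grad=True)
print(histogram_loss(real, fake))   # would be added to the generator loss with some weight
```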
