Deep Learning in Image Analysis: Progress and Challenges

A special issue of Journal of Imaging (ISSN 2313-433X).

Deadline for manuscript submissions: closed (31 December 2024) | Viewed by 10327

Special Issue Editor


E-Mail Website
Guest Editor
School of Electronic and Information Engineering, Beijing Jiaotong University, Beijing 100044, China
Interests: medical image analysis; object detection; person ReID; deep learning; domain generalization

Special Issue Information

Dear Colleagues,

Due to its huge advantages, deep-learning-related methods have become the mainstream technology in image analysis, making significant progress and holding vast prospects. They are applied in a variety of tasks, such as image classification, object segmentation, object detection, image registration, image fusion, biomedical engineering, natural language processing, etc. Among different kinds of deep learning techniques, supervised learning was first adopted. Later, unsupervised, semi-supervised, few-shot, one-shot, and zero-shot learning methods received extensive attention. For the network structure, convolutional neural networks, recurrent neural networks, generative networks, attention mechanisms, and transformers have been designed and widely applied.

Which advancements will eventually be more productive and innovative in this field?

We request contributions presenting techniques (methods, tools, ideas, or even market evaluations) that will contribute to the future roadmap of deep learning, as well as concepts for significantly innovative objectives in image analysis techniques. This Special Issue will discuss the novel supervised, unsupervised, semi-supervised, few-shot, etc., deep learning methods in image analysis, including classification, segmentation, detection, registration, domain generalization, multi-modality fusion, etc. Scientifically founded innovative and speculative research lines are welcome for proposal and evaluation.

Prof. Dr. Yanfeng Li
Guest Editor

Manuscript Submission Information

Manuscripts should be submitted online at www.mdpi.com by registering and logging in to this website. Once you are registered, click here to go to the submission form. Manuscripts can be submitted until the deadline. All submissions that pass pre-check are peer-reviewed. Accepted papers will be published continuously in the journal (as soon as accepted) and will be listed together on the special issue website. Research articles, review articles as well as short communications are invited. For planned papers, a title and short abstract (about 100 words) can be sent to the Editorial Office for announcement on this website.

Submitted manuscripts should not have been published previously, nor be under consideration for publication elsewhere (except conference proceedings papers). All manuscripts are thoroughly refereed through a single-blind peer-review process. A guide for authors and other relevant information for submission of manuscripts is available on the Instructions for Authors page. Journal of Imaging is an international peer-reviewed open access monthly journal published by MDPI.

Please visit the Instructions for Authors page before submitting a manuscript. The Article Processing Charge (APC) for publication in this open access journal is 1800 CHF (Swiss Francs). Submitted papers should be well formatted and use good English. Authors may use MDPI's English editing service prior to publication or during author revisions.

Keywords

  • deep learning
  • image classification
  • image segmentation
  • object detection
  • image registration
  • image fusion
  • multi-modality
  • domain generalization
  • unsupervised learning
  • semi-supervised learning
  • few-shot learning

Benefits of Publishing in a Special Issue

  • Ease of navigation: Grouping papers by topic helps scholars navigate broad scope journals more efficiently.
  • Greater discoverability: Special Issues support the reach and impact of scientific research. Articles in Special Issues are more discoverable and cited more frequently.
  • Expansion of research network: Special Issues facilitate connections among authors, fostering scientific collaborations.
  • External promotion: Articles in Special Issues are often promoted through the journal's social media, increasing their visibility.
  • e-Book format: Special Issues with more than 10 articles can be published as dedicated e-books, ensuring wide and rapid dissemination.

Further information on MDPI's Special Issue policies can be found here.

Published Papers (6 papers)

Order results
Result details
Select all
Export citation of selected articles as:

Research

Jump to: Review

30 pages, 3389 KiB  
Article
GCNet: A Deep Learning Framework for Enhanced Grape Cluster Segmentation and Yield Estimation Incorporating Occluded Grape Detection with a Correction Factor for Indoor Experimentation
by Rubi Quiñones, Syeda Mariah Banu and Eren Gultepe
J. Imaging 2025, 11(2), 34; https://doi.org/10.3390/jimaging11020034 - 24 Jan 2025
Viewed by 912
Abstract
Object segmentation algorithms have heavily relied on deep learning techniques to estimate the count of grapes which is a strong indicator for the yield success of grapes. The issue with using object segmentation algorithms for grape analytics is that they are limited to [...] Read more.
Object segmentation algorithms have heavily relied on deep learning techniques to estimate the count of grapes which is a strong indicator for the yield success of grapes. The issue with using object segmentation algorithms for grape analytics is that they are limited to counting only the visible grapes, thus omitting hidden grapes, which affect the true estimate of grape yield. Many grapes are occluded because of either the compactness of the grape bunch cluster or due to canopy interference. This introduces the need for models to be able to estimate the unseen berries to give a more accurate estimate of the grape yield by improving grape cluster segmentation. We propose the Grape Counting Network (GCNet), a novel framework for grape cluster segmentation, integrating deep learning techniques with correction factors to address challenges in indoor yield estimation. GCNet incorporates occlusion adjustments, enhancing segmentation accuracy even under conditions of foliage and cluster compactness, and setting new standards in agricultural indoor imaging analysis. This approach improves yield estimation accuracy, achieving a R² of 0.96 and reducing mean absolute error (MAE) by 10% compared to previous methods. We also propose a new dataset called GrapeSet which contains visible imagery of grape clusters imaged indoors, along with their ground truth mask, total grape count, and weight in grams. The proposed framework aims to encourage future research in determining which features of grapes can be leveraged to estimate the correct grape yield count, equip grape harvesters with the knowledge of early yield estimation, and produce accurate results in object segmentation algorithms for grape analytics. Full article
(This article belongs to the Special Issue Deep Learning in Image Analysis: Progress and Challenges)
Show Figures

Figure 1

26 pages, 21880 KiB  
Article
Explainable AI-Based Skin Cancer Detection Using CNN, Particle Swarm Optimization and Machine Learning
by Syed Adil Hussain Shah, Syed Taimoor Hussain Shah, Roa’a Khaled, Andrea Buccoliero, Syed Baqir Hussain Shah, Angelo Di Terlizzi, Giacomo Di Benedetto and Marco Agostino Deriu
J. Imaging 2024, 10(12), 332; https://doi.org/10.3390/jimaging10120332 - 22 Dec 2024
Viewed by 1631
Abstract
Skin cancer is among the most prevalent cancers globally, emphasizing the need for early detection and accurate diagnosis to improve outcomes. Traditional diagnostic methods, based on visual examination, are subjective, time-intensive, and require specialized expertise. Current artificial intelligence (AI) approaches for skin cancer [...] Read more.
Skin cancer is among the most prevalent cancers globally, emphasizing the need for early detection and accurate diagnosis to improve outcomes. Traditional diagnostic methods, based on visual examination, are subjective, time-intensive, and require specialized expertise. Current artificial intelligence (AI) approaches for skin cancer detection face challenges such as computational inefficiency, lack of interpretability, and reliance on standalone CNN architectures. To address these limitations, this study proposes a comprehensive pipeline combining transfer learning, feature selection, and machine-learning algorithms to improve detection accuracy. Multiple pretrained CNN models were evaluated, with Xception emerging as the optimal choice for its balance of computational efficiency and performance. An ablation study further validated the effectiveness of freezing task-specific layers within the Xception architecture. Feature dimensionality was optimized using Particle Swarm Optimization, reducing dimensions from 1024 to 508, significantly enhancing computational efficiency. Machine-learning classifiers, including Subspace KNN and Medium Gaussian SVM, further improved classification accuracy. Evaluated on the ISIC 2018 and HAM10000 datasets, the proposed pipeline achieved impressive accuracies of 98.5% and 86.1%, respectively. Moreover, Explainable-AI (XAI) techniques, such as Grad-CAM, LIME, and Occlusion Sensitivity, enhanced interpretability. This approach provides a robust, efficient, and interpretable solution for automated skin cancer diagnosis in clinical applications. Full article
(This article belongs to the Special Issue Deep Learning in Image Analysis: Progress and Challenges)
Show Figures

Figure 1

22 pages, 18461 KiB  
Article
Learning More May Not Be Better: Knowledge Transferability in Vision-and-Language Tasks
by Tianwei Chen, Noa Garcia, Mayu Otani, Chenhui Chu, Yuta Nakashima and Hajime Nagahara
J. Imaging 2024, 10(12), 300; https://doi.org/10.3390/jimaging10120300 - 22 Nov 2024
Viewed by 744
Abstract
Is learning more knowledge always better for vision-and-language models? In this paper, we study knowledge transferability in multi-modal tasks. The current tendency in machine learning is to assume that by joining multiple datasets from different tasks, their overall performance improves. However, we show [...] Read more.
Is learning more knowledge always better for vision-and-language models? In this paper, we study knowledge transferability in multi-modal tasks. The current tendency in machine learning is to assume that by joining multiple datasets from different tasks, their overall performance improves. However, we show that not all knowledge transfers well or has a positive impact on related tasks, even when they share a common goal. We conducted an exhaustive analysis based on hundreds of cross-experiments on twelve vision-and-language tasks categorized into four groups. While tasks in the same group are prone to improve each other, results show that this is not always the case. In addition, other factors, such as dataset size or the pre-training stage, may have a great impact on how well the knowledge is transferred. Full article
(This article belongs to the Special Issue Deep Learning in Image Analysis: Progress and Challenges)
Show Figures

Figure 1

20 pages, 12767 KiB  
Article
A Real-Time End-to-End Framework with a Stacked Model Using Ultrasound Video for Cardiac Septal Defect Decision-Making
by Siti Nurmaini, Ria Nova, Ade Iriani Sapitri, Muhammad Naufal Rachmatullah, Bambang Tutuko, Firdaus Firdaus, Annisa Darmawahyuni, Anggun Islami, Satria Mandala, Radiyati Umi Partan, Akhiar Wista Arum and Rio Bastian
J. Imaging 2024, 10(11), 280; https://doi.org/10.3390/jimaging10110280 - 3 Nov 2024
Viewed by 1420
Abstract
Echocardiography is the gold standard for the comprehensive diagnosis of cardiac septal defects (CSDs). Currently, echocardiography diagnosis is primarily based on expert observation, which is laborious and time-consuming. With digitization, deep learning (DL) can be used to improve the efficiency of the diagnosis. [...] Read more.
Echocardiography is the gold standard for the comprehensive diagnosis of cardiac septal defects (CSDs). Currently, echocardiography diagnosis is primarily based on expert observation, which is laborious and time-consuming. With digitization, deep learning (DL) can be used to improve the efficiency of the diagnosis. This study presents a real-time end-to-end framework tailored for pediatric ultrasound video analysis for CSD decision-making. The framework employs an advanced real-time architecture based on You Only Look Once (Yolo) techniques for CSD decision-making with high accuracy. Leveraging the state of the art with the Yolov8l (large) architecture, the proposed model achieves a robust performance in real-time processes. It can be observed that the experiment yielded a mean average precision (mAP) exceeding 89%, indicating the framework’s effectiveness in accurately diagnosing CSDs from ultrasound (US) videos. The Yolov8l model exhibits precise performance in the real-time testing of pediatric patients from Mohammad Hoesin General Hospital in Palembang, Indonesia. Based on the results of the proposed model using 222 US videos, it exhibits 95.86% accuracy, 96.82% sensitivity, and 98.74% specificity. During real-time testing in the hospital, the model exhibits a 97.17% accuracy, 95.80% sensitivity, and 98.15% specificity; only 3 out of the 53 US videos in the real-time process were diagnosed incorrectly. This comprehensive approach holds promise for enhancing clinical decision-making and improving patient outcomes in pediatric cardiology. Full article
(This article belongs to the Special Issue Deep Learning in Image Analysis: Progress and Challenges)
Show Figures

Figure 1

22 pages, 4136 KiB  
Article
DepthCrackNet: A Deep Learning Model for Automatic Pavement Crack Detection
by Alireza Saberironaghi and Jing Ren
J. Imaging 2024, 10(5), 100; https://doi.org/10.3390/jimaging10050100 - 26 Apr 2024
Cited by 3 | Viewed by 2797
Abstract
Detecting cracks in the pavement is a vital component of ensuring road safety. Since manual identification of these cracks can be time-consuming, an automated method is needed to speed up this process. However, creating such a system is challenging due to factors including [...] Read more.
Detecting cracks in the pavement is a vital component of ensuring road safety. Since manual identification of these cracks can be time-consuming, an automated method is needed to speed up this process. However, creating such a system is challenging due to factors including crack variability, variations in pavement materials, and the occurrence of miscellaneous objects and anomalies on the pavement. Motivated by the latest progress in deep learning applied to computer vision, we propose an effective U-Net-shaped model named DepthCrackNet. Our model employs the Double Convolution Encoder (DCE), composed of a sequence of convolution layers, for robust feature extraction while keeping parameters optimally efficient. We have incorporated the TriInput Multi-Head Spatial Attention (TMSA) module into our model; in this module, each head operates independently, capturing various spatial relationships and boosting the extraction of rich contextual information. Furthermore, DepthCrackNet employs the Spatial Depth Enhancer (SDE) module, specifically designed to augment the feature extraction capabilities of our segmentation model. The performance of the DepthCrackNet was evaluated on two public crack datasets: Crack500 and DeepCrack. In our experimental studies, the network achieved mIoU scores of 77.0% and 83.9% with the Crack500 and DeepCrack datasets, respectively. Full article
(This article belongs to the Special Issue Deep Learning in Image Analysis: Progress and Challenges)
Show Figures

Figure 1

Review

Jump to: Research

15 pages, 627 KiB  
Review
Real-Time Emotion Recognition for Improving the Teaching–Learning Process: A Scoping Review
by Cèlia Llurba and Ramon Palau
J. Imaging 2024, 10(12), 313; https://doi.org/10.3390/jimaging10120313 - 9 Dec 2024
Viewed by 1133
Abstract
Emotion recognition (ER) is gaining popularity in various fields, including education. The benefits of ER in the classroom for educational purposes, such as improving students’ academic performance, are gradually becoming known. Thus, real-time ER is proving to be a valuable tool for teachers [...] Read more.
Emotion recognition (ER) is gaining popularity in various fields, including education. The benefits of ER in the classroom for educational purposes, such as improving students’ academic performance, are gradually becoming known. Thus, real-time ER is proving to be a valuable tool for teachers as well as for students. However, its feasibility in educational settings requires further exploration. This review offers learning experiences based on real-time ER with students to explore their potential in learning and in improving their academic achievement. The purpose is to present evidence of good implementation and suggestions for their successful application. The content analysis finds that most of the practices lead to significant improvements in terms of educational purposes. Nevertheless, the analysis identifies problems that might block the implementation of these practices in the classroom and in education; among the obstacles identified are the absence of privacy of the students and the support needs of the students. We conclude that artificial intelligence (AI) and ER are potential tools to approach the needs in ordinary classrooms, although reliable automatic recognition is still a challenge for researchers to achieve the best ER feature in real time, given the high input data variability. Full article
(This article belongs to the Special Issue Deep Learning in Image Analysis: Progress and Challenges)
Show Figures

Figure 1

Back to TopTop