Deep Learning in Computer Vision

A special issue of Journal of Imaging (ISSN 2313-433X). This special issue belongs to the section "Computer Vision and Pattern Recognition".

Deadline for manuscript submissions: 15 September 2024 | Viewed by 1931

Special Issue Editors


Dr. Dong Zhang
Guest Editor
Department of Electronic and Computer Engineering, The Hong Kong University of Science and Technology, Hong Kong 999077, China
Interests: image classification; object detection; semantic segmentation; pose estimation

Dr. Rui Yan
Guest Editor
Department of Computer Science and Technology, Nanjing University, Nanjing 210023, China
Interests: complex human behavior understanding; video-language understanding

Special Issue Information

Dear Colleagues,

The field of computer vision has undergone a significant transformation with the advent of deep learning techniques, enabling the development of innovative applications across various domains. This Special Issue focuses on exploring the latest advancements, methodologies, and applications in this rapidly evolving area.

Deep learning methods, such as convolutional neural networks (CNNs), vision transformers (ViTs), diffusion models, and generative adversarial networks (GANs), have demonstrated remarkable success in tasks such as object recognition, semantic segmentation, and image synthesis. These techniques have paved the way for a myriad of applications, including autonomous vehicles, facial recognition, biomedical image analysis, and video surveillance.

We invite contributions that present cutting-edge research, novel techniques, methods, tools and ideas related to the integration of deep learning in computer vision. Submissions may cover a wide range of topics, including, but not limited to:

  • Advances in deep learning architectures for computer vision tasks;
  • Transfer learning and domain adaptation in computer vision;
  • Deep reinforcement learning for vision-based control;
  • Generative models for image synthesis and manipulation;
  • Application of computer vision technology in biomedical imaging;
  • Applications of deep learning in fields such as remote sensing, robotics and art.

We encourage submissions that propose innovative and scientifically grounded research lines for the future development of deep learning techniques in computer vision. Together, we aim to advance the field of computer vision and its applications in various industries.

Dr. Dong Zhang
Dr. Rui Yan
Guest Editors

Manuscript Submission Information

Manuscripts should be submitted online at www.mdpi.com by registering and logging in to this website. Once you are registered, go to the submission form. Manuscripts can be submitted until the deadline. All submissions that pass pre-check are peer-reviewed. Accepted papers will be published continuously in the journal (as soon as accepted) and will be listed together on the special issue website. Research articles, review articles, and short communications are invited. For planned papers, a title and short abstract (about 100 words) can be sent to the Editorial Office for announcement on this website.

Submitted manuscripts should not have been published previously, nor be under consideration for publication elsewhere (except conference proceedings papers). All manuscripts are thoroughly refereed through a single-blind peer-review process. A guide for authors and other relevant information for submission of manuscripts is available on the Instructions for Authors page. Journal of Imaging is an international peer-reviewed open access monthly journal published by MDPI.

Please visit the Instructions for Authors page before submitting a manuscript. The Article Processing Charge (APC) for publication in this open access journal is 1800 CHF (Swiss Francs). Submitted papers should be well formatted and written in good English. Authors may use MDPI's English editing service prior to publication or during author revisions.

Keywords

  • image classification
  • object detection
  • semantic segmentation
  • pose estimation
  • multimedia analysis and retrieval
  • few-shot learning
  • human behavior understanding
  • video language understanding
  • video understanding and analysis

Published Papers (2 papers)


Research

18 pages, 1810 KiB  
Article
Knowledge Distillation in Video-Based Human Action Recognition: An Intuitive Approach to Efficient and Flexible Model Training
by Fernando Camarena, Miguel Gonzalez-Mendoza and Leonardo Chang
J. Imaging 2024, 10(4), 85; https://doi.org/10.3390/jimaging10040085 - 30 Mar 2024
Viewed by 679
Abstract
Training a model to recognize human actions in videos is computationally intensive. While modern strategies employ transfer learning methods to make the process more efficient, they still face challenges regarding flexibility and efficiency. Existing solutions are limited in functionality and rely heavily on pretrained architectures, which can restrict their applicability to diverse scenarios. Our work explores knowledge distillation (KD) for enhancing the training of self-supervised video models in three aspects: improving classification accuracy, accelerating model convergence, and increasing model flexibility under regular and limited-data scenarios. We tested our method on the UCF101 dataset using differently balanced proportions: 100%, 50%, 25%, and 2%. We found that using knowledge distillation to guide the model’s training outperforms traditional training without affecting classification accuracy, while reducing the time to convergence in both standard settings and a data-scarce environment. Additionally, knowledge distillation enables cross-architecture flexibility, allowing model customization for various applications, from resource-limited to high-performance scenarios.
(This article belongs to the Special Issue Deep Learning in Computer Vision)
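
For readers less familiar with the technique, the sketch below illustrates the standard soft-label knowledge distillation loss (a temperature-scaled KL term combined with cross-entropy). It is a minimal PyTorch sketch of the general idea only, not the authors' training pipeline; the temperature and mixing weight are assumed, illustrative values.

```python
# Minimal sketch of soft-label knowledge distillation (illustrative only, not
# the authors' exact method): a student is trained to match a frozen teacher's
# temperature-softened output distribution in addition to the usual label loss.
import torch
import torch.nn.functional as F

def distillation_loss(student_logits, teacher_logits, labels, T=4.0, alpha=0.7):
    """Combine a KL term on softened outputs with ordinary cross-entropy.

    T (temperature) and alpha (mixing weight) are illustrative values,
    not taken from the paper.
    """
    # Soften both output distributions with temperature T.
    soft_teacher = F.softmax(teacher_logits / T, dim=1)
    soft_student = F.log_softmax(student_logits / T, dim=1)
    # KL divergence between teacher and student, scaled by T^2 as is standard.
    kd = F.kl_div(soft_student, soft_teacher, reduction="batchmean") * (T * T)
    # Supervised loss on the hard labels.
    ce = F.cross_entropy(student_logits, labels)
    return alpha * kd + (1.0 - alpha) * ce

# Usage inside a training loop (teacher frozen, gradients flow only through
# the student):
#     with torch.no_grad():
#         teacher_logits = teacher(clip)
#     loss = distillation_loss(student(clip), teacher_logits, labels)
```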

15 pages, 980 KiB  
Article
FishSegSSL: A Semi-Supervised Semantic Segmentation Framework for Fish-Eye Images
by Sneha Paul, Zachary Patterson and Nizar Bouguila
J. Imaging 2024, 10(3), 71; https://doi.org/10.3390/jimaging10030071 - 15 Mar 2024
Viewed by 970
Abstract
The application of large field-of-view (FoV) cameras equipped with fish-eye lenses brings notable advantages to various real-world computer vision applications, including autonomous driving. While deep learning has proven successful in conventional computer vision applications using regular perspective images, its potential in fish-eye camera contexts remains largely unexplored due to limited datasets for fully supervised learning. Semi-supervised learning comes as a potential solution to manage this challenge. In this study, we explore and benchmark two popular semi-supervised methods from the perspective image domain for fish-eye image segmentation. We further introduce FishSegSSL, a novel fish-eye image segmentation framework featuring three semi-supervised components: pseudo-label filtering, dynamic confidence thresholding, and robust strong augmentation. Evaluation on the WoodScape dataset, collected from vehicle-mounted fish-eye cameras, demonstrates that our proposed method enhances the model’s performance by up to 10.49% over fully supervised methods using the same amount of labeled data. Our method also improves on existing image segmentation methods by 2.34%. To the best of our knowledge, this is the first work on semi-supervised semantic segmentation of fish-eye images. Additionally, we conduct a comprehensive ablation study and sensitivity analysis to showcase the efficacy of each proposed component.
(This article belongs to the Special Issue Deep Learning in Computer Vision)
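
For orientation, the sketch below illustrates confidence-based pseudo-label filtering for semi-supervised segmentation, in the spirit of the components the abstract names. It uses a fixed confidence threshold for simplicity (the paper proposes a dynamic one), and all names and values are illustrative assumptions rather than the FishSegSSL implementation.

```python
# Illustrative sketch of confidence-based pseudo-label filtering for
# semi-supervised segmentation (not the FishSegSSL implementation): unlabeled
# pixels contribute to the loss only where the teacher's prediction is
# sufficiently confident.
import torch
import torch.nn.functional as F

def pseudo_label_loss(student_logits, teacher_logits, threshold=0.95):
    """Cross-entropy on unlabeled pixels whose teacher confidence exceeds a threshold.

    `threshold` is a fixed illustrative value; a dynamic threshold would
    adapt it during training.
    """
    with torch.no_grad():
        probs = F.softmax(teacher_logits, dim=1)   # (B, C, H, W)
        conf, pseudo = probs.max(dim=1)            # per-pixel confidence and pseudo-label
        mask = conf.ge(threshold)                  # keep only confident pixels
    # Per-pixel cross-entropy against the pseudo-labels, shape (B, H, W).
    loss = F.cross_entropy(student_logits, pseudo, reduction="none")
    # Average over confident pixels only (clamp avoids division by zero).
    return (loss * mask).sum() / mask.sum().clamp(min=1)
```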
