Modern Computer Vision and Image Processing

A special issue of Applied Sciences (ISSN 2076-3417). This special issue belongs to the section "Computing and Artificial Intelligence".

Deadline for manuscript submissions: closed (31 May 2022) | Viewed by 54454

Special Issue Editors


Dr. Deokwoo Lee
Guest Editor
Department of Computer Engineering, Keimyung University, Daegu 704-701, Republic of Korea
Interests: camera calibration; computer vision; image processing; signal processing

Dr. YoHan Park
Guest Editor
Next-Generation Information Security Laboratory (NISL), College of Engineering, Keimyung University, Daegu 24601, Republic of Korea
Interests: network security; security of IoT; blockchain; post-quantum cryptography; security of VANETs; formal analysis

Special Issue Information

Dear Colleagues,

It is our pleasure to invite you to submit a paper to the Special Issue of Applied Sciences titled “Modern Computer Vision and Image Processing”. This Special Issue aims to collect the latest reviews and full-length research articles in the areas of computer vision and image processing. In particular, it focuses on both low- and high-level techniques of computer vision and image processing based on geometric analysis, visual information, appearance-based algorithms, machine learning, deep learning, and other approaches that have recently been gaining attention.

Topics of interest include, but are not limited to, the following:

  • 2D/3D or higher dimensional image processing and analysis
  • Computer vision (camera calibration, feature matching, feature extraction, disparity/depth estimation)
  • Multi-modality-based image processing and computer vision
  • Depth sensor-based approaches
  • Recognition/classification
  • Video processing and analysis
  • Machine learning/deep learning-based approaches
  • Virtual and augmented reality (VR/AR) applications
  • Graphics
  • Applications in vehicles, robotics, cameras, and artificial intelligence (AI)

Dr. Deokwoo Lee
Dr. YoHan Park
Guest Editors

Manuscript Submission Information

Manuscripts should be submitted online at www.mdpi.com by registering and logging in to the website. Once registered, authors can proceed to the submission form. Manuscripts can be submitted until the deadline. All submissions that pass pre-check are peer-reviewed. Accepted papers will be published continuously in the journal (as soon as accepted) and will be listed together on the Special Issue website. Research articles, review articles, and short communications are invited. For planned papers, a title and short abstract (about 100 words) can be sent to the Editorial Office for announcement on this website.

Submitted manuscripts should not have been published previously, nor be under consideration for publication elsewhere (except conference proceedings papers). All manuscripts are thoroughly refereed through a single-blind peer-review process. A guide for authors and other relevant information for submission of manuscripts is available on the Instructions for Authors page. Applied Sciences is an international peer-reviewed open access semimonthly journal published by MDPI.

Please visit the Instructions for Authors page before submitting a manuscript. The Article Processing Charge (APC) for publication in this open access journal is 2400 CHF (Swiss Francs). Submitted papers should be well formatted and use good English. Authors may use MDPI's English editing service prior to publication or during author revisions.

Keywords

  • computer vision
  • image processing
  • pattern recognition
  • machine learning
  • deep learning
  • artificial intelligence

Published Papers (10 papers)


Research

15 pages, 5564 KiB  
Article
Real-Time 3D Reconstruction Method for Holographic Telepresence
by Fazliaty Edora Fadzli, Ajune Wanis Ismail, Shafina Abd Karim Ishigaki, Muhammad Nur Affendy Nor’a and Mohamad Yahya Fekri Aladin
Appl. Sci. 2022, 12(8), 4009; https://doi.org/10.3390/app12084009 - 15 Apr 2022
Cited by 7 | Viewed by 2974
Abstract
This paper introduces a real-time 3D reconstruction of a human captured using a depth sensor and integrates it with a holographic telepresence application. Holographic projection is widely recognized as one of the most promising 3D display technologies, and it is expected to become more widely available in the near future. This technology may also be deployed in various ways, including holographic prisms and the Z-Hologram, which this research has used to demonstrate initial results by displaying the reconstructed 3D representation of the user. The realization of a stable and inexpensive 3D data acquisition system is a problem that has yet to be solved. When multiple sensors are involved, the data need to be compressed and optimized so that they can be sent to a server for telepresence. Therefore, the paper presents the processes in real-time 3D reconstruction, which consist of data acquisition, background removal, point cloud extraction, and surface generation, applying a marching cubes algorithm to form an isosurface from the set of points in the point cloud, after which texture mapping is applied to the generated isosurface. The compression results are presented in this paper, and the results of the integration process after sending the data over the network are also discussed.
(This article belongs to the Special Issue Modern Computer Vision and Image Processing)
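
The surface-generation step described in the abstract is a standard marching cubes application. Below is a minimal sketch of that stage using scikit-image; the synthetic sphere volume, grid size, and iso-level are illustrative stand-ins for the fused depth-sensor data, not the paper's actual pipeline.

```python
import numpy as np
from skimage import measure

# Toy volumetric field: signed distance to a sphere, standing in for the
# volume fused from the depth sensor's point cloud.
x, y, z = np.mgrid[-1:1:64j, -1:1:64j, -1:1:64j]
volume = np.sqrt(x**2 + y**2 + z**2) - 0.6

# Marching cubes turns the zero level set into a triangle mesh (isosurface).
verts, faces, normals, values = measure.marching_cubes(volume, level=0.0)
print(verts.shape, faces.shape)  # mesh vertices and triangle indices

# In the pipeline above, texture mapping would then assign per-vertex texture
# coordinates, e.g., by projecting the vertices back into the RGB camera image.
```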

18 pages, 3672 KiB  
Article
Multi-Modal Long-Term Person Re-Identification Using Physical Soft Bio-Metrics and Body Figure
by Nadeen Shoukry, Mohamed A. Abd El Ghany and Mohammed A.-M. Salem
Appl. Sci. 2022, 12(6), 2835; https://doi.org/10.3390/app12062835 - 10 Mar 2022
Cited by 3 | Viewed by 2420
Abstract
Person re-identification is the task of recognizing a subject across different non-overlapping cameras, views, and times. Most state-of-the-art datasets and proposed solutions tend to address the problem of short-term re-identification; those models can re-identify a person only as long as they are wearing the same clothes. The work presented in this paper addresses the task of long-term re-identification; therefore, the proposed model is trained on a dataset that incorporates clothes variation. This paper proposes a multi-modal person re-identification model. The first modality includes soft bio-metrics: hair, face, neck, shoulders, and part of the chest. The second modality is the remaining body figure, which mainly focuses on clothes. The proposed model is composed of two separate neural networks, one for each modality. For the first modality, a two-stream Siamese network with pre-trained FaceNet as a feature extractor is utilized. A Part-based Convolutional Baseline classifier with OSNet as the feature extractor network is used for the second modality. Experiments confirm that the proposed model outperforms several state-of-the-art models, achieving 81.4% accuracy at Rank-1, 82.3% at Rank-5, 83.1% at Rank-10, and 83.7% at Rank-20.
(This article belongs to the Special Issue Modern Computer Vision and Image Processing)
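
The abstract describes two separate streams, a FaceNet-based soft-biometric stream and an OSNet-based body-figure stream. One simple way such streams could be combined at matching time is a weighted fusion of per-modality distances; the sketch below assumes cosine distances, an equal fusion weight, and toy embedding sizes, which are illustrative choices rather than the paper's exact design.

```python
import torch
import torch.nn.functional as F

def fused_reid_distance(face_a, face_b, body_a, body_b, alpha=0.5):
    """Combine two modality embeddings into one matching distance.
    face_*: soft-biometric embeddings (e.g., from a FaceNet-style stream),
    body_*: body-figure embeddings (e.g., from an OSNet-based stream)."""
    d_face = 1.0 - F.cosine_similarity(face_a, face_b, dim=-1)
    d_body = 1.0 - F.cosine_similarity(body_a, body_b, dim=-1)
    return alpha * d_face + (1.0 - alpha) * d_body

# Toy query/gallery embeddings (sizes are illustrative).
q_face, q_body = torch.randn(128), torch.randn(512)
g_face, g_body = torch.randn(128), torch.randn(512)
print(fused_reid_distance(q_face, g_face, q_body, g_body))
```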

23 pages, 5731 KiB  
Article
Improved MSRN-Based Attention Block for Mask Alignment Mark Detection in Photolithography
by Juyong Park and Jongpil Jeong
Appl. Sci. 2022, 12(5), 2721; https://doi.org/10.3390/app12052721 - 06 Mar 2022
Viewed by 4114
Abstract
Wafer chips are manufactured in the semiconductor industry through various process technologies. Photolithography is one of these processes: it aligns the wafer and scans the circuit pattern onto the wafer, on which a photoresist film has been formed, by irradiating light through the circuit pattern drawn on the mask. As semiconductor technology becomes highly integrated, alignment in the photolithography process is becoming increasingly difficult due to problems such as reduced alignment margins, transmittance issues caused by the layer stacking structure, and increasing wafer diameters. Various methods and studies aimed at reducing the misalignment problem, which is directly related to production yield, are constantly being conducted. In this paper, we use machine vision for exposure equipment to improve the image resolution quality of marks for accurate alignment. We propose an improved Multi-Scale Residual Network (MSRN) that incorporates an attention mechanism through a Multi-Scale Residual Attention Block. The proposed method extracts enhanced features using two different bypass networks and attention blocks with convolution filters of different scales. Experiments verify the method, and its performance is improved compared with previous research.
(This article belongs to the Special Issue Modern Computer Vision and Image Processing)
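
As a rough illustration of the idea, the sketch below shows what a multi-scale residual block with channel attention could look like in PyTorch; the kernel sizes, channel counts, and squeeze-and-excitation-style attention are assumptions for illustration, not the paper's exact block.

```python
import torch
import torch.nn as nn

class MultiScaleResidualAttentionBlock(nn.Module):
    def __init__(self, channels=64):
        super().__init__()
        self.conv3 = nn.Conv2d(channels, channels, 3, padding=1)   # small-scale branch
        self.conv5 = nn.Conv2d(channels, channels, 5, padding=2)   # large-scale branch
        self.fuse = nn.Conv2d(2 * channels, channels, 1)
        # Channel attention (squeeze-and-excitation style).
        self.attn = nn.Sequential(
            nn.AdaptiveAvgPool2d(1),
            nn.Conv2d(channels, channels // 8, 1), nn.ReLU(inplace=True),
            nn.Conv2d(channels // 8, channels, 1), nn.Sigmoid(),
        )
        self.relu = nn.ReLU(inplace=True)

    def forward(self, x):
        b3 = self.relu(self.conv3(x))
        b5 = self.relu(self.conv5(x))
        fused = self.fuse(torch.cat([b3, b5], dim=1))
        fused = fused * self.attn(fused)   # reweight channels by attention
        return x + fused                   # residual connection

x = torch.randn(1, 64, 48, 48)
print(MultiScaleResidualAttentionBlock()(x).shape)  # torch.Size([1, 64, 48, 48])
```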

17 pages, 1592 KiB  
Article
Classification of Respiratory States Using Spectrogram with Convolutional Neural Network
by Cheolhyeong Park and Deokwoo Lee
Appl. Sci. 2022, 12(4), 1895; https://doi.org/10.3390/app12041895 - 11 Feb 2022
Cited by 5 | Viewed by 2236
Abstract
This paper proposes an approach to the classification of respiration states based on a neural network model, visualizing respiratory signals using a spectrogram. The analysis and processing of human biosignals are still considered some of the most crucial and fundamental research areas in both signal processing and medical applications. Recently, learning-based algorithms in signal and image processing for medical applications have shown significant improvement from both quantitative and qualitative perspectives. Human respiration is still considered an important factor for diagnosis, and it plays a key role in preventing fatal diseases in practice. This paper chiefly deals with a contactless approach to the acquisition of respiration data using an ultra-wideband (UWB) radar sensor, because it is simple and easy to use in an experimental setup and shows high accuracy in distance estimation. The paper proposes classifying respiratory states by using a feature visualization scheme, the spectrogram, together with a neural network model. The proposed method shows competitive and promising results in the classification of respiratory states. The experimental results also show that the method provides better accuracy (precision: 0.86 and specificity: 0.90) than conventional methods that use expensive equipment for respiration measurement.
(This article belongs to the Special Issue Modern Computer Vision and Image Processing)
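
The feature-visualization step, turning a 1D respiration signal into a spectrogram image that a CNN can classify, can be sketched with SciPy; the sampling rate, window length, and synthetic waveform below are illustrative assumptions, not values from the paper.

```python
import numpy as np
from scipy import signal

fs = 20.0                                    # assumed sampling rate of the radar-derived signal (Hz)
t = np.arange(0, 60, 1 / fs)
resp = np.sin(2 * np.pi * 0.25 * t)          # toy respiration waveform (~15 breaths/min)
resp += 0.1 * np.random.randn(t.size)        # measurement noise

# Short-time spectral representation of the respiration signal.
f, frames, Sxx = signal.spectrogram(resp, fs=fs, nperseg=128, noverlap=64)
log_spec = np.log1p(Sxx)                     # image-like input for a 2D CNN classifier
print(log_spec.shape)                        # (frequency bins, time frames)
```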

13 pages, 833 KiB  
Article
Dist-YOLO: Fast Object Detection with Distance Estimation
by Marek Vajgl, Petr Hurtik and Tomáš Nejezchleba
Appl. Sci. 2022, 12(3), 1354; https://doi.org/10.3390/app12031354 - 27 Jan 2022
Cited by 38 | Viewed by 27190
Abstract
We present a scheme for improving YOLO so that it predicts the absolute distance of objects using only information from a monocular camera. It is fully integrated into the original architecture by extending the prediction vectors, sharing the backbone’s weights with the bounding box regressor, and extending the original loss function with a term responsible for distance estimation. We designed two ways of handling the distance, class-agnostic and class-aware, and show that the class-agnostic variant creates smaller prediction vectors than the class-aware one and achieves better results. We demonstrate that the subtasks of object detection and distance measurement are in synergy, resulting in increased precision of the original bounding box functionality. We show that, on the KITTI dataset, the proposed scheme yields a mean relative error of 11% considering all eight classes and a distance range of [0, 150] m, which makes the solution highly competitive with existing approaches. Finally, we show that the inference speed is identical to that of the unmodified YOLO, 45 frames per second.
(This article belongs to the Special Issue Modern Computer Vision and Image Processing)
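
The idea of extending the prediction vector and the loss with a distance term can be illustrated with a simplified loss sketch; the tensor layout, the L1 distance term, and the weighting below are assumptions for illustration, not the paper's exact formulation.

```python
import torch
import torch.nn.functional as F

def dist_yolo_loss(pred, target, obj_mask, lambda_dist=1.0):
    """pred/target rows: [x, y, w, h, objectness, distance] (class terms omitted
    for brevity); obj_mask marks rows that actually contain an object."""
    box_loss = F.mse_loss(pred[obj_mask, :4], target[obj_mask, :4])
    obj_loss = F.binary_cross_entropy_with_logits(pred[:, 4], target[:, 4])
    dist_loss = F.l1_loss(pred[obj_mask, 5], target[obj_mask, 5])  # added distance head
    return box_loss + obj_loss + lambda_dist * dist_loss

# Toy example: 8 candidate boxes, 3 of which are matched to ground truth.
pred = torch.randn(8, 6)
target = torch.rand(8, 6)
obj_mask = torch.tensor([1, 0, 1, 0, 0, 1, 0, 0], dtype=torch.bool)
print(dist_yolo_loss(pred, target, obj_mask))
```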

20 pages, 4474 KiB  
Article
GSV-NET: A Multi-Modal Deep Learning Network for 3D Point Cloud Classification
by Long Hoang, Suk-Hwan Lee, Eung-Joo Lee and Ki-Ryong Kwon
Appl. Sci. 2022, 12(1), 483; https://doi.org/10.3390/app12010483 - 04 Jan 2022
Cited by 8 | Viewed by 2953
Abstract
Light Detection and Ranging (LiDAR), which uses light in the form of a pulsed laser to estimate the distance between the LiDAR sensor and objects, is an effective remote sensing technology. Many applications use LiDAR, including autonomous vehicles, robotics, and virtual and augmented reality (VR/AR). 3D point cloud classification is now a hot research topic with the evolution of LiDAR technology. This research aims to provide a high-performance method for 3D point cloud classification that is compatible with real-world data. More specifically, we introduce a novel framework for 3D point cloud classification, namely GSV-NET, which uses a Gaussian Supervector and an enhanced region representation. GSV-NET extracts and combines both global and regional features of the 3D point cloud to further enrich the feature information for 3D point cloud classification. Firstly, we input the Gaussian Supervector description into a 3D wide-inception convolutional neural network (CNN) structure to define the global feature. Secondly, we convert the regions of the 3D point cloud into a color representation and capture region features with a 2D wide-inception network. These extracted features are the inputs of a 1D CNN architecture. We evaluate the proposed framework on the ModelNet point cloud dataset and the Sydney LiDAR dataset. The ModelNet dataset was developed by Princeton University (New Jersey, United States), while the Sydney dataset was created by the University of Sydney (Sydney, Australia). Our numerical results show that the framework achieves higher accuracy than state-of-the-art approaches.
(This article belongs to the Special Issue Modern Computer Vision and Image Processing)
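
A minimal sketch of how a Gaussian Supervector can be computed for a point cloud: fit a background GMM, then stack the posterior-weighted means of the sample's points into one fixed-length descriptor. The component count, diagonal covariances, and synthetic data are illustrative assumptions, not the paper's configuration.

```python
import numpy as np
from sklearn.mixture import GaussianMixture

def gaussian_supervector(points, gmm):
    """Stack per-component, posterior-weighted means of this sample's points
    into one fixed-length descriptor (a simple supervector variant)."""
    resp = gmm.predict_proba(points)                 # (N, K) component posteriors
    weights = resp.sum(axis=0) + 1e-8                # soft counts per component
    means = (resp.T @ points) / weights[:, None]     # (K, 3) adapted means
    return means.ravel()                             # (K * 3,) supervector

rng = np.random.default_rng(0)
background = rng.normal(size=(2000, 3))              # stand-in for training points
gmm = GaussianMixture(n_components=16, covariance_type="diag",
                      random_state=0).fit(background)
cloud = rng.normal(size=(512, 3))                    # one 3D point cloud sample
print(gaussian_supervector(cloud, gmm).shape)        # (48,)
```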

11 pages, 1656 KiB  
Article
Fast Drivable Areas Estimation with Multi-Task Learning for Real-Time Autonomous Driving Assistant
by Dong-Gyu Lee
Appl. Sci. 2021, 11(22), 10713; https://doi.org/10.3390/app112210713 - 13 Nov 2021
Cited by 13 | Viewed by 3323
Abstract
Autonomous driving is a safety-critical application that requires a high-level understanding of computer vision with real-time inference. In this study, we focus on computational efficiency, an important factor for practical applications, by improving the running time and performing multiple tasks simultaneously. We propose a fast and accurate multi-task learning-based architecture for joint drivable-area segmentation, lane-line segmentation, and scene classification. An encoder–decoder architecture efficiently handles input frames through a shared representation. A comprehensive understanding of the driving environment is improved by the generalization and regularization provided by the different tasks. The proposed method is learned end-to-end through multi-task learning on the very challenging Berkeley Deep Drive dataset and shows its robustness for the three tasks in autonomous driving. Experimental results show that the proposed method outperforms other multi-task learning approaches in both speed and accuracy. The method runs at over 93.81 fps at inference, enabling real-time execution.
(This article belongs to the Special Issue Modern Computer Vision and Image Processing)
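
The shared-encoder, multi-head layout described in the abstract can be sketched as follows; the layer sizes, two segmentation decoders, and scene-classification head are illustrative assumptions rather than the paper's architecture.

```python
import torch
import torch.nn as nn

class MultiTaskNet(nn.Module):
    def __init__(self, n_scene_classes=4):
        super().__init__()
        self.encoder = nn.Sequential(                      # shared representation
            nn.Conv2d(3, 32, 3, stride=2, padding=1), nn.ReLU(inplace=True),
            nn.Conv2d(32, 64, 3, stride=2, padding=1), nn.ReLU(inplace=True),
        )
        def decoder():                                     # one decoder per segmentation task
            return nn.Sequential(
                nn.ConvTranspose2d(64, 32, 4, stride=2, padding=1), nn.ReLU(inplace=True),
                nn.ConvTranspose2d(32, 1, 4, stride=2, padding=1),
            )
        self.drivable_head = decoder()                     # drivable-area segmentation
        self.lane_head = decoder()                         # lane-line segmentation
        self.scene_head = nn.Sequential(                   # scene classification
            nn.AdaptiveAvgPool2d(1), nn.Flatten(), nn.Linear(64, n_scene_classes))

    def forward(self, x):
        feat = self.encoder(x)
        return self.drivable_head(feat), self.lane_head(feat), self.scene_head(feat)

x = torch.randn(2, 3, 128, 256)
drivable, lane, scene = MultiTaskNet()(x)
print(drivable.shape, lane.shape, scene.shape)  # (2,1,128,256) (2,1,128,256) (2,4)
```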

16 pages, 19140 KiB  
Article
Line Drawing Extraction from Cartoons Using a Conditional Generative Adversarial Network
by Kyungho Yu, Juhyeon Noh and Hee-Deok Yang
Appl. Sci. 2021, 11(16), 7536; https://doi.org/10.3390/app11167536 - 17 Aug 2021
Viewed by 2273
Abstract
Recently, three-dimensional (3D) content used in various fields has attracted attention owing to the development of virtual reality and augmented reality technologies. To produce 3D content, objects need to be modeled as vertices. However, high-quality modeling is time-consuming and costly. Drawing-based modeling is a technique that shortens the time required for modeling. It refers to creating a 3D model based on a user’s line drawing, a 3D feature represented by two-dimensional (2D) lines. The extracted line drawing provides information about a 3D model in 2D space. It is sometimes necessary to generate a line drawing from a 2D cartoon image to represent the 3D information of that image. Extracting consistent line drawings from 2D cartoons is difficult because styles and techniques differ depending on the designer who produces them. Therefore, it is necessary to extract line drawings that clearly show the geometric characteristics of 2D cartoon shapes in various styles. This paper proposes a method for automatically extracting line drawings. A conditional generative adversarial network model is trained on pairs of 2D cartoon shading images and line drawings and outputs the line drawing of the cartoon artwork. The experimental results show that the proposed method can obtain line drawings representing the 3D geometric characteristics with 2D lines when a 2D cartoon painting is used as the input.
(This article belongs to the Special Issue Modern Computer Vision and Image Processing)
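
Conditional GAN training for this kind of image-to-line-drawing translation is commonly a pix2pix-style combination of an adversarial term and an L1 reconstruction term; the sketch below assumes that formulation and toy tensor shapes, not the paper's exact setup.

```python
import torch
import torch.nn.functional as F

def generator_loss(disc_fake_logits, fake_lines, real_lines, lambda_l1=100.0):
    """Pix2pix-style generator objective: fool the conditional discriminator
    while staying close to the ground-truth line drawing."""
    adv = F.binary_cross_entropy_with_logits(
        disc_fake_logits, torch.ones_like(disc_fake_logits))
    l1 = F.l1_loss(fake_lines, real_lines)
    return adv + lambda_l1 * l1

# Toy tensors standing in for patch-discriminator outputs and line-drawing images.
d_out = torch.randn(2, 1, 16, 16)
fake, real = torch.rand(2, 1, 256, 256), torch.rand(2, 1, 256, 256)
print(generator_loss(d_out, fake, real))
```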

0 pages, 7034 KiB  
Article
Restoring Raindrops Using Attentive Generative Adversarial Networks
by Suhan Goo and Hee-Deok Yang
Appl. Sci. 2021, 11(15), 7034; https://doi.org/10.3390/app11157034 - 30 Jul 2021
Cited by 5 | Viewed by 1834 | Correction
Abstract
Artificial intelligence technologies and vision systems are used in various devices, such as automotive navigation systems, object-tracking systems, and intelligent closed-circuit televisions. In particular, outdoor vision systems have been applied across numerous fields of analysis. Despite their widespread use, current systems work well only under good weather conditions; they cannot account for inclement conditions such as rain, fog, mist, and snow. Images captured under inclement conditions degrade the performance of vision systems. Vision systems need to detect, recognize, and remove the noise caused by rain, snow, and mist to boost the performance of the algorithms employed in image processing. Several studies have targeted the removal of noise resulting from inclement conditions. We focused on eliminating the effects of raindrops on images captured with outdoor vision systems in which the camera is exposed to rain. An attentive generative adversarial network (ATTGAN) was used to remove raindrops from the images. This network is composed of two parts: an attentive-recurrent network and a contextual autoencoder. The ATTGAN generates an attention map to detect raindrops, and a de-rained image is produced. We increased the number of visual attentive-recurrent network layers to prevent gradient sparsity, so that the generation process was more stable without preventing the network from converging. The experimental results confirm that the extended ATTGAN can effectively remove various types of raindrops from images.
(This article belongs to the Special Issue Modern Computer Vision and Image Processing)
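
The attentive-recurrent part, an attention map refined over several steps, with each step seeing the rainy image plus the previous map, can be sketched roughly as below; the step count, layer widths, and the plain convolutional refiner (attentive-recurrent networks of this kind typically use LSTM-style recurrent units) are simplifications for illustration.

```python
import torch
import torch.nn as nn

class AttentiveRecurrentSketch(nn.Module):
    """Refines a raindrop attention map over several recurrent steps (simplified)."""
    def __init__(self, steps=4):
        super().__init__()
        self.steps = steps
        self.refine = nn.Sequential(
            nn.Conv2d(4, 32, 3, padding=1), nn.ReLU(inplace=True),
            nn.Conv2d(32, 1, 3, padding=1), nn.Sigmoid(),
        )

    def forward(self, rainy):
        attention = torch.zeros_like(rainy[:, :1])           # start with an empty map
        maps = []
        for _ in range(self.steps):                           # each step sees image + previous map
            attention = self.refine(torch.cat([rainy, attention], dim=1))
            maps.append(attention)
        return maps                                           # last map would feed the autoencoder

x = torch.randn(1, 3, 64, 64)
print(AttentiveRecurrentSketch()(x)[-1].shape)                # torch.Size([1, 1, 64, 64])
```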

13 pages, 3004 KiB  
Article
A Novel Method for Intrinsic and Extrinsic Parameters Estimation by Solving Perspective-Three-Point Problem with Known Camera Position
by Kai Guo, Hu Ye, Junhao Gu and Honglin Chen
Appl. Sci. 2021, 11(13), 6014; https://doi.org/10.3390/app11136014 - 28 Jun 2021
Cited by 13 | Viewed by 3034
Abstract
The aim of the perspective-three-point (P3P) problem is to estimate the extrinsic parameters of a camera, i.e., its orientation and position, from three 2D–3D point correspondences. All P3P solvers exhibit a multi-solution phenomenon, with up to four solutions, and require a fully calibrated camera. In contrast, in this paper we propose a novel method for intrinsic and extrinsic parameter estimation based on three 2D–3D point correspondences with a known camera position. Our core contribution is to build a new, virtual camera system whose frame and image plane are defined by the original 3D points, to build a new, intermediate world frame from the original image plane and the original 2D image points, and thereby convert our problem into a P3P problem. Intrinsic and extrinsic parameter estimation then reduces to solving frame transformations and the P3P problem. Lastly, we resolve the multi-solution ambiguity using the image resolution. Experimental results on synthetic data and real images show the accuracy, numerical stability, and uniqueness of the solution for intrinsic and extrinsic parameter estimation.
(This article belongs to the Special Issue Modern Computer Vision and Image Processing)
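
For contrast with the proposed method, the classical fully calibrated P3P baseline mentioned in the abstract can be run directly with OpenCV's solveP3P, which returns up to four candidate poses; the intrinsics and ground-truth pose below are synthetic assumptions used only to generate consistent 2D observations.

```python
import numpy as np
import cv2

# Three known 3D reference points (world frame) and a synthetic calibrated camera.
object_pts = np.array([[0.0, 0.0, 0.0],
                       [1.0, 0.0, 0.0],
                       [0.0, 1.0, 0.0]], dtype=np.float64)
K = np.array([[800.0, 0.0, 320.0],
              [0.0, 800.0, 240.0],
              [0.0, 0.0, 1.0]])
dist = np.zeros(5)

# Project the points with a known ground-truth pose to obtain 2D observations.
rvec_true = np.array([0.1, -0.2, 0.05])
tvec_true = np.array([0.2, -0.1, 4.0])
image_pts, _ = cv2.projectPoints(object_pts, rvec_true, tvec_true, K, dist)
image_pts = image_pts.reshape(-1, 2)

# Classical P3P: up to four (R, t) candidates for a fully calibrated camera.
n_solutions, rvecs, tvecs = cv2.solveP3P(object_pts, image_pts, K, dist,
                                         flags=cv2.SOLVEPNP_P3P)
print(n_solutions, [t.ravel() for t in tvecs])
```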
