Digital Image Processing and Analysis: Human and Computer Vision Applications

A special issue of Applied Sciences (ISSN 2076-3417). This special issue belongs to the section "Computing and Artificial Intelligence".

Deadline for manuscript submissions: closed (20 September 2022) | Viewed by 55950

Special Issue Editor


Dr. Athanasios Nikolaidis
Guest Editor
Department of Informatics, Computer and Telecommunications Engineering, International Hellenic University, Terma Magnesias Str., 62124 Serres, Greece
Interests: multimedia systems; digital image processing; digital signal processing

Special Issue Information

Dear Colleagues,

In the field of computer science, digital image processing and analysis has developed rapidly, especially during the last three decades, and has a wide variety of applications, ranging from medicine, astronomy, microscopy and defense to biology, industry, robotics and remote sensing. Nowadays, one of the most prominent fields that adopts and benefits from digital image processing and analysis is computer vision.

Computer vision covers methods and algorithms which mimic the behavior of human vision. It attempts to extract useful and meaningful information from images and video, which can afterwards be used to draw conclusions about the natural scene and, perhaps, decide on action to be taken.

Digital image processing and analysis are utilized at various stages of computer vision algorithms, and it is therefore crucial to develop and implement efficient image processing and analysis algorithms.

Submissions focusing on, but not limited to, the following topics are encouraged:

  • Image restoration and enhancement
  • Image compression
  • Edge detection
  • Image segmentation
  • Semantic segmentation
  • Image classification
  • Image inpainting
  • Image captioning
  • Feature detection and extraction
  • Object recognition
  • Content-based image retrieval
  • Optical character recognition
  • Face recognition
  • Emotion recognition
  • Gesture recognition
  • Egomotion
  • Object tracking
  • Trajectory prediction
  • 3D scene reconstruction
  • Pose estimation
  • Text identification

Dr. Athanasios Nikolaidis
Guest Editor

Manuscript Submission Information

Manuscripts should be submitted online at www.mdpi.com by registering and logging in to the website. Once you are registered, you can access the submission form. Manuscripts can be submitted until the deadline. All submissions that pass pre-check are peer-reviewed. Accepted papers will be published continuously in the journal (as soon as accepted) and will be listed together on the special issue website. Research articles, review articles as well as short communications are invited. For planned papers, a title and short abstract (about 100 words) can be sent to the Editorial Office for announcement on this website.

Submitted manuscripts should not have been published previously, nor be under consideration for publication elsewhere (except conference proceedings papers). All manuscripts are thoroughly refereed through a single-blind peer-review process. A guide for authors and other relevant information for submission of manuscripts is available on the Instructions for Authors page. Applied Sciences is an international peer-reviewed open access semimonthly journal published by MDPI.

Please visit the Instructions for Authors page before submitting a manuscript. The Article Processing Charge (APC) for publication in this open access journal is 2400 CHF (Swiss Francs). Submitted papers should be well formatted and use good English. Authors may use MDPI's English editing service prior to publication or during author revisions.

Keywords

  • Image restoration and enhancement
  • Image compression
  • Edge detection
  • Image segmentation
  • Semantic segmentation
  • Image classification
  • Image inpainting
  • Image captioning
  • Feature detection and extraction
  • Object recognition
  • Content-based image retrieval
  • Optical character recognition
  • Face recognition
  • Emotion recognition
  • Gesture recognition
  • Egomotion
  • Object tracking
  • Trajectory prediction
  • 3D scene reconstruction
  • Pose estimation
  • Text identification

Published Papers (16 papers)


Research


19 pages, 17208 KiB  
Article
Image Analysis and Processing for Generating Camouflages from Digital Earth Photographs
by Aneta Poniszewska-Marańda, Michał Suszek and Krzysztof Stepień
Appl. Sci. 2023, 13(1), 403; https://doi.org/10.3390/app13010403 - 28 Dec 2022
Viewed by 1392
Abstract
Camouflage is present both on the civilian market and in the military. It is used by hunters and military enthusiasts, but also by clothing designers and game developers, who must either design their own camouflages or choose between already developed military and civilian ones, and therefore cannot adapt them to their needs quickly and easily. Currently, there are few software solutions that allow easy generation of digital camouflages and support the user in selecting the colors for the final result. The approach presented in this paper addresses these problems by analyzing the colors of digital Earth images of the target areas and applying the developed image processing algorithm to generate digital camouflages. Based on the proposed approach and its algorithm, an application was created that allows the user to easily generate digital camouflages. The paper also analyzes camouflage quality, comparing the camouflages generated by the developed algorithm and application with those produced by selected market generators and with selected military digital camouflages. By using the proposed generation algorithm together with a centroid algorithm for color extraction, it was possible to create better-quality camouflages than those created by existing software solutions. This was supported by an analysis of camouflage quality in the chosen terrain variants, in which the developed application achieved the best results. Full article
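
As a rough illustration of the centroid-based color extraction step, the sketch below pulls dominant colors from a terrain photograph with k-means clustering and tiles them into a simple blocky pattern. The file name, cluster count, and cell size are illustrative assumptions; this is not the authors' algorithm.

```python
# Minimal sketch: centroid-based (k-means) colour extraction from a terrain
# photograph followed by a very simple blocky camouflage pattern.
import numpy as np
from PIL import Image
from sklearn.cluster import KMeans

def extract_palette(image_path, n_colors=4):
    """Return the n_colors dominant colours of an RGB image (uint8 array)."""
    pixels = np.asarray(Image.open(image_path).convert("RGB")).reshape(-1, 3)
    km = KMeans(n_clusters=n_colors, n_init=4, random_state=0).fit(pixels)
    return km.cluster_centers_.astype(np.uint8)

def blocky_camouflage(palette, size=(512, 512), cell=16, seed=0):
    """Fill a coarse grid with random palette colours and upscale (nearest)."""
    rng = np.random.default_rng(seed)
    h, w = size[1] // cell, size[0] // cell
    idx = rng.integers(0, len(palette), size=(h, w))
    coarse = palette[idx]                          # (h, w, 3) uint8
    return Image.fromarray(coarse, "RGB").resize(size, Image.NEAREST)

# palette = extract_palette("terrain_photo.jpg")   # hypothetical input file
# blocky_camouflage(palette).save("camouflage.png")
```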

14 pages, 4627 KiB  
Article
Dense Semantic Forecasting with Multi-Level Feature Warping
by Iva Sović, Josip Šarić and Siniša Šegvić
Appl. Sci. 2023, 13(1), 400; https://doi.org/10.3390/app13010400 - 28 Dec 2022
Viewed by 1583
Abstract
Anticipation of per-pixel semantics in a future unobserved frame is also known as dense semantic forecasting. State-of-the-art methods are based on single-level regression of a subsampled abstract representation of a recognition model. However, single-level regression cannot account for skip connections from the backbone to the upsampling path. We propose to address this shortcoming by warping shallow features from observed images with upsampled feature flow. Our goal is not straightforward, since warping with coarse feature flow introduces noise into the forecasted features. We therefore base our work on single-frame models that are more resistant to the noise in skip connections. To achieve this, we propose a training procedure that enables recognition models to operate reasonably well with or without skip connections. Validation experiments reveal interesting insights into the influence of particular skip connections on recognition accuracy. Our forecasting method delivers 70.2% mIoU 0.18 s into the future and 58.5% mIoU 0.54 s into the future. These experiments show 0.6 mIoU points of improved accuracy with respect to the baseline and reveal promising directions for future work. Full article
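
To illustrate the warping step in general terms, the sketch below warps a shallow feature map with a per-pixel flow field using PyTorch's grid_sample; the (dx, dy) pixel-unit flow convention and the bilinear/border sampling options are assumptions, not details taken from the paper.

```python
# Hedged sketch of warping shallow features with an (upsampled) feature flow.
import torch
import torch.nn.functional as F

def warp_features(feat, flow):
    """feat: (N, C, H, W) features; flow: (N, 2, H, W) flow in pixels (dx, dy)."""
    n, _, h, w = feat.shape
    ys, xs = torch.meshgrid(torch.arange(h, dtype=feat.dtype),
                            torch.arange(w, dtype=feat.dtype), indexing="ij")
    base = torch.stack((xs, ys), dim=0).to(feat.device)         # (2, H, W)
    coords = base.unsqueeze(0) + flow                           # (N, 2, H, W)
    # Normalise to [-1, 1] as required by grid_sample (x first, then y).
    coords_x = 2.0 * coords[:, 0] / max(w - 1, 1) - 1.0
    coords_y = 2.0 * coords[:, 1] / max(h - 1, 1) - 1.0
    grid = torch.stack((coords_x, coords_y), dim=-1)            # (N, H, W, 2)
    return F.grid_sample(feat, grid, mode="bilinear",
                         padding_mode="border", align_corners=True)

# feat = torch.randn(1, 64, 32, 64); flow = torch.zeros(1, 2, 32, 64)
# assert torch.allclose(warp_features(feat, flow), feat)        # zero flow = identity
```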

34 pages, 15985 KiB  
Article
Automated Detection and Classification of Returnable Packaging Based on YOLOV4 Algorithm
by Matko Glučina, Sandi Baressi Šegota, Nikola Anđelić and Zlatan Car
Appl. Sci. 2022, 12(21), 11131; https://doi.org/10.3390/app122111131 - 2 Nov 2022
Cited by 5 | Viewed by 1763
Abstract
This article describes the implementation of the You Only Look Once (YOLO) detection algorithm for the detection of returnable packaging. The method of creating an original dataset and an augmented dataset is shown. The model was evaluated using mean Average Precision (mAP), F1 score, Precision, Recall, Average Intersection over Union (Average IoU) score, and Average Loss. Training was conducted in four cycles, i.e., 6000, 8000, 10,000, and 20,000 max batches, with three different activation functions: Mish, ReLU, and Linear (used in the 6000 and 8000 max-batch runs). The influence of the train/test dataset ratio was also investigated. The investigation showed that variation of the hyperparameters (activation function and max batch size) has a significant influence on detection and classification accuracy, with the best results obtained by YOLO version 4 (YOLOV4) with the Mish activation function and a max batch size of 20,000, which achieved the highest mAP of 99.96% and the lowest average error of 0.3643. Full article
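
For readers unfamiliar with the detection metrics listed above, the following sketch computes IoU and derives precision, recall, and F1 from a greedy one-to-one matching of predicted and ground-truth boxes; the (x1, y1, x2, y2) box format and the 0.5 IoU threshold are assumptions for illustration, not the authors' evaluation code.

```python
# Minimal sketch of IoU, precision, recall, and F1 for axis-aligned boxes.
def iou(a, b):
    """Intersection over Union of two boxes (x1, y1, x2, y2)."""
    ix1, iy1 = max(a[0], b[0]), max(a[1], b[1])
    ix2, iy2 = min(a[2], b[2]), min(a[3], b[3])
    inter = max(0.0, ix2 - ix1) * max(0.0, iy2 - iy1)
    area_a = (a[2] - a[0]) * (a[3] - a[1])
    area_b = (b[2] - b[0]) * (b[3] - b[1])
    return inter / (area_a + area_b - inter + 1e-9)

def precision_recall_f1(detections, ground_truths, iou_thr=0.5):
    """Greedily match each detection to one unmatched ground-truth box."""
    matched, tp = set(), 0
    for det in detections:
        best_j, best_iou = None, iou_thr
        for j, gt in enumerate(ground_truths):
            if j not in matched and iou(det, gt) >= best_iou:
                best_j, best_iou = j, iou(det, gt)
        if best_j is not None:
            matched.add(best_j)
            tp += 1
    fp = len(detections) - tp
    fn = len(ground_truths) - tp
    precision = tp / (tp + fp + 1e-9)
    recall = tp / (tp + fn + 1e-9)
    f1 = 2 * precision * recall / (precision + recall + 1e-9)
    return precision, recall, f1
```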

17 pages, 8921 KiB  
Article
Movement-in-a-Video Detection Scheme for Sign Language Gesture Recognition Using Neural Network
by Angela C. Caliwag, Han-Jeong Hwang, Sang-Ho Kim and Wansu Lim
Appl. Sci. 2022, 12(20), 10542; https://doi.org/10.3390/app122010542 - 19 Oct 2022
Cited by 1 | Viewed by 2480
Abstract
Sign language aids in overcoming the communication barrier between hearing-impaired individuals and those with normal hearing. However, not all individuals with normal hearing are skilled at using sign language. Consequently, deaf and hearing-impaired individuals generally encounter the problem of limited communication while interacting with individuals with normal hearing. In this study, a sign language recognition method based on a movement-in-a-video detection scheme is proposed. The proposed scheme is applied to extract unique spatial and temporal features from each gesture. The extracted features are subsequently used to train a neural network to classify the gestures. The proposed movement-in-a-video detection scheme is applied to sign language videos featuring short, medium, and long gestures. The proposed method achieved an accuracy of 90.33% and 40% in classifying short and medium gestures, respectively, compared with 69% and 43.7% achieved using other methods. In addition, the improved accuracies were achieved with less computational complexity and cost. It is anticipated that improvements in the proposed method, for it to achieve high accuracy for long gestures, can enable hearing-impaired individuals to communicate with normal-hearing people who do not have knowledge of sign language. Full article
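
A simple stand-in for a movement-in-a-video measure is per-frame motion energy from frame differencing, as sketched below with OpenCV; the blurring, scoring, and thresholding choices are illustrative assumptions and do not reproduce the proposed scheme.

```python
# Sketch: per-frame motion energy used to locate the active (gesture) segment.
import cv2
import numpy as np

def motion_energy(video_path):
    """Return one motion score per frame (mean absolute frame difference)."""
    cap = cv2.VideoCapture(video_path)
    scores, prev = [], None
    while True:
        ok, frame = cap.read()
        if not ok:
            break
        gray = cv2.cvtColor(frame, cv2.COLOR_BGR2GRAY)
        gray = cv2.GaussianBlur(gray, (5, 5), 0)
        if prev is not None:
            scores.append(float(np.mean(cv2.absdiff(gray, prev))))
        prev = gray
    cap.release()
    return np.array(scores)

def active_segment(scores, rel_thresh=0.3):
    """First and last frame whose motion exceeds rel_thresh * max score."""
    active = np.flatnonzero(scores > rel_thresh * scores.max())
    return (int(active[0]), int(active[-1])) if active.size else (0, len(scores) - 1)

# start, end = active_segment(motion_energy("sign_clip.mp4"))   # hypothetical clip
```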

15 pages, 10852 KiB  
Article
Dataset Transformation System for Sign Language Recognition Based on Image Classification Network
by Sang-Geun Choi, Yeonji Park and Chae-Bong Sohn
Appl. Sci. 2022, 12(19), 10075; https://doi.org/10.3390/app121910075 - 7 Oct 2022
Cited by 2 | Viewed by 1854
Abstract
Among the various fields where deep learning is used, there are challenges to be solved in motion recognition. One is that it is difficult to manage because of the vast amount of data. Another is that it takes a long time to learn due to the complex network and the large amount of data. To solve the problems, we propose a dataset transformation system. Sign language recognition was implemented to evaluate the performance of this system. The system consists of three steps: pose estimation, normalization, and spatial–temporal map (STmap) generation. STmap is a method of simultaneously expressing temporal data and spatial data in one image. In addition, the accuracy of the model was improved, and the error sensitivity was lowered through the data augmentation process. Through the proposed method, it was possible to reduce the dataset from 94.39 GB to 954 MB. It corresponds to approximately 1% of the original. When the dataset created through the proposed method is trained on the image classification model, the sign language recognition accuracy is 84.5%. Full article
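
The sketch below shows one plausible way to pack per-frame 2D keypoints into a single spatial-temporal map image (rows = frames, columns = joints, channels = x, y, confidence, each rescaled to 0-255); the exact encoding used by the authors may differ.

```python
# Illustrative STmap construction from per-frame 2D keypoints.
import numpy as np

def keypoints_to_stmap(keypoints):
    """keypoints: array (T frames, J joints, 3) with (x, y, confidence)."""
    kp = np.asarray(keypoints, dtype=np.float32)
    stmap = np.empty_like(kp)
    for c in range(3):
        channel = kp[..., c]
        lo, hi = channel.min(), channel.max()
        stmap[..., c] = (channel - lo) / (hi - lo + 1e-9) * 255.0
    return stmap.astype(np.uint8)            # shape (T, J, 3): an RGB image

# stmap = keypoints_to_stmap(np.random.rand(120, 17, 3))   # toy example
# from PIL import Image; Image.fromarray(stmap).save("stmap.png")
```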

13 pages, 2772 KiB  
Article
Multifilters-Based Unsupervised Method for Retinal Blood Vessel Segmentation
by Nayab Muzammil, Syed Ayaz Ali Shah, Aamir Shahzad, Muhammad Amir Khan and Rania M. Ghoniem
Appl. Sci. 2022, 12(13), 6393; https://doi.org/10.3390/app12136393 - 23 Jun 2022
Cited by 14 | Viewed by 2064
Abstract
Fundus imaging is one of the crucial methods that help ophthalmologists diagnose various eye diseases in modern medicine. An accurate vessel segmentation method can be a convenient tool to foresee and analyze fatal diseases, including hypertension or diabetes, which damage the appearance of the retinal vessels. This work suggests an unsupervised approach for segmenting vessels from retinal images. The proposed method includes multiple steps. First, the green channel is extracted from the colored retinal image and preprocessed using Contrast Limited Histogram Equalization as well as Fuzzy Histogram-Based Equalization for contrast enhancement. To remove geometrical objects (macula, optic disc) and noise, top-hat morphological operations are used. A matched filter and a Gabor wavelet filter are then applied to the enhanced image, and their outputs are added to extract vessel pixels. The resulting image, with the blood vessels now noticeable, is binarized using a human visual system (HVS) based method. A final image of the segmented blood vessels is obtained by applying post-processing. The suggested method is assessed on two public datasets (DRIVE and STARE) and shows comparable results in terms of sensitivity, specificity, and accuracy. The sensitivity, specificity, and accuracy achieved on the DRIVE database are 0.7271, 0.9798, and 0.9573, and on the STARE database 0.7164, 0.9760, and 0.9560, respectively, in less than 3.17 s on average per image. Full article
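
A rough OpenCV rendering of the described pipeline (green channel, contrast enhancement, top-hat filtering, an orientation-swept Gabor filter bank standing in for the matched/Gabor filtering stage, and Otsu thresholding in place of the HVS-based binarization) is sketched below; all parameter values are assumptions.

```python
# Hedged sketch of a green-channel / CLAHE / top-hat / Gabor vessel pipeline.
import cv2
import numpy as np

def segment_vessels(image_path):
    green = cv2.imread(image_path)[:, :, 1]                  # green channel (BGR order)
    clahe = cv2.createCLAHE(clipLimit=2.0, tileGridSize=(8, 8))
    enhanced = clahe.apply(green)
    kernel = cv2.getStructuringElement(cv2.MORPH_ELLIPSE, (15, 15))
    # Vessels are dark in the green channel, so top-hat the inverted image.
    tophat = cv2.morphologyEx(255 - enhanced, cv2.MORPH_TOPHAT, kernel)
    # Sum of Gabor responses over several orientations approximates a matched filter bank.
    response = np.zeros_like(tophat, dtype=np.float32)
    for theta in np.arange(0, np.pi, np.pi / 8):
        gabor = cv2.getGaborKernel((15, 15), 3.0, theta, 8.0, 0.5, 0)
        response += cv2.filter2D(tophat.astype(np.float32), cv2.CV_32F, gabor)
    response = cv2.normalize(response, None, 0, 255, cv2.NORM_MINMAX).astype(np.uint8)
    _, binary = cv2.threshold(response, 0, 255, cv2.THRESH_BINARY + cv2.THRESH_OTSU)
    return binary

# mask = segment_vessels("retina.png")        # hypothetical fundus image
```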

17 pages, 5897 KiB  
Article
KFSENet: A Key Frame-Based Skeleton Feature Estimation and Action Recognition Network for Improved Robot Vision with Face and Emotion Recognition
by Dinh-Son Le, Hai-Hong Phan, Ha Huy Hung, Van-An Tran, The-Hung Nguyen and Dinh-Quan Nguyen
Appl. Sci. 2022, 12(11), 5455; https://doi.org/10.3390/app12115455 - 27 May 2022
Cited by 6 | Viewed by 2106
Abstract
In this paper, we propose an integrated approach to robot vision: a key frame-based skeleton feature estimation and action recognition network (KFSENet) that incorporates action recognition with face and emotion recognition to enable social robots to engage in more personal interactions. Instead of extracting the human skeleton features from the entire video, we propose a key frame-based approach for their extraction using pose estimation models. We select the key frames using the gradient of a proposed total motion metric that is computed using dense optical flow. We use the extracted human skeleton features from the selected key frames to train a deep neural network (i.e., the double-feature double-motion network (DDNet)) for action recognition. The proposed KFSENet utilizes a simpler model to learn and differentiate between the different action classes, is computationally simpler and yields better action recognition performance when compared with existing methods. The use of key frames allows the proposed method to eliminate unnecessary and redundant information, which improves its classification accuracy and decreases its computational cost. The proposed method is tested on both publicly available standard benchmark datasets and self-collected datasets. The performance of the proposed method is compared to existing state-of-the-art methods. Our results indicate that the proposed method yields better performance compared with existing methods. Moreover, our proposed framework integrates face and emotion recognition to enable social robots to engage in more personal interaction with humans. Full article
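
The key-frame idea can be illustrated with dense optical flow: compute a total-motion signal per frame and keep the frames where that signal changes fastest. The sketch below uses OpenCV's Farneback flow and a simple top-k gradient rule as stand-ins for the metric and selection rule proposed in the paper.

```python
# Sketch: key-frame selection from the gradient of a total-motion signal.
import cv2
import numpy as np

def total_motion(video_path):
    """Mean optical-flow magnitude between consecutive frames."""
    cap = cv2.VideoCapture(video_path)
    motion, prev = [], None
    while True:
        ok, frame = cap.read()
        if not ok:
            break
        gray = cv2.cvtColor(frame, cv2.COLOR_BGR2GRAY)
        if prev is not None:
            flow = cv2.calcOpticalFlowFarneback(prev, gray, None,
                                                0.5, 3, 15, 3, 5, 1.2, 0)
            motion.append(float(np.linalg.norm(flow, axis=2).mean()))
        prev = gray
    cap.release()
    return np.array(motion)

def key_frames(motion, top_k=16):
    """Pick frames where the motion signal changes fastest (largest |gradient|)."""
    grad = np.abs(np.gradient(motion))
    return np.sort(np.argsort(grad)[-top_k:])

# frames = key_frames(total_motion("action_clip.mp4"))   # hypothetical clip
```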

19 pages, 2010 KiB  
Article
Chinese Traffic Police Gesture Recognition Based on Graph Convolutional Network in Natural Scene
by Kang Liu, Ying Zheng, Junyi Yang, Hong Bao and Haoming Zeng
Appl. Sci. 2021, 11(24), 11951; https://doi.org/10.3390/app112411951 - 15 Dec 2021
Cited by 2 | Viewed by 3626
Abstract
For an automated driving system to be robust, it needs to recognize not only fixed signals such as traffic signs and traffic lights, but also gestures used by traffic police. To meet this requirement, this paper proposes a new gesture recognition technology based on a graph convolutional network (GCN), informed by an analysis of the characteristics of gestures used by Chinese traffic police. To begin, we used a spatial–temporal graph convolutional network (ST-GCN) as a base network while introducing an attention mechanism, which enhanced the effective features of gestures used by traffic police and balanced the information distribution of skeleton joints in the spatial dimension. Next, to address the problem that the former graph structure only represents the physical structure of the human body and cannot capture potential effective features, this paper proposes an adaptive graph structure (AGS) model to explore the hidden features between traffic police gesture nodes and a temporal attention mechanism (TAS) to extract features in the temporal dimension. We established a traffic police gesture dataset containing 20,480 videos in total, and an ablation study was carried out to verify the effectiveness of the proposed method. The experimental results show that the proposed method improves the accuracy of traffic police gesture recognition to a certain degree; the top-1 accuracy is 87.72%, and the top-3 accuracy is 95.26%. In addition, to validate the method's generalization ability, we also carried out an experiment on the Kinetics–Skeleton dataset; the results show that the proposed method is better than some of the existing action-recognition algorithms. Full article
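
The adaptive graph structure idea, a fixed skeleton adjacency plus a learnable offset so that the layer can discover links beyond the physical bones, can be sketched as a small PyTorch layer as below; this mirrors the general concept only, not the authors' AGS module.

```python
# Sketch of a graph convolution with a fixed skeleton graph plus a learnable offset.
import torch
import torch.nn as nn

class AdaptiveGraphConv(nn.Module):
    def __init__(self, in_channels, out_channels, adjacency):
        super().__init__()
        self.register_buffer("A", adjacency)                 # (V, V) normalised skeleton graph
        self.B = nn.Parameter(torch.zeros_like(adjacency))   # learnable adaptive part
        self.proj = nn.Conv2d(in_channels, out_channels, kernel_size=1)

    def forward(self, x):
        # x: (N, C, T, V), i.e. batch, channels, frames, joints
        adj = self.A + self.B                                 # adaptive graph structure
        x = torch.einsum("nctv,vw->nctw", x, adj)             # aggregate over joints
        return self.proj(x)

# skeleton = torch.eye(18)                                    # placeholder adjacency
# layer = AdaptiveGraphConv(3, 64, skeleton)
# out = layer(torch.randn(2, 3, 30, 18))                      # -> (2, 64, 30, 18)
```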

11 pages, 1351 KiB  
Article
Extraction of Pest Insect Characteristics Present in a Mirasol Pepper (Capsicum annuum L.) Crop by Digital Image Processing
by Mireya Moreno-Lucio, Celina Lizeth Castañeda-Miranda, Gustavo Espinoza-García, Carlos Alberto Olvera-Olvera, Luis F. Luque-Vega, Antonio Del Rio-De Santiago, Héctor A. Guerrero-Osuna, Ma. del Rosario Martínez-Blanco and Luis Octavio Solís-Sánchez
Appl. Sci. 2021, 11(23), 11166; https://doi.org/10.3390/app112311166 - 25 Nov 2021
Cited by 1 | Viewed by 2015
Abstract
One of the main problems in crops is the presence of pests. Traditionally, sticky yellow traps are used to detect pest insects, and they are then analyzed by a specialist to identify the pest insects present in the crop. To facilitate the identification, classification, and counting of these insects, it is possible to use digital image processing (DIP). This study aims to demonstrate that DIP is useful for extracting invariant characteristics of psyllids (Bactericera cockerelli), thrips (Thrips tabaci), whiteflies (Bemisia tabaci), potato flea beetles (Epitrix cucumeris), pepper weevils (Anthonomus eugenii), and aphids (Myzus persicae). The characteristics (e.g., area, eccentricity, and solidity) help classify insects. DIP includes a first stage that consists of improving the image by changing the levels of color intensity, applying morphological filters, and detecting objects of interest, and a second stage that consists of applying a transformation of invariant scales to extract characteristics of insects, independently of size or orientation. The results were compared with the data obtained from an entomologist, reaching up to 90% precision for the classification of these insects. Full article
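
The shape descriptors named above (area, eccentricity, solidity) can be computed per detected object with scikit-image, as in the minimal sketch below; the Otsu threshold and minimum-area filter are assumptions for illustration.

```python
# Sketch: per-object shape descriptors from a thresholded trap image.
from skimage import io, color, filters, measure

def insect_descriptors(image_path, min_area=50):
    gray = color.rgb2gray(io.imread(image_path))
    mask = gray < filters.threshold_otsu(gray)          # insects darker than the trap
    labels = measure.label(mask)
    rows = []
    for region in measure.regionprops(labels):
        if region.area >= min_area:
            rows.append({"area": region.area,
                         "eccentricity": region.eccentricity,
                         "solidity": region.solidity})
    return rows

# for r in insect_descriptors("yellow_trap.jpg"):       # hypothetical trap photo
#     print(r)
```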

33 pages, 21561 KiB  
Article
Dilated Filters for Edge-Detection Algorithms
by Ciprian Orhei, Victor Bogdan, Cosmin Bonchis and Radu Vasiu
Appl. Sci. 2021, 11(22), 10716; https://doi.org/10.3390/app112210716 - 13 Nov 2021
Cited by 12 | Viewed by 2566
Abstract
Edges are a basic and fundamental feature in image processing, used directly or indirectly in a huge number of applications. Inspired by the growth of image resolution and processing power, dilated-convolution techniques have appeared. Dilated convolutions have achieved impressive results in machine learning, so we naturally discuss the idea of dilating the standard filters of several edge-detection algorithms. In this work, we investigated the hypothesis that using dilated filters, rather than extended or classical ones, yields better edge maps. To test this hypothesis, we compared the results of the edge-detection algorithms using the proposed dilated filters with those using the original filters or custom variants. Experimental results confirm that the dilation of filters has a positive impact on edge-detection algorithms, from simple to rather complex ones. Full article
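
Dilating a classical edge-detection kernel amounts to inserting zeros between its coefficients before convolving. The sketch below does this for the Sobel kernel with NumPy/SciPy; the choice of kernel and dilation factor is illustrative.

```python
# Sketch: dilate an edge-detection kernel by inserting zeros, then convolve.
import numpy as np
from scipy.ndimage import convolve

def dilate_kernel(kernel, d=2):
    """Insert (d - 1) zeros between neighbouring kernel coefficients."""
    k = np.asarray(kernel, dtype=float)
    size = (np.array(k.shape) - 1) * d + 1
    dilated = np.zeros(size, dtype=float)
    dilated[::d, ::d] = k
    return dilated

sobel_x = np.array([[-1, 0, 1],
                    [-2, 0, 2],
                    [-1, 0, 1]], dtype=float)

def edge_magnitude(image, d=2):
    """Gradient magnitude using dilated horizontal/vertical Sobel filters."""
    gx = convolve(image.astype(float), dilate_kernel(sobel_x, d))
    gy = convolve(image.astype(float), dilate_kernel(sobel_x.T, d))
    return np.hypot(gx, gy)
```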

19 pages, 6030 KiB  
Article
Towards Single 2D Image-Level Self-Supervision for 3D Human Pose and Shape Estimation
by Junuk Cha, Muhammad Saqlain, Changhwa Lee, Seongyeong Lee, Seungeun Lee, Donguk Kim, Won-Hee Park and Seungryul Baek
Appl. Sci. 2021, 11(20), 9724; https://doi.org/10.3390/app11209724 - 18 Oct 2021
Cited by 6 | Viewed by 2191
Abstract
Three-dimensional human pose and shape estimation is an important problem in the computer vision community, with numerous applications such as augmented reality, virtual reality, human computer interaction, and so on. However, training accurate 3D human pose and shape estimators based on deep learning approaches requires a large number of images and corresponding 3D ground-truth pose pairs, which are costly to collect. To relieve this constraint, various types of weakly or self-supervised pose estimation approaches have been proposed. Nevertheless, these methods still involve supervision signals, which require effort to collect, such as unpaired large-scale 3D ground truth data, a small subset of 3D labeled data, video priors, and so on. Often, they require installing equipment such as a calibrated multi-camera system to acquire strong multi-view priors. In this paper, we propose a self-supervised learning framework for 3D human pose and shape estimation that does not require other forms of supervision signals while using only single 2D images. Our framework inputs single 2D images, estimates human 3D meshes in the intermediate layers, and is trained to solve four types of self-supervision tasks (i.e., three image manipulation tasks and one neural rendering task) whose ground-truths are all based on the single 2D images themselves. Through experiments, we demonstrate the effectiveness of our approach on 3D human pose benchmark datasets (i.e., Human3.6M, 3DPW, and LSP), where we present the new state-of-the-art among weakly/self-supervised methods. Full article

16 pages, 4562 KiB  
Article
An Efficient Text Detection Model for Street Signs
by Manhuai Lu, Yuanxiang Mou, Chin-Ling Chen and Qiting Tang
Appl. Sci. 2021, 11(13), 5962; https://doi.org/10.3390/app11135962 - 26 Jun 2021
Cited by 7 | Viewed by 2025
Abstract
Text detection in natural scenes is a current research hotspot. The Efficient and Accurate Scene Text (EAST) detector model has fast detection speed and good performance but is ineffective in detecting long text regions owing to its small receptive field. In this study, we built upon the EAST model by improving the bounding box’s shrinking algorithm to make the model more accurate in predicting short edges of text regions; altering the loss function from balanced cross-entropy to Focal loss; improving the model’s learning ability on hard, positive examples; and adding a feature enhancement module (FEM) to increase the receptive field of the EAST model and enhance its detection ability for long text regions. The improved EAST model achieved better detection results on both the ICDAR2015 dataset and the Street Sign Text Detection (SSTD) dataset proposed in this paper. The precision and F1 scores of the model also demonstrated advantages over other models on the ICDAR2015 dataset. A comparison of the text detection effects between the improved EAST model and the EAST model showed that the proposed FEM was more effective in increasing the EAST detector’s receptive field, which indicates that it can improve the detection of long text regions. Full article
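
As a reference for the loss-function change mentioned above, a minimal focal loss for a binary text/non-text score map might look like the PyTorch sketch below; the gamma and alpha values are common defaults, not necessarily those used by the authors.

```python
# Sketch: focal loss as a replacement for balanced cross-entropy on a score map.
import torch

def focal_loss(pred, target, gamma=2.0, alpha=0.25, eps=1e-6):
    """pred: predicted score map in (0, 1); target: binary ground-truth map."""
    pred = pred.clamp(eps, 1.0 - eps)
    pt = torch.where(target > 0.5, pred, 1.0 - pred)          # prob of the true class
    weight = torch.where(target > 0.5,
                         torch.full_like(pred, alpha),
                         torch.full_like(pred, 1.0 - alpha))
    return (-weight * (1.0 - pt) ** gamma * torch.log(pt)).mean()

# loss = focal_loss(torch.sigmoid(torch.randn(1, 1, 128, 128)),
#                   torch.randint(0, 2, (1, 1, 128, 128)).float())
```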

16 pages, 6417 KiB  
Article
An Automatic 3D Point Cloud Registration Method Based on Biological Vision
by Jinbo Liu, Pengyu Guo and Xiaoliang Sun
Appl. Sci. 2021, 11(10), 4538; https://doi.org/10.3390/app11104538 - 16 May 2021
Cited by 6 | Viewed by 1974
Abstract
When measuring surface deformation, because the overlap of point clouds before and after deformation is small and the accuracy of the initial value of point cloud registration cannot be guaranteed, traditional point cloud registration methods cannot be applied. In order to solve this problem, a complete solution is proposed, first, by fixing at least three cones to the target. Then, through cone vertices, initial values of the transformation matrix can be calculated. On the basis of this, the point cloud registration can be performed accurately through the iterative closest point (ICP) algorithm using the neighboring point clouds of cone vertices. To improve the automation of this solution, an accurate and automatic point cloud registration method based on biological vision is proposed. First, the three-dimensional (3D) coordinates of cone vertices are obtained through multi-view observation, feature detection, data fusion, and shape fitting. In shape fitting, a closed-form solution of cone vertices is derived on the basis of the quadratic form. Second, a random strategy is designed to calculate the initial values of the transformation matrix between two point clouds. Then, combined with ICP, point cloud registration is realized automatically and precisely. The simulation results showed that, when the intensity of Gaussian noise ranged from 0 to 1 mr (where mr denotes the average mesh resolution of the models), the rotation and translation errors of point cloud registration were less than 0.1° and 1 mr, respectively. Lastly, a camera-projector system to dynamically measure the surface deformation during ablation tests in an arc-heated wind tunnel was developed, and the experimental results showed that the measuring precision for surface deformation exceeded 0.05 mm when surface deformation was smaller than 4 mm. Full article
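
The two-stage idea, an initial rigid transform from matched cone vertices refined by ICP, can be sketched in a self-contained way with NumPy and SciPy as below; the random initialisation strategy and the camera-projector measurement system are omitted, and the variable names in the usage lines are hypothetical.

```python
# Sketch: rigid initialisation from corresponding cone vertices (Kabsch),
# refined by a basic iterative closest point (ICP) loop.
import numpy as np
from scipy.spatial import cKDTree

def rigid_from_correspondences(src, dst):
    """Least-squares rotation r and translation t with dst ~ src @ r.T + t."""
    src_c, dst_c = src - src.mean(0), dst - dst.mean(0)
    u, _, vt = np.linalg.svd(src_c.T @ dst_c)
    d = np.sign(np.linalg.det(vt.T @ u.T))
    r = vt.T @ np.diag([1.0, 1.0, d]) @ u.T
    t = dst.mean(0) - src.mean(0) @ r.T
    return r, t

def icp(src, dst, r, t, iters=30):
    """Refine (r, t) by alternating nearest-neighbour matching and Kabsch."""
    tree = cKDTree(dst)
    for _ in range(iters):
        moved = src @ r.T + t
        _, idx = tree.query(moved)
        r, t = rigid_from_correspondences(src, dst[idx])
    return r, t

# r0, t0 = rigid_from_correspondences(cone_vertices_before, cone_vertices_after)
# r, t = icp(cloud_before, cloud_after, r0, t0)      # hypothetical (N, 3) arrays
```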

15 pages, 2627 KiB  
Article
Real-Time Hand Gesture Recognition Based on Deep Learning YOLOv3 Model
by Abdullah Mujahid, Mazhar Javed Awan, Awais Yasin, Mazin Abed Mohammed, Robertas Damaševičius, Rytis Maskeliūnas and Karrar Hameed Abdulkareem
Appl. Sci. 2021, 11(9), 4164; https://doi.org/10.3390/app11094164 - 2 May 2021
Cited by 171 | Viewed by 20079
Abstract
Using gestures can help people with certain disabilities communicate with other people. This paper proposes a lightweight model based on the YOLO (You Only Look Once) v3 and DarkNet-53 convolutional neural networks for gesture recognition without additional preprocessing, image filtering, or enhancement of images. The proposed model achieved high accuracy even in a complex environment, and it successfully detected gestures even in low-resolution picture mode. The proposed model was evaluated on a labeled dataset of hand gestures in both Pascal VOC and YOLO format. By extracting features from the hand, our proposed YOLOv3-based model recognized hand gestures with an accuracy, precision, recall, and F1 score of 97.68%, 94.88%, 98.66%, and 96.70%, respectively. Further, we compared our model with the Single Shot Detector (SSD) and Visual Geometry Group (VGG16) models, which achieved an accuracy between 82 and 85%. The trained model can be used for real-time detection, both for static hand images and dynamic gestures recorded on video. Full article

23 pages, 5707 KiB  
Article
Methods for Improving Image Quality for Contour and Textures Analysis Using New Wavelet Methods
by Catalin Dumitrescu, Maria Simona Raboaca and Raluca Andreea Felseghi
Appl. Sci. 2021, 11(9), 3895; https://doi.org/10.3390/app11093895 - 25 Apr 2021
Cited by 3 | Viewed by 2420
Abstract
The fidelity of an image subjected to digital processing, such as a contour/texture highlighting process or a noise reduction algorithm, can be evaluated based on two types of criteria, objective and subjective, with the two sometimes considered together. Subjective criteria are the best tool for evaluating an image when the image obtained at the end of the processing is interpreted by a human. Objective criteria are based on the pixel-by-pixel difference between the original and the reconstructed image and ensure a good approximation of the image quality perceived by a human observer. There is also the possibility that, in evaluating the fidelity of a reconstructed image, the pixel-by-pixel differences will be weighted according to the sensitivity of the human visual system. The problem of improving medical images is particularly important in assisted diagnosis, with the aim of providing physicians with information as useful as possible for diagnosing diseases. Given that this information must be available in real time, we proposed a solution for reconstructing the contours in images that uses a modified Wiener filter in the wavelet domain and a nonlinear cellular network, and that is useful both for improving the contrast of the contours and for eliminating noise. In addition to the need to improve image quality, medical applications also need to run in real time, and this requirement was the basis for the design of the method described in the paper, based on the modified Wiener filter and nonlinear cellular networks. Full article
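
A minimal sketch of Wiener-like shrinkage of wavelet detail coefficients, assuming PyWavelets and SciPy, is given below; it illustrates the general wavelet-domain Wiener idea only and omits the authors' modified filter and the nonlinear cellular network stage.

```python
# Sketch: empirical Wiener shrinkage of wavelet detail sub-bands.
import numpy as np
import pywt
from scipy.ndimage import uniform_filter

def wiener_shrink(coeff, noise_sigma, win=5):
    """Local empirical Wiener gain applied to one detail sub-band."""
    local_power = uniform_filter(coeff ** 2, size=win)
    signal_power = np.maximum(local_power - noise_sigma ** 2, 0.0)
    return coeff * signal_power / (signal_power + noise_sigma ** 2 + 1e-12)

def wavelet_wiener_denoise(image, noise_sigma=10.0, wavelet="db2", level=2):
    coeffs = pywt.wavedec2(image.astype(float), wavelet, level=level)
    new_coeffs = [coeffs[0]]                                  # keep the approximation band
    for details in coeffs[1:]:
        new_coeffs.append(tuple(wiener_shrink(d, noise_sigma) for d in details))
    return pywt.waverec2(new_coeffs, wavelet)
```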

Review


10 pages, 338 KiB  
Review
A Survey on Depth Ambiguity of 3D Human Pose Estimation
by Siqi Zhang, Chaofang Wang, Wenlong Dong and Bin Fan
Appl. Sci. 2022, 12(20), 10591; https://doi.org/10.3390/app122010591 - 20 Oct 2022
Cited by 5 | Viewed by 2474
Abstract
Depth ambiguity is one of the main challenges of three-dimensional (3D) human pose estimation (HPE). Recent disambiguation strategies have brought significant progress and remarkable breakthroughs in the field of 3D human pose estimation (3D HPE). This survey extensively reviews the causes of and solutions to depth ambiguity. The solutions are systematically classified into four categories: camera parameter constraints, temporal consistency constraints, kinematic constraints, and image cue constraints. This paper summarizes the performance comparisons, challenges, main frameworks, and evaluation metrics, and discusses some promising future research directions. Full article
