Special Issue "Computer Vision and Pattern Recognition Techniques"

A special issue of Electronics (ISSN 2079-9292). This special issue belongs to the section "Artificial Intelligence".

Deadline for manuscript submissions: 20 July 2022 | Viewed by 11507

Special Issue Editor

Dr. Donghyeon Cho
Guest Editor
Department of Electronics Engineering, Chungnam National University, Daejeon 34134, Korea
Interests: computer vision; deep learning

Special Issue Information

Dear Colleagues,

Recently, AI, machine learning, pattern recognition, and deep learning have been attracting attention in various fields of application, such as autonomous driving, the IoT, robots, drones, smart mobility, etc. These applications acquire data from the surrounding environment through sensing; they then analyze the acquired data and make decisions and take actions based on that analysis. Since vision sensors are the primary means of acquiring such data, computer vision technology that analyzes and utilizes visual information is of great importance. The aim of this Special Issue is to provide a venue for publishing research on computer vision technologies based on AI, machine learning, pattern recognition, and deep learning. Specifically, our scope includes recognition tasks (including image classification, object detection, and segmentation), low-level vision tasks (including super-resolution and image denoising), and tasks related to video and 3D vision.

Dr. Donghyeon Cho
Guest Editor

Manuscript Submission Information

Manuscripts should be submitted online at www.mdpi.com by registering and logging in to this website. Once you are registered, go to the submission form. Manuscripts can be submitted until the deadline. All submissions that pass pre-check are peer-reviewed. Accepted papers will be published continuously in the journal (as soon as accepted) and will be listed together on the special issue website. Research articles, review articles, and short communications are invited. For planned papers, a title and short abstract (about 100 words) can be sent to the Editorial Office for announcement on this website.

Submitted manuscripts should not have been published previously, nor be under consideration for publication elsewhere (except conference proceedings papers). All manuscripts are thoroughly refereed through a single-blind peer-review process. A guide for authors and other relevant information for submission of manuscripts is available on the Instructions for Authors page. Electronics is an international peer-reviewed open access semimonthly journal published by MDPI.

Please visit the Instructions for Authors page before submitting a manuscript. The Article Processing Charge (APC) for publication in this open access journal is 2000 CHF (Swiss Francs). Submitted papers should be well formatted and use good English. Authors may use MDPI's English editing service prior to publication or during author revisions.

Keywords

  • computer vision
  • image processing
  • deep learning
  • recognition
  • object detection
  • segmentation
  • convolutional neural network
  • video processing
  • 3D vision
  • low-level vision

Published Papers (16 papers)


Research

Article
Online Learning for Reference-Based Super-Resolution
Electronics 2022, 11(7), 1064; https://doi.org/10.3390/electronics11071064 - 28 Mar 2022
Viewed by 369
Abstract
Online learning is a method for exploiting input data to update deep networks at test time to derive potential performance improvements. Existing online learning methods for single-image super-resolution (SISR) utilize an input low-resolution (LR) image for the online adaptation of deep networks. Unlike SISR approaches, reference-based super-resolution (RefSR) algorithms benefit from an additional high-resolution (HR) reference image containing plenty of useful features for enhancing the input LR image. Therefore, we introduce a new online learning algorithm, using several reference images, which is applicable not only to RefSR but also to SISR networks. Experimental results show that our online learning method is seamlessly applicable to many existing RefSR and SISR models and that it improves their performance. We further demonstrate the robustness of our method to non-bicubic degradation kernels through in-depth analyses.
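Although the paper's exact adaptation procedure is not reproduced here, the core mechanism — updating a pretrained network at test time using self-supervised pairs synthesized from reference images — can be sketched in PyTorch as below. The name `sr_model`, the bicubic degradation, and all hyperparameters are illustrative assumptions.

```python
# Minimal test-time adaptation sketch for super-resolution (PyTorch).
# Assumption: LR/HR training pairs are synthesized from HR reference
# images by bicubic downsampling, mirroring the online-learning idea.
import torch
import torch.nn.functional as F

def adapt_online(sr_model, ref_hr_images, scale=4, steps=10, lr=1e-5):
    optimizer = torch.optim.Adam(sr_model.parameters(), lr=lr)
    sr_model.train()
    for _ in range(steps):
        for hr in ref_hr_images:                      # hr: (1, C, H, W) tensor
            lr_img = F.interpolate(hr, scale_factor=1.0 / scale,
                                   mode='bicubic', align_corners=False)
            loss = F.l1_loss(sr_model(lr_img), hr)    # reconstruct the reference
            optimizer.zero_grad()
            loss.backward()
            optimizer.step()
    sr_model.eval()
    return sr_model
```

In practice, the adapted weights would be reset between test images so that each adaptation remains specific to its input and references.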

Article
Mathematical Formula Image Screening Based on Feature Correlation Enhancement
Electronics 2022, 11(5), 799; https://doi.org/10.3390/electronics11050799 - 3 Mar 2022
Viewed by 487
Abstract
Scientific and technical documents and web pages contain mathematical formula images alongside other images, and mathematical formula images can be classified as containing either only mathematical formulas or formulas interspersed with other elements, such as text and coordinate diagrams. To screen and collect images containing mathematical formulas for study or further research, a screening model based on feature correlation enhancement is proposed. First, the feature correlation enhancement (FCE) module was designed to strengthen the correlation of mathematical formula features and weaken other features. Then, the strip multi-scale pooling (SMP) module was designed to solve the problem of non-uniform image sizes while enhancing the focus on horizontal formula features. Finally, the loss function was improved to balance the dataset. The model achieved an accuracy of 89.50% in our experiments, outperforming existing models. Using the model, users can screen out images containing mathematical formulas, which helps to speed up the creation of a mathematical formula image database.
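As a rough illustration of the strip pooling idea — not the authors' exact SMP module — the sketch below pools features into single-row horizontal strips at several widths, which both normalizes variable input sizes and emphasizes horizontal structure:

```python
# Hedged PyTorch sketch of strip multi-scale pooling: each pool collapses
# the height to one strip while keeping `w` bins along the width, so the
# output length is fixed regardless of the input image size.
import torch
import torch.nn as nn

class StripMultiScalePool(nn.Module):
    def __init__(self, widths=(1, 2, 4, 8)):
        super().__init__()
        self.pools = nn.ModuleList(nn.AdaptiveAvgPool2d((1, w)) for w in widths)

    def forward(self, x):                      # x: (N, C, H, W), any H and W
        feats = [p(x).flatten(1) for p in self.pools]
        return torch.cat(feats, dim=1)         # fixed length: C * sum(widths)
```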

Article
Solpen: An Accurate 6-DOF Positioning Tool for Vision-Guided Robotics
Electronics 2022, 11(4), 618; https://doi.org/10.3390/electronics11040618 - 17 Feb 2022
Viewed by 539
Abstract
We develop a robot trajectory teaching system with a vision-based positioning pen, called Solpen, that generates six-degrees-of-freedom (6-DoF) pose paths for vision-guided robotic applications such as welding, cutting, painting, or polishing, and achieves millimeter-level dynamic accuracy within a one-meter working distance from the camera. The system is simple, requiring only a 2D camera and printed ArUco markers hand-glued onto 31 surfaces of the designed 3D-printed Solpen. Image processing techniques are implemented to remove noise, sharpen the edges of the ArUco images, and enhance the contrast of the ArUco edge intensity generated by pyramid reconstruction. In addition, the least-squares method is implemented to optimize parameters for the center pose of the truncated icosahedron and the vector of the Solpen tip. Dynamic experiments conducted with a ChArUco board to verify the pen performance exclusively show that the developed system is robust within its working range and achieves a minimum per-axis accuracy of approximately 0.8 mm.
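The marker-based pose step can be sketched with OpenCV's ArUco module (classic contrib API, pre-4.7). The `MARKER_CORNERS_3D` lookup, mapping each marker id to its four 3D corner positions on the pen body, is hypothetical; in the actual system it would come from the 3D-printed geometry:

```python
# Hedged sketch of multi-marker 6-DoF pose estimation with OpenCV
# (requires opencv-contrib-python; pre-4.7 ArUco API).
import cv2
import numpy as np

def estimate_pen_pose(gray, camera_matrix, dist_coeffs, MARKER_CORNERS_3D):
    dictionary = cv2.aruco.getPredefinedDictionary(cv2.aruco.DICT_4X4_100)
    corners, ids, _ = cv2.aruco.detectMarkers(gray, dictionary)
    if ids is None:
        return None
    obj_pts, img_pts = [], []
    for marker_id, quad in zip(ids.flatten(), corners):
        if marker_id in MARKER_CORNERS_3D:                # body-frame corners
            obj_pts.append(MARKER_CORNERS_3D[marker_id])  # (4, 3) array
            img_pts.append(quad.reshape(4, 2))            # (4, 2) pixels
    if not obj_pts:
        return None
    obj = np.concatenate(obj_pts).astype(np.float32)
    img = np.concatenate(img_pts).astype(np.float32)
    # One PnP solve over all visible markers gives the pen's 6-DoF pose.
    ok, rvec, tvec = cv2.solvePnP(obj, img, camera_matrix, dist_coeffs)
    return (rvec, tvec) if ok else None
```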

Article
Dense Residual Transformer for Image Denoising
Electronics 2022, 11(3), 418; https://doi.org/10.3390/electronics11030418 - 29 Jan 2022
Cited by 1 | Viewed by 729
Abstract
Image denoising is an important low-level computer vision task that aims to reconstruct a noise-free, high-quality image from a noisy one. With the development of deep learning, convolutional neural networks (CNNs) have gradually been applied to image denoising, image compression, image enhancement, etc., with great success. Recently, Transformers have become a popular technique widely used to tackle computer vision tasks; however, few Transformer-based methods have been proposed for low-level vision tasks. In this paper, we propose a Transformer-based image denoising network named DenSformer. DenSformer consists of three modules: a preprocessing module, a local-global feature extraction module, and a reconstruction module. Specifically, the local-global feature extraction module consists of several Sformer groups, each of which has several ETransformer layers and a convolution layer, together with a residual connection. These Sformer groups are densely skip-connected to fuse the features of different layers, and they jointly capture local and global information from the given noisy images. We evaluate our model in comprehensive experiments. In synthetic noise removal, DenSformer outperforms other state-of-the-art methods by up to 0.06–0.28 dB on gray-scale images and 0.57–1.19 dB on color images. In real noise removal, DenSformer achieves comparable performance while reducing the number of parameters by up to 40%. Experimental results show that DenSformer improves upon state-of-the-art methods for both synthetic and real noise data in objective and subjective evaluations.
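A minimal sketch of the dense skip-connection pattern among feature-extraction groups is given below in PyTorch; the real Sformer group contains ETransformer layers, which are replaced here by a plain residual conv block so the sketch stays self-contained:

```python
# Hedged sketch of densely skip-connected groups: each group receives a
# 1x1-conv fusion of all earlier outputs, as in dense feature fusion.
import torch
import torch.nn as nn

class SformerGroup(nn.Module):            # placeholder for the real group
    def __init__(self, ch):
        super().__init__()
        self.body = nn.Sequential(nn.Conv2d(ch, ch, 3, padding=1),
                                  nn.ReLU(inplace=True),
                                  nn.Conv2d(ch, ch, 3, padding=1))
    def forward(self, x):
        return x + self.body(x)           # residual connection

class DenseGroups(nn.Module):
    def __init__(self, ch=64, n_groups=4):
        super().__init__()
        self.groups = nn.ModuleList(SformerGroup(ch) for _ in range(n_groups))
        # 1x1 convs fuse the concatenation of all earlier outputs.
        self.fuse = nn.ModuleList(
            nn.Conv2d(ch * (i + 1), ch, 1) for i in range(n_groups))

    def forward(self, x):
        outs = [x]
        for g, f in zip(self.groups, self.fuse):
            outs.append(g(f(torch.cat(outs, dim=1))))
        return outs[-1]
```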

Article
Multiple Cues-Based Robust Visual Object Tracking Method
Electronics 2022, 11(3), 345; https://doi.org/10.3390/electronics11030345 - 24 Jan 2022
Viewed by 597
Abstract
Visual object tracking is still considered a challenging task in the computer vision research community. The object of interest undergoes significant appearance changes because of illumination variation, deformation, motion blur, background clutter, and occlusion. Kernelized correlation filter (KCF)-based tracking schemes have shown good performance in recent years. The accuracy and robustness of these trackers can be further enhanced by incorporating multiple cues from the response map. Response map computation is a complementary step in KCF-based tracking schemes, and it contains a wealth of information. The majority of KCF-based tracking methods estimate the target location by fetching a single cue, such as the peak correlation value, from the response map. This paper proposes mining the response map in depth to fetch multiple cues about the target model. Furthermore, a new criterion based on the hybridization of multiple cues, i.e., average peak-to-correlation energy (APCE) and confidence of squared response map (CSRM), is presented to enhance tracking efficiency. We update the following tracking modules based on the hybridized criterion: (i) occlusion detection, (ii) adaptive learning rate adjustment, (iii) drift handling using the adaptive learning rate, (iv) handling, and (v) scale estimation. We integrate all these modules to propose a new tracking scheme. The proposed tracker is evaluated on challenging videos selected from three standard datasets, i.e., OTB-50, OTB-100, and TC-128, and compared with other state-of-the-art methods. Our method achieves considerable improvements, with a center location error of 16.06, a distance precision of 0.889, and an overlap success rate of 0.824.
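One of the named cues, APCE, has a standard closed form in the correlation-filter tracking literature; a NumPy sketch is shown below (the CSRM term and the exact hybridization rule are not reproduced here):

```python
# Average peak-to-correlation energy over a correlation response map.
import numpy as np

def apce(response):
    """APCE = |Fmax - Fmin|^2 / mean((F - Fmin)^2)."""
    f_max, f_min = response.max(), response.min()
    return (f_max - f_min) ** 2 / np.mean((response - f_min) ** 2)
```

A sharp single peak (confident, unoccluded target) yields a high APCE; a flat or multi-modal map (occlusion, drift) yields a low one, which is the kind of signal the occlusion-detection and learning-rate modules can act on.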

Article
Exploiting Features with Split-and-Share Module
Electronics 2022, 11(2), 235; https://doi.org/10.3390/electronics11020235 - 12 Jan 2022
Viewed by 270
Abstract
Deep convolutional neural networks (CNNs) have shown state-of-the-art performance in various computer vision tasks. Advances in CNN architectures have focused mainly on designing the convolutional blocks of feature extractors, and less on the classifiers that exploit the extracted features. In this work, we propose the Split-and-Share Module (SSM), a classifier that splits a given feature into parts that are partially shared by multiple sub-classifiers. Our intuition is that the more a feature is shared, the more common it becomes, and SSM can encourage such structural characteristics in the split features. SSM can be easily integrated into any architecture without bells and whistles. We have extensively validated the efficacy of SSM on the ImageNet-1K classification task, where SSM shows consistent and significant improvements over baseline architectures. In addition, we analyze the effect of SSM using Grad-CAM visualization.
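A hedged PyTorch sketch of the split-and-share idea follows: the feature vector is split into chunks, and sub-classifier i sees the first i+1 chunks, so early chunks are shared by every sub-classifier. Chunk sizing and the averaging of logits are assumptions, not the paper's exact configuration.

```python
# Split-and-share classifier head sketch (PyTorch).
import torch
import torch.nn as nn

class SplitAndShareHead(nn.Module):
    def __init__(self, feat_dim=512, n_classes=1000, n_splits=4):
        super().__init__()
        assert feat_dim % n_splits == 0
        chunk = feat_dim // n_splits
        self.n_splits = n_splits
        # Sub-classifier i consumes the first (i + 1) chunks.
        self.subs = nn.ModuleList(
            nn.Linear(chunk * (i + 1), n_classes) for i in range(n_splits))

    def forward(self, feat):                       # feat: (N, feat_dim)
        chunks = feat.chunk(self.n_splits, dim=1)
        logits = [head(torch.cat(chunks[:i + 1], dim=1))
                  for i, head in enumerate(self.subs)]
        return torch.stack(logits).mean(0)         # combine sub-decisions
```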

Article
Smart Glass System Using Deep Learning for the Blind and Visually Impaired
Electronics 2021, 10(22), 2756; https://doi.org/10.3390/electronics10222756 - 11 Nov 2021
Cited by 3 | Viewed by 1017
Abstract
Individuals suffering from visual impairments and blindness encounter difficulties in moving independently and face various problems in their daily lives. As a solution, artificial intelligence and computer vision approaches can help blind and visually impaired (BVI) people fulfill their primary activities without heavy dependence on others. Smart glasses are a potential assistive technology for BVI people, aiding individual travel and providing social comfort and safety. In practice, however, the BVI are unable to move alone, particularly in dark scenes and at night. In this study, we propose a smart glass system for BVI people that employs computer vision techniques and deep learning models, audio feedback, and tactile graphics to facilitate independent movement in night-time environments. The system is divided into four models: a low-light image enhancement model, an object recognition and audio feedback model, a salient object detection model, and a text-to-speech and tactile graphics generation model. The system was thus developed to assist in the following ways: (1) enhancing the contrast of images under low-light conditions with a two-branch exposure-fusion network; (2) guiding users with audio feedback from a transformer encoder–decoder object detection model that can recognize 133 categories, such as people, animals, and cars, and convey them as sound; and (3) providing access to visual information through salient object extraction, text recognition, and a refreshable tactile display. We evaluated the performance of the system and achieved competitive results on the challenging Low-Light and ExDark datasets.
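At the system level, the four models amount to a simple processing chain per camera frame. The sketch below is purely illustrative; every component name is a placeholder rather than the authors' code:

```python
# Hedged sketch of chaining the four stages described in the abstract.
class SmartGlassPipeline:
    def __init__(self, enhancer, detector, saliency, tts):
        self.enhancer = enhancer      # low-light exposure-fusion network
        self.detector = detector      # transformer encoder-decoder detector
        self.saliency = saliency      # salient object extraction + OCR
        self.tts = tts                # text-to-speech / tactile output

    def process_frame(self, frame):
        bright = self.enhancer(frame)         # (1) enhance low-light image
        objects = self.detector(bright)       # (2) recognize objects
        text = self.saliency(bright)          # (3) extract salient text
        self.tts(objects, text)               # audio and tactile feedback
```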

Article
Multi-Task Learning with Task-Specific Feature Filtering in Low-Data Condition
Electronics 2021, 10(21), 2691; https://doi.org/10.3390/electronics10212691 - 4 Nov 2021
Cited by 1 | Viewed by 454
Abstract
Multi-task learning (MTL) is a computationally efficient method that solves multiple tasks with one multi-task model instead of multiple single-task models. MTL is expected to learn both diverse and shareable visual features from multiple datasets; however, MTL performance usually does not surpass that of single-task learning. Recent MTL methods tend to use heavy task-specific heads with large overheads to generate task-specific features. In this work, we (1) validate the efficacy of MTL in low-data conditions with early-exit architectures and (2) propose a simple feature filtering module with minimal overhead to generate task-specific features. We assume that, in low-data conditions, the model cannot learn useful low-level features due to the limited amount of data. We empirically show that MTL can significantly improve performance in all tasks under low-data conditions. We further optimize the early-exit architecture through a sweep search for the optimal feature for each task. Furthermore, we propose a feature filtering module that selects features for each task. Using the optimized early-exit architecture with the feature filtering module, we improve accuracy by 15.937% on ImageNet and 4.847% on Places365 under the low-data condition in which only 5% of the original datasets are available. Our method is empirically validated on various backbones and in various MTL settings.
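A feature filter with "minimal overhead" can be as small as one learned gate vector per task applied to the shared features; the PyTorch sketch below illustrates that idea, though the paper's module may differ:

```python
# Hedged sketch of a per-task channel gate over shared backbone features.
import torch
import torch.nn as nn

class TaskFeatureFilter(nn.Module):
    def __init__(self, channels, n_tasks):
        super().__init__()
        # One learnable gate vector per task: n_tasks * channels parameters.
        self.gates = nn.Parameter(torch.ones(n_tasks, channels))

    def forward(self, shared_feat, task_id):       # shared_feat: (N, C, H, W)
        gate = torch.sigmoid(self.gates[task_id]).view(1, -1, 1, 1)
        return shared_feat * gate                  # task-specific channel selection
```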

Article
GicoFace: A Deep Face Recognition Model Based on Global-Information Loss Function
Electronics 2021, 10(19), 2387; https://doi.org/10.3390/electronics10192387 - 29 Sep 2021
Viewed by 612
Abstract
As convolutional neural networks (CNNs) have a strong capacity to learn discriminative facial features, they have greatly promoted the development of face recognition, in which the loss function plays a key role. Nonetheless, most existing loss functions do not simultaneously apply weight normalization, apply feature normalization, and pursue the two goals of enhancing discriminative capacity (optimizing intra-class and inter-class variance). In addition, they are updated by considering only the feedback information of each mini-batch, ignoring information from the entire training set. This paper presents a new loss function called Gico loss; the deep model trained with Gico loss is then called GicoFace. Gico loss satisfies the four aforementioned key points and is calculated with global information extracted from the entire training set. Experiments are carried out on five benchmark datasets: LFW, SLLFW, YTF, MegaFace, and FaceScrub. The results confirm the efficacy of the proposed method and demonstrate its state-of-the-art performance.
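The full Gico loss cannot be reconstructed from the abstract alone. The sketch below illustrates only two of the four ingredients it is said to satisfy — simultaneous weight and feature normalization — via standard cosine logits:

```python
# Hedged sketch of jointly normalized logits (not the Gico loss itself).
import torch
import torch.nn.functional as F

def normalized_logits(features, weights, scale=30.0):
    """Cosine logits: features and class weights are both L2-normalized."""
    f = F.normalize(features, dim=1)       # (N, D) feature normalization
    w = F.normalize(weights, dim=1)        # (K, D) weight normalization
    return scale * f @ w.t()               # (N, K), fed to cross-entropy
```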

Article
Aircraft Type Recognition in Remote Sensing Images: Bilinear Discriminative Extreme Learning Machine Framework
Electronics 2021, 10(17), 2046; https://doi.org/10.3390/electronics10172046 - 24 Aug 2021
Viewed by 611
Abstract
Small inter-class and large intra-class variations are major challenges in aircraft type recognition in the field of remote sensing. Although aircraft type recognition algorithms based on the convolutional neural network (CNN) have excellent recognition performance, they are limited by sample sets and computing resources. To solve these problems, we propose the bilinear discriminative extreme learning machine (ELM) network (BD-ELMNet), which integrates the advantages of the CNN, the autoencoder (AE), and the ELM. Specifically, BD-ELMNet first executes convolution and pooling operations to form a convolutional ELM (ELMConvNet) that extracts shallow features. Furthermore, the manifold-regularized ELM-AE (MRELM-AE), which simultaneously considers the geometric structure and discriminative information of the aircraft data, is developed to extract discriminative features. A bilinear pooling model uses feature association information for feature fusion to enhance the distinctiveness of the features. In contrast to backpropagation (BP) optimization, BD-ELMNet adopts a layer-by-layer training method without repeated adjustments, effectively learning discriminative features. Experiments applying several methods, including the proposed one, to the MTARSI benchmark demonstrate that the proposed aircraft type recognition method outperforms state-of-the-art methods.
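The ELM building block that makes layer-by-layer training possible is a random, frozen hidden layer followed by a closed-form least-squares output solve; a minimal NumPy sketch (not the full BD-ELMNet) is:

```python
# Basic extreme learning machine: no backpropagation is needed because
# only the output weights `beta` are fit, in closed form.
import numpy as np

def elm_fit(X, Y, n_hidden=512, seed=0):
    rng = np.random.default_rng(seed)
    W = rng.standard_normal((X.shape[1], n_hidden))  # random, frozen weights
    b = rng.standard_normal(n_hidden)
    H = np.tanh(X @ W + b)                           # hidden activations
    beta = np.linalg.pinv(H) @ Y                     # least-squares solve
    return W, b, beta

def elm_predict(X, W, b, beta):
    return np.tanh(X @ W + b) @ beta
```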

Article
Deep Multi-Image Steganography with Private Keys
Electronics 2021, 10(16), 1906; https://doi.org/10.3390/electronics10161906 - 9 Aug 2021
Cited by 2 | Viewed by 912
Abstract
In this paper, we propose deep multi-image steganography with private keys. Recently, several deep CNN-based algorithms have been proposed to hide multiple secret images in a single cover image. However, conventional methods are prone to leaking secret information because they do not provide access to an individual secret image and often decrypt the entire hidden information all at once. To tackle this problem, we introduce the concept of private keys for secret images. Our method conceals multiple secret images in a single cover image and generates a visually similar container image with the encrypted secret information inside. In addition, a private key corresponding to each secret image is generated simultaneously. Each private key provides access to only a single secret image while keeping the other hidden images and private keys unrevealed. Specifically, our model consists of a deep hiding network and a revealing network. The hiding network takes a cover image and secret images as inputs, extracts high-level features of the cover image, and generates private keys. The extracted features and private keys are then concatenated and used to generate a container image. The revealing network, in turn, extracts high-level features of the container image and decrypts a secret image using the extracted features and the corresponding private key. Experimental results demonstrate that the proposed algorithm effectively hides and reveals multiple secret images while achieving high security.
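The hide/reveal structure described above — keys broadcast to spatial maps and concatenated with image features — might look like the following PyTorch sketch; layer widths, key size, and the number of secrets are assumptions:

```python
# Hedged sketch of key-conditioned hiding and revealing networks.
import torch
import torch.nn as nn

class HidingNet(nn.Module):
    def __init__(self, n_secrets=2, key_dim=16):
        super().__init__()
        in_ch = 3 * (1 + n_secrets) + key_dim * n_secrets
        self.net = nn.Sequential(nn.Conv2d(in_ch, 64, 3, padding=1),
                                 nn.ReLU(inplace=True),
                                 nn.Conv2d(64, 3, 3, padding=1))

    def forward(self, cover, secrets, keys):       # keys: (N, n_secrets*key_dim)
        n, _, h, w = cover.shape
        key_maps = keys.view(n, -1, 1, 1).expand(-1, -1, h, w)
        x = torch.cat([cover, *secrets, key_maps], dim=1)
        return self.net(x)                         # container image

class RevealNet(nn.Module):
    def __init__(self, key_dim=16):
        super().__init__()
        self.net = nn.Sequential(nn.Conv2d(3 + key_dim, 64, 3, padding=1),
                                 nn.ReLU(inplace=True),
                                 nn.Conv2d(64, 3, 3, padding=1))

    def forward(self, container, key):             # one key reveals one secret
        n, _, h, w = container.shape
        key_map = key.view(n, -1, 1, 1).expand(-1, -1, h, w)
        return self.net(torch.cat([container, key_map], dim=1))
```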

Article
An Efficient Convolutional Neural Network Model Combined with Attention Mechanism for Inverse Halftoning
Electronics 2021, 10(13), 1574; https://doi.org/10.3390/electronics10131574 - 30 Jun 2021
Cited by 2 | Viewed by 586
Abstract
Inverse halftoning, a special image restoration problem, is ill-posed. Although it has been studied over the last several decades, existing solutions cannot accurately restore fine details and textures from halftone images. Recently, the attention mechanism has shown its power in many fields, such as image processing, pattern recognition, and computer vision, but it has not yet been applied to inverse halftoning. To better restore details in inverse halftoning, this paper proposes a simple yet effective deep learning model combined with the attention mechanism, which better guides the network to remove noise dot patterns and restore image details, and improves the network's adaptability. The model is designed in an end-to-end manner and comprises a feature extraction stage and a reconstruction stage. In the feature extraction stage, halftone image features are extracted and halftone noise is removed. The reconstruction stage restores continuous-tone images by fusing the feature information extracted in the first stage with the output of a residual channel attention block. In this stage, the attention block is introduced to the field of inverse halftoning for the first time, enabling the network to focus on informative features and further enhancing its discriminative ability. In addition, a multi-stage loss function is proposed to accelerate network optimization, which is conducive to better reconstruction of the global image. Experimental results confirm that the network generalizes well, restoring six different types of halftone images, and that our method outperforms state-of-the-art methods, especially in the restoration of details and textures.
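Residual channel attention blocks of the kind named above follow a well-known squeeze-and-excitation pattern; a hedged PyTorch sketch, with assumed channel width and reduction ratio, is:

```python
# Hedged sketch of a residual channel attention block (RCAN-style).
import torch.nn as nn

class ResidualChannelAttentionBlock(nn.Module):
    def __init__(self, ch=64, reduction=16):
        super().__init__()
        self.body = nn.Sequential(nn.Conv2d(ch, ch, 3, padding=1),
                                  nn.ReLU(inplace=True),
                                  nn.Conv2d(ch, ch, 3, padding=1))
        self.att = nn.Sequential(nn.AdaptiveAvgPool2d(1),       # squeeze
                                 nn.Conv2d(ch, ch // reduction, 1),
                                 nn.ReLU(inplace=True),
                                 nn.Conv2d(ch // reduction, ch, 1),
                                 nn.Sigmoid())                  # channel weights

    def forward(self, x):
        y = self.body(x)
        return x + y * self.att(y)        # re-weight channels, residual add
```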

Article
Data-Driven Channel Pruning towards Local Binary Convolution Inverse Bottleneck Network Based on Squeeze-and-Excitation Optimization Weights
Electronics 2021, 10(11), 1329; https://doi.org/10.3390/electronics10111329 - 1 Jun 2021
Viewed by 871
Abstract
This paper proposes a model pruning method based on local binary convolution (LBC) and squeeze-and-excitation (SE) optimization weights. We first propose an efficient depthwise-separable convolution model based on the LBC kernel. By expanding the number of LBC kernels in the model, we trained a larger model with better results but more parameters and slower computation. We then extract the SE optimization weight of each SE module from the data samples and score the LBC kernels accordingly. Based on the score of the convolution channel corresponding to each LBC kernel, we perform channel-based model pruning, which greatly reduces the number of model parameters and accelerates computation. The proposed pruning method is verified on an image classification database. Experiments show that, in models using the LBC kernel, recognition accuracy increases as the number of LBC kernels increases. They also show that recognition accuracy is maintained at a similar level in the smaller model obtained by channel-based pruning guided by the SE optimization weights.
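The scoring-and-pruning step can be sketched as follows: average each channel's SE gate activation over data samples, then keep the top-scoring channels. `se_gates` is a hypothetical hook returning one layer's SE sigmoid outputs, not an API from the paper:

```python
# Hedged sketch of data-driven channel scoring from SE gate activations.
import torch

@torch.no_grad()
def score_channels(se_gates, data_loader, device="cpu"):
    total, count = None, 0
    for x, _ in data_loader:
        g = se_gates(x.to(device))              # (N, C) SE sigmoid outputs
        total = g.sum(0) if total is None else total + g.sum(0)
        count += g.shape[0]
    return total / count                        # per-channel importance

def channels_to_keep(scores, keep_ratio=0.5):
    k = max(1, int(scores.numel() * keep_ratio))
    return torch.topk(scores, k).indices        # indices of surviving channels
```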

Article
A Novel Decomposition Model for Visual Rail Surface Inspection
Electronics 2021, 10(11), 1271; https://doi.org/10.3390/electronics10111271 - 26 May 2021
Viewed by 841
Abstract
Rail surface inspection plays a pivotal role in large-scale railway construction and development. However, accurately identifying possible defects with a large variety of visual appearances under dynamic illumination remains challenging. In this paper, we fully explore and use the essential attributes of our defect structure data and the inherent temporal and spatial characteristics of the track to establish a general theoretical framework for practical applications. As such, our framework can overcome the bottleneck that machine vision inspection technology faces in complex rail environments. In particular, we use a differential regularization term for the background, rather than a traditional low-rank constraint, to ensure that the model tolerates dynamic background changes without losing sensitivity when detecting defects. To better capture the compactness and completeness of a defect, we introduce a tree-shaped hierarchical structure of sparsity-inducing norms to encode the spatial structure of the defect area. The proposed model is evaluated on two newly released Type-I/II rail surface discrete defect (RSDD) datasets and a practical rail line. Qualitative and quantitative evaluations show that the decomposition model handles the dynamics of the track surface well and can be used for structural detection of defect areas.
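In its simplest form, such a decomposition alternates between fitting a smooth background and soft-thresholding the residual into a sparse defect map. The NumPy sketch below shows only this generic mechanism; the paper's differential background term and tree-structured sparsity norm are not reproduced:

```python
# Hedged sketch of a background + sparse-defect image decomposition.
import numpy as np
from scipy.ndimage import uniform_filter

def decompose(img, lam=0.1, iters=20):
    S = np.zeros_like(img)                        # sparse defect map
    for _ in range(iters):
        B = uniform_filter(img - S, size=15)      # smooth background fit
        R = img - B
        S = np.sign(R) * np.maximum(np.abs(R) - lam, 0.0)  # soft threshold
    return B, S                                   # background, defects
```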

Article
Viewpoint-Aware Action Recognition Using Skeleton-Based Features from Still Images
Electronics 2021, 10(9), 1118; https://doi.org/10.3390/electronics10091118 - 9 May 2021
Cited by 1 | Viewed by 877
Abstract
In this paper, we propose a viewpoint-aware action recognition method using skeleton-based features from still images. Our method consists of three main steps. First, we categorize the viewpoint of the input still image. Second, we extract 2D/3D joints using state-of-the-art convolutional neural networks and analyze the geometric relationships among the joints to compute 2D and 3D skeleton features. Finally, we perform view-specific action classification per person based on the viewpoint category and the extracted 2D and 3D skeleton features. We implement two multi-view data acquisition systems and create a new action recognition dataset containing viewpoint labels in order to train and validate our method. The robustness of the proposed method to viewpoint changes is quantitatively confirmed on two multi-view datasets, and a real-world application recognizing various actions is qualitatively demonstrated.
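A typical geometric skeleton feature of the kind described — computable from either 2D or 3D joint coordinates — is the angle between the two bones meeting at a joint:

```python
# Joint angle from adjacent bones; works for 2D or 3D coordinates.
import numpy as np

def joint_angle(parent, joint, child):
    """Angle (radians) at `joint` between bones joint->parent and joint->child."""
    u = np.asarray(parent) - np.asarray(joint)
    v = np.asarray(child) - np.asarray(joint)
    cos = np.dot(u, v) / (np.linalg.norm(u) * np.linalg.norm(v) + 1e-8)
    return np.arccos(np.clip(cos, -1.0, 1.0))

# e.g., elbow angle: joint_angle(shoulder_xyz, elbow_xyz, wrist_xyz)
```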

Article
A Saliency-Based Patch Sampling Approach for Deep Artistic Media Recognition
Electronics 2021, 10(9), 1053; https://doi.org/10.3390/electronics10091053 - 29 Apr 2021
Cited by 1 | Viewed by 470
Abstract
We present a saliency-based patch sampling strategy for recognizing artistic media in artwork images with a deep media recognition model composed of several deep convolutional neural network-based recognition modules, whose individual decisions are merged into the final decision of the model. To sample a suitable patch as the input of a module, we devise a strategy that samples patches with a high probability of containing distinctive media stroke patterns, without distortion, since media stroke patterns are key to media recognition. We design this strategy by collecting human-selected ground-truth patches and analyzing the distribution of their saliency values; from this analysis, we build a strategy that samples patches likely to contain media stroke patterns. We show that our strategy performs best among existing patch sampling strategies and exhibits recognition and confusion patterns consistent with them.
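Mechanically, saliency-guided sampling reduces to ranking candidate windows by their mean saliency; a NumPy sketch (with an externally supplied saliency map and assumed patch size and stride) is:

```python
# Hedged sketch of saliency-guided patch sampling: take the k windows
# with the highest mean saliency. The paper's rule is derived from
# human-selected patches; this shows only the ranking mechanism.
import numpy as np

def sample_patches(saliency, patch=224, k=5, stride=56):
    h, w = saliency.shape
    scores, coords = [], []
    for y in range(0, h - patch + 1, stride):
        for x in range(0, w - patch + 1, stride):
            scores.append(saliency[y:y + patch, x:x + patch].mean())
            coords.append((y, x))
    order = np.argsort(scores)[::-1][:k]       # k most salient windows
    return [coords[i] for i in order]
```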
