Topic Editors

Instituto de Investigación en Informática de Albacete, Universidad de Castilla-La Mancha, 02071 Albacete, Spain
Department of IT Engineering, Sookmyung Women’s University, Seoul 04310, Republic of Korea
Department of Computer Science, University of Beira Interior, 6201-001 Covilhã, Portugal

Applied Computer Vision and Pattern Recognition

Abstract submission deadline
closed (31 December 2021)
Manuscript submission deadline
closed (31 March 2022)
Viewed by
297170

Topic Information

Dear Colleagues,

Computer vision is a field of artificial intelligence that trains computers to interpret and understand the visual world. Computer vision tasks include methods for acquiring digital images (through image sensors), processing them, and analyzing them to reach an understanding of their content. In general, the field deals with extracting high-dimensional data from the real world in order to produce numerical or symbolic information that a computer can interpret. For this interpretation step, computer vision is closely related to pattern recognition.

Pattern recognition can be defined as the identification and classification of meaningful patterns in data, based on the extraction and comparison of the data's characteristic properties or features, typically by means of machine learning algorithms. It is a very important area of research and application, underpinning developments in related fields such as computer vision, image processing, text and document analysis, and neural networks. It is closely related to machine learning and finds applications in rapidly emerging areas such as biometrics, bioinformatics, multimedia data analysis and, more recently, data science.
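As a concrete toy illustration of the extract-and-compare view of pattern recognition described above, a minimal nearest-centroid classifier can be sketched in a few lines (the class names and feature values below are invented for illustration):

```python
# Minimal nearest-centroid pattern recognizer: extract features,
# compare them to per-class prototypes, and assign the closest class.
# All data here is made up for illustration.

def centroid(vectors):
    """Component-wise mean of a list of equal-length feature vectors."""
    n = len(vectors)
    return [sum(v[i] for v in vectors) / n for i in range(len(vectors[0]))]

def sq_dist(a, b):
    return sum((x - y) ** 2 for x, y in zip(a, b))

def train(labeled):
    """labeled: dict mapping class name -> list of feature vectors."""
    return {cls: centroid(vecs) for cls, vecs in labeled.items()}

def classify(model, features):
    return min(model, key=lambda cls: sq_dist(model[cls], features))

model = train({
    "circle": [[0.9, 0.1], [0.8, 0.2]],   # e.g. [roundness, cornerness]
    "square": [[0.2, 0.9], [0.1, 0.8]],
})
print(classify(model, [0.85, 0.15]))  # -> circle
```

Real systems replace the hand-made features with learned ones and the centroid comparison with a trained model, but the extract-then-compare structure is the same.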

The Applied Computer Vision and Pattern Recognition topic invites papers on theoretical and applied issues including, but not limited to, the following:

  • Statistical, structural and syntactic pattern recognition;
  • Neural networks, machine learning and deep learning;
  • Computer vision, robot vision and machine vision;
  • Multimedia systems and multimedia content;
  • Bio-signal processing, speech processing, image processing and video processing;
  • Data mining, information retrieval, big data and business intelligence.

This topic presents research results describing recent advances in both the computer vision and pattern recognition fields.

Prof. Dr. Antonio Fernández-Caballero
Prof. Dr. Byung-Gyu Kim
Prof. Dr. Hugo Pedro Proença
Topic Editors

Keywords

  • pattern recognition
  • neural networks, machine learning
  • deep learning, artificial intelligence
  • computer vision
  • multimedia
  • data mining
  • signal processing
  • image processing

Participating Journals

Journal Name | Impact Factor | CiteScore | Launched Year | First Decision (median) | APC
Applied Sciences (applsci) | 2.5 | 5.3 | 2011 | 17.8 Days | CHF 2400
AI (ai) | 3.1 | 7.2 | 2020 | 17.6 Days | CHF 1600
Big Data and Cognitive Computing (BDCC) | 3.7 | 7.1 | 2017 | 18 Days | CHF 1800
Mathematical and Computational Applications (mca) | 1.9 | - | 1996 | 28.8 Days | CHF 1400
Machine Learning and Knowledge Extraction (make) | 4.0 | 6.3 | 2019 | 27.1 Days | CHF 1800

Preprints.org is a multidisciplinary platform providing a preprint service dedicated to sharing your research from the start and empowering your research journey.

MDPI Topics is cooperating with Preprints.org and has built a direct connection between MDPI journals and Preprints.org. Authors are encouraged to enjoy the benefits by posting a preprint at Preprints.org prior to publication:

  1. Immediately share your ideas ahead of publication and establish your research priority;
  2. Protect your priority with a time-stamped preprint article;
  3. Enhance the exposure and impact of your research;
  4. Receive feedback from your peers in advance;
  5. Have it indexed in Web of Science (Preprint Citation Index), Google Scholar, Crossref, SHARE, PrePubMed, Scilit and Europe PMC.

Published Papers (87 papers)

17 pages, 3304 KiB  
Article
Meta-YOLO: Meta-Learning for Few-Shot Traffic Sign Detection via Decoupling Dependencies
by Xinyue Ren, Weiwei Zhang, Minghui Wu, Chuanchang Li and Xiaolan Wang
Appl. Sci. 2022, 12(11), 5543; https://doi.org/10.3390/app12115543 - 30 May 2022
Cited by 11 | Viewed by 4844
Abstract
Considering the low coverage of roadside cooperative devices at the current time, automated driving should detect all road markings relevant to driving safety, such as traffic signs, which tend to be of great variety but few in number. In this work, we propose an innovative few-shot object detection framework, namely Meta-YOLO, whose challenge is to generalize to unseen classes using only a few examples. Simply integrating the YOLO mechanism into a meta-learning pipeline runs into problems of computational efficiency and mistaken detections. Therefore, we construct a two-stage meta-learner model that can learn the learner initialization, the learner update direction and the learning rate, all in a single meta-learning process. High-fidelity target features facilitate learning in deep networks and improve the performance of the meta-learner, but we also design a feature decorrelation module (FDM), which first transforms non-linear features into computable linear features based on random Fourier features (RFF), and then perceives and removes global correlations by iteratively saving and reloading the features and sample weights of the model. We introduce a three-head module to learn global, local and patch correlations from the category detection result output by the aggregation in the meta-learner, which endows the detector ϕ with multi-scale ability. In our experiments, the proposed algorithm outperforms the three benchmark algorithms and improves the mAP of few-shot detection by 39.8%. Full article
(This article belongs to the Topic Applied Computer Vision and Pattern Recognition)
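The random Fourier feature (RFF) transform that the abstract's feature decorrelation module builds on can be sketched as follows; this is the generic RFF map, not the paper's exact module, and the dimensions and bandwidth below are assumptions:

```python
import numpy as np

# Sketch of a random Fourier feature (RFF) map: it projects features so
# that non-linear (RBF-kernel) similarity becomes an ordinary inner
# product in the mapped space, making downstream operations linear.

rng = np.random.default_rng(0)

def rff_map(x, n_features=256, sigma=1.0):
    """x: (n, d) features -> (n, n_features) approximate RBF-kernel features."""
    d = x.shape[1]
    w = rng.normal(scale=1.0 / sigma, size=(d, n_features))
    b = rng.uniform(0.0, 2.0 * np.pi, size=n_features)
    return np.sqrt(2.0 / n_features) * np.cos(x @ w + b)

x = rng.normal(size=(8, 16))
z = rff_map(x)
# Inner products z_i . z_j approximate exp(-||x_i - x_j||^2 / (2 sigma^2)).
print(z.shape)  # (8, 256)
```

In practice the projection (w, b) is sampled once and held fixed so that all features live in the same randomized space.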

18 pages, 17613 KiB  
Article
Dynamic Anchor: A Feature-Guided Anchor Strategy for Object Detection
by Xing Liu, Huai-Xin Chen and Bi-Yuan Liu
Appl. Sci. 2022, 12(10), 4897; https://doi.org/10.3390/app12104897 - 12 May 2022
Cited by 4 | Viewed by 2450
Abstract
The majority of modern object detectors rely on a set of pre-defined anchor boxes, which enhances detection performance dramatically. Nevertheless, the pre-defined anchor strategy suffers from some drawbacks, especially the complex hyper-parameters of anchors, which seriously affect detection performance. In this paper, we propose a feature-guided anchor generation method named dynamic anchor. Dynamic anchor mainly comprises two structures: the anchor generator and the feature enhancement module. The anchor generator leverages semantic features to predict optimized anchor shapes at the locations where objects are likely to exist in the feature maps; by converting the predicted shape maps into location offsets, the feature enhancement module uses the high-quality anchors to improve detection performance. Compared with the hand-designed anchor scheme, dynamic anchor discards all pre-defined boxes and avoids complex hyper-parameters. In addition, only one anchor box is predicted for each location, which dramatically reduces computation. With ResNet-50 and ResNet-101 as the backbone of the one-stage detector RetinaNet, dynamic anchor achieved gains of 2.1 AP and 1.0 AP, respectively. The proposed dynamic anchor strategy can easily be integrated into anchor-based detectors to replace the traditional pre-defined anchor scheme. Full article
(This article belongs to the Topic Applied Computer Vision and Pattern Recognition)
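The "one predicted anchor per location" idea can be sketched as follows; this is an illustrative reading of the abstract, not the paper's code, and the stride and shape values are made up:

```python
import numpy as np

# Sketch: a predicted shape map gives (w, h) for each feature-map cell,
# and anchors are boxes centred on the cell, scaled by the feature-map
# stride. One anchor per location replaces a bank of pre-defined boxes.

def anchors_from_shape_map(shape_map, stride):
    """shape_map: (H, W, 2) predicted (w, h) per cell -> (H*W, 4) boxes
    in (x1, y1, x2, y2) image coordinates."""
    H, W, _ = shape_map.shape
    ys, xs = np.mgrid[0:H, 0:W]
    cx = (xs + 0.5) * stride          # cell centres in image space
    cy = (ys + 0.5) * stride
    w = shape_map[..., 0]
    h = shape_map[..., 1]
    boxes = np.stack([cx - w / 2, cy - h / 2, cx + w / 2, cy + h / 2], axis=-1)
    return boxes.reshape(-1, 4)

shape_map = np.full((2, 2, 2), 16.0)   # every cell predicts a 16x16 anchor
boxes = anchors_from_shape_map(shape_map, stride=8)
print(boxes[0])  # [-4. -4. 12. 12.]
```

In the paper the shape map is itself predicted from semantic features, so the boxes adapt to the image content instead of being fixed hyper-parameters.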

17 pages, 14296 KiB  
Article
Lamb Behaviors Analysis Using a Predictive CNN Model and a Single Camera
by Yair González-Baldizón, Madaín Pérez-Patricio, Jorge Luis Camas-Anzueto, Oscar Mario Rodríguez-Elías, Elias Neftali Escobar-Gómez, Hector Daniel Vazquez-Delgado, Julio Alberto Guzman-Rabasa and José Armando Fragoso-Mandujano
Appl. Sci. 2022, 12(9), 4712; https://doi.org/10.3390/app12094712 - 7 May 2022
Cited by 6 | Viewed by 2701
Abstract
Object tracking is the process of estimating, over time, the location of one or more moving elements through an agent (camera, sensor, or other perceptive device). An important application of object tracking is the analysis of animal behavior to estimate animal health. Traditionally, experts in the field have performed this task. However, this approach requires a high level of knowledge in the area and sufficient employees to ensure monitoring quality. Another alternative is the application of sensors (inertial and thermal), which provide precise information to the user, such as location and temperature, among other data. Nevertheless, this type of analysis entails high infrastructure costs and constant maintenance. Another option to overcome these problems is to analyze RGB images to obtain information for animal tracking. This alternative eliminates the reliance on experts and different sensors, yet it adds the challenge of correctly interpreting image ambiguity. Taking the aforementioned into consideration, this article proposes a methodology to analyze lamb behavior based on a predictive model and deep learning, using a single RGB camera. The method consists of two stages. First, an architecture for lamb tracking was designed and implemented using a CNN. Second, a predictive model was designed for the recognition of animal behavior. The results obtained in this research indicate that the proposed methodology is feasible and promising. According to the experimental results on the dataset used, the accuracy was 99.85% for detecting lamb activities with YOLOv4, and the proposed predictive model reached a mean accuracy of 83.52% for detecting abnormal states. These results suggest that the proposed methodology can be useful in precision agriculture for taking preventive actions and diagnosing possible diseases or health problems. Full article
(This article belongs to the Topic Applied Computer Vision and Pattern Recognition)

20 pages, 5126 KiB  
Article
Driving Fatigue Detection Based on the Combination of Multi-Branch 3D-CNN and Attention Mechanism
by Wenbin Xiang, Xuncheng Wu, Chuanchang Li, Weiwei Zhang and Feiyang Li
Appl. Sci. 2022, 12(9), 4689; https://doi.org/10.3390/app12094689 - 6 May 2022
Cited by 10 | Viewed by 2706
Abstract
Fatigue driving is one of the main causes of traffic accidents today. In this study, a fatigue driving detection system based on a 3D convolutional neural network combined with a channel attention mechanism (Squeeze-and-Excitation module) is proposed. The model obtains grayscale, gradient and optical-flow information from multiple channels of the input frames. The temporal and spatial information contained in the feature maps is extracted by three-dimensional convolution, after which the feature maps are fed to the attention mechanism module to optimize the feature weights. The eye aspect ratio (EAR) and mouth aspect ratio (MAR) are used as fatigue analysis criteria and, finally, a full binary tree SVM classifier outputs the four driving states. In addition, this study uses a frame aggregation strategy to solve the frame-loss problem and provides application software that records the driver's status in real time while protecting the driver's facial privacy. Compared with other classical fatigue driving detection methods, this method extracts features in the temporal and spatial dimensions and optimizes the feature weights using the attention mechanism module, which significantly improves fatigue detection performance. The experimental results show that 95% discriminative accuracy is achieved on the FDF dataset, so the method can be effectively applied to driving fatigue detection. Full article
(This article belongs to the Topic Applied Computer Vision and Pattern Recognition)
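The EAR criterion used above is commonly computed from six eye landmarks as the ratio of the two vertical landmark distances to the horizontal one; it drops towards zero as the eye closes (MAR is computed analogously from mouth landmarks). A minimal sketch with invented landmark coordinates:

```python
import math

# Eye aspect ratio (EAR) as a drowsiness cue: ratio of the two vertical
# eye-landmark distances to the horizontal corner-to-corner distance.
# The landmark coordinates below are made up for illustration.

def dist(a, b):
    return math.hypot(a[0] - b[0], a[1] - b[1])

def ear(p1, p2, p3, p4, p5, p6):
    """p1..p6: the six eye landmarks, with p1/p4 the horizontal corners."""
    return (dist(p2, p6) + dist(p3, p5)) / (2.0 * dist(p1, p4))

open_eye   = [(0, 0), (1, 2), (2, 2), (3, 0), (2, -2), (1, -2)]
closed_eye = [(0, 0), (1, 0.2), (2, 0.2), (3, 0), (2, -0.2), (1, -0.2)]
print(ear(*open_eye), ear(*closed_eye))  # larger value for the open eye
```

A fatigue detector typically thresholds EAR (and MAR for yawning) over a window of frames rather than on a single frame.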

22 pages, 4508 KiB  
Article
A Framework for Short Video Recognition Based on Motion Estimation and Feature Curves on SPD Manifolds
by Xiaohe Liu, Shuyu Liu and Zhengming Ma
Appl. Sci. 2022, 12(9), 4669; https://doi.org/10.3390/app12094669 - 6 May 2022
Cited by 4 | Viewed by 1896
Abstract
Given the prosperity of video media such as TikTok and YouTube, the requirement for short video recognition is becoming more and more urgent. A significant feature of short videos is that they contain few scene switches, and the target (e.g., the face of the key person in the short video) often runs through the whole video. This paper presents a new short video recognition algorithm framework that transforms a short video into a family of feature curves on the symmetric positive definite (SPD) manifold as the basis of recognition. Thus far, no similar algorithm has been reported. The experimental results suggest that our method performs better on three challenging databases than seven other related algorithms published in leading venues. Full article
(This article belongs to the Topic Applied Computer Vision and Pattern Recognition)

23 pages, 3130 KiB  
Article
ODEM-GAN: An Object Deformation Enhancement Model Based on Generative Adversarial Networks
by Zeyang Zhang, Zhongcai Pei, Zhiyong Tang and Fei Gu
Appl. Sci. 2022, 12(9), 4609; https://doi.org/10.3390/app12094609 - 3 May 2022
Cited by 2 | Viewed by 1944
Abstract
Object detection has attracted great attention in recent years. Many experts and scholars have proposed efficient solutions to object detection problems and achieved excellent performance. For example, the coordinate-based anchor-free (CBAF) module was proposed recently to predict the category and the adjustments to the box of an object from its feature part and its contextual part features, which are based on feature maps divided by spatial coordinates. However, these methods do not work very well in some particular situations (e.g., small object detection, scale variation, deformations, etc.), and the accuracy of object detection still needs to be improved. In this paper, to address these problems, we propose ODEM-GAN, built on CBAF, which utilizes generative adversarial networks to detect deformed objects. Specifically, ODEM-GAN first generates object deformation features and then uses these features to enhance the learning ability of CBAF, improving the robustness of detection. We also conducted extensive experiments to validate the effectiveness of ODEM-GAN in the simulation of a parachute opening process. The experimental results demonstrate that, with the assistance of ODEM-GAN, the AP score of CBAF for parachute detection is 88.4%, thereby significantly increasing the accuracy of detecting deformed objects with CBAF. Full article
(This article belongs to the Topic Applied Computer Vision and Pattern Recognition)

14 pages, 3478 KiB  
Article
Image Classification of Pests with Residual Neural Network Based on Transfer Learning
by Chen Li, Tong Zhen and Zhihui Li
Appl. Sci. 2022, 12(9), 4356; https://doi.org/10.3390/app12094356 - 25 Apr 2022
Cited by 41 | Viewed by 3908
Abstract
Agriculture is regarded as one of the key food sources for humans throughout history. In some countries, more than 90% of the population lives on agriculture. However, pests are regarded as one of the major causes of crop loss worldwide. Accurate and automated technology to classify pests can aid pest detection, with great significance for early preventive measures. This paper proposes a residual convolutional neural network for pest identification based on transfer learning. The IP102 agricultural pest image dataset was adopted as the experimental dataset, with data augmentation achieved through random cropping, color transformation, CutMix and other operations. This processing provides strong robustness against factors such as shooting angle, lighting and color changes. The experiments in this study compared the classification accuracy of the ResNeXt-50 (32 × 4d) model under different combinations of learning rate, transfer learning and data augmentation, and also compared the effects of data augmentation on the classification performance of different samples. The results show that models based on transfer learning are generally superior to those trained from scratch. Compared with training from scratch, transfer learning can greatly improve the model's recognition ability and significantly reduce the training time needed to achieve the same classification accuracy. Choosing the appropriate data augmentation technology is also very important for improving classification accuracy. A classification accuracy of 86.95% is reached with the combination of transfer learning plus fine-tuning and CutMix, and compared with the original model, the accuracy on some smaller classes improved significantly. Compared with related studies based on the same dataset, the method in this paper achieves higher classification accuracy, making it more effective for application in the field of pest classification. Full article
(This article belongs to the Topic Applied Computer Vision and Pattern Recognition)
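The CutMix augmentation used above follows a well-known recipe: paste a random rectangle from one image into another and mix the labels in proportion to the pasted area. A minimal sketch (array sizes and the Beta parameter are assumptions):

```python
import numpy as np

# Sketch of CutMix: cut a random rectangle out of image a, fill it with
# the same region from image b, and mix the one-hot labels by area.

rng = np.random.default_rng(0)

def cutmix(img_a, img_b, label_a, label_b, alpha=1.0):
    """img_*: (H, W, C) arrays; label_*: one-hot label vectors."""
    H, W = img_a.shape[:2]
    lam = rng.beta(alpha, alpha)              # fraction kept from img_a
    rw, rh = int(W * np.sqrt(1 - lam)), int(H * np.sqrt(1 - lam))
    cx, cy = rng.integers(0, W), rng.integers(0, H)
    x1, x2 = np.clip(cx - rw // 2, 0, W), np.clip(cx + rw // 2, 0, W)
    y1, y2 = np.clip(cy - rh // 2, 0, H), np.clip(cy + rh // 2, 0, H)
    mixed = img_a.copy()
    mixed[y1:y2, x1:x2] = img_b[y1:y2, x1:x2]
    # Recompute lambda from the actual (clipped) box area.
    lam = 1.0 - (x2 - x1) * (y2 - y1) / (W * H)
    return mixed, lam * label_a + (1 - lam) * label_b

a = np.zeros((32, 32, 3)); b = np.ones((32, 32, 3))
mixed, label = cutmix(a, b, np.array([1.0, 0.0]), np.array([0.0, 1.0]))
print(label.sum())  # 1.0 — the mixed label stays a valid distribution
```

Because the labels are mixed by area, the network is trained to attend to the pasted region as well as the background image.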

24 pages, 3689 KiB  
Article
PCB Component Detection Using Computer Vision for Hardware Assurance
by Wenwei Zhao, Suprith Reddy Gurudu, Shayan Taheri, Shajib Ghosh, Mukhil Azhagan Mallaiyan Sathiaseelan and Navid Asadizanjani
Big Data Cogn. Comput. 2022, 6(2), 39; https://doi.org/10.3390/bdcc6020039 - 8 Apr 2022
Cited by 18 | Viewed by 7555
Abstract
Printed circuit board (PCB) assurance in the optical domain is a crucial field of study. Though there are many existing PCB assurance methods using image processing, computer vision (CV), and machine learning (ML), the PCB field is complex and increasingly evolving, so new techniques are required to overcome the emerging problems. Existing ML-based methods outperform traditional CV methods; however, they often require more data, have low explainability, and can be difficult to adapt when a new technology arises. To overcome these challenges, CV methods can be used in tandem with ML methods. In particular, human-interpretable CV algorithms such as those that extract color, shape, and texture features increase PCB assurance explainability. This allows for incorporation of prior knowledge, which effectively reduces the number of trainable ML parameters and, thus, the amount of data needed to achieve high accuracy when training or retraining an ML model. Hence, this study explores the benefits and limitations of a variety of common computer vision-based features for the task of PCB component detection. The study results indicate that color features demonstrate promising performance for PCB component detection. The purpose of this paper is to facilitate collaboration between the hardware assurance, computer vision, and machine learning communities. Full article
(This article belongs to the Topic Applied Computer Vision and Pattern Recognition)
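A per-channel color histogram is one example of the kind of human-interpretable color feature the study refers to; a minimal sketch (the bin count and image content are assumptions):

```python
import numpy as np

# Sketch of an interpretable colour feature: a per-channel intensity
# histogram, normalised and concatenated into one fixed-length vector.

def color_histogram(img, bins=8):
    """img: (H, W, 3) uint8 RGB -> normalised (3*bins,) feature vector."""
    feats = []
    for c in range(3):
        hist, _ = np.histogram(img[..., c], bins=bins, range=(0, 256))
        feats.append(hist / hist.sum())
    return np.concatenate(feats)

# A made-up "component" image: dark in R and B, bright in G.
img = np.zeros((16, 16, 3), dtype=np.uint8)
img[..., 1] = 200
f = color_histogram(img)
print(f.shape)  # (24,)
```

Features like this are cheap to compute and easy to inspect, which is exactly the explainability argument the abstract makes for pairing CV features with ML models.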

16 pages, 3458 KiB  
Article
Comparison of Multilayer Neural Network Models in Terms of Success of Classifications Based on EmguCV, ML.NET and Tensorflow.Net
by Martin Magdin, Juraj Benc, Štefan Koprda, Zoltán Balogh and Daniel Tuček
Appl. Sci. 2022, 12(8), 3730; https://doi.org/10.3390/app12083730 - 7 Apr 2022
Cited by 5 | Viewed by 2293
Abstract
In this paper, we compare three different models of multilayer neural networks in terms of their success in the classification phase. These models were designed for the EmguCV, ML.NET and Tensorflow.Net libraries, which are currently among the most widely used libraries for implementing automatic recognition systems. Using the EmguCV library, we achieved a success rate of 81.95% in the classification of human faces, and with ML.NET, which was based on the pre-trained ResNet50 model using convolution layers, up to 91.15% accuracy. The classification success was also influenced by the time required for training and the time required for the classification itself. The Tensorflow.Net model did not show sufficient classification ability when classifying using vector distances; its highest classification success rate was only 13.31%. The neural networks were trained on a dataset of 1454 photographs of the faces of 43 people. At a time when neural networks are used in applications of ever more varied natures, it is necessary to choose a model that can achieve the required accuracy with the minimum time required for training. The application we created allows the insertion of images and the creation of custom datasets, on the basis of which users can train a model with their own parameters. Models can then be saved and integrated into other applications. Full article
(This article belongs to the Topic Applied Computer Vision and Pattern Recognition)

11 pages, 1293 KiB  
Article
PIFNet: 3D Object Detection Using Joint Image and Point Cloud Features for Autonomous Driving
by Wenqi Zheng, Han Xie, Yunfan Chen, Jeongjin Roh and Hyunchul Shin
Appl. Sci. 2022, 12(7), 3686; https://doi.org/10.3390/app12073686 - 6 Apr 2022
Cited by 8 | Viewed by 3585
Abstract
Owing to its wide range of applications, 3D object detection has attracted increasing attention in computer vision tasks. Most existing 3D object detection methods are based on Light Detection and Ranging (LiDAR) point cloud data. However, these methods have some limitations in localization consistency and classification confidence, due to the irregularity and sparsity of LiDAR point cloud data. Inspired by the complementary characteristics of LiDAR and camera sensors, we propose a new end-to-end learnable framework named Point-Image Fusion Network (PIFNet) to integrate the LiDAR point cloud and camera images. To resolve the problem of inconsistency between localization and classification, we designed an Encoder-Decoder Fusion (EDF) module to extract image features effectively while maintaining the fine-grained localization information of objects. Furthermore, a new effective fusion module is proposed to integrate the color and texture features from images with the depth information from the point cloud. This module alleviates the irregularity and sparsity problems of the point cloud features by capitalizing on the fine-grained information from camera images. In PIFNet, each intermediate feature map is fed into the fusion module to be integrated with its corresponding point-wise features. Point-wise features are used instead of voxel-wise features to reduce information loss. Extensive experiments using the KITTI dataset demonstrate the superiority of PIFNet over other state-of-the-art methods, which it outperforms by 1.97% in mean Average Precision (mAP) and by 2.86% in Average Precision (AP) for the hard cases on the KITTI 3D object detection benchmark. Full article
(This article belongs to the Topic Applied Computer Vision and Pattern Recognition)
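The point-wise fusion described above can be sketched generically: project each LiDAR point into the image, sample the image feature at that pixel, and concatenate it with the point's own feature. This is an illustration under assumed intrinsics and feature shapes, not PIFNet's actual module:

```python
import numpy as np

# Sketch of LiDAR/image point-wise fusion: pinhole-project each 3D point
# into the image plane and concatenate the sampled image feature with
# the point feature. Intrinsics and feature maps here are made up.

def fuse(points, point_feats, image_feats, K):
    """points: (N, 3) in camera coords; point_feats: (N, Fp);
    image_feats: (H, W, Fi); K: (3, 3) camera intrinsics."""
    uvw = points @ K.T                    # pinhole projection
    u = (uvw[:, 0] / uvw[:, 2]).astype(int)
    v = (uvw[:, 1] / uvw[:, 2]).astype(int)
    H, W, _ = image_feats.shape
    u = np.clip(u, 0, W - 1)              # keep samples inside the image
    v = np.clip(v, 0, H - 1)
    return np.concatenate([point_feats, image_feats[v, u]], axis=1)

K = np.array([[100.0, 0, 32], [0, 100.0, 32], [0, 0, 1]])
points = np.array([[0.0, 0.0, 5.0], [1.0, 0.0, 5.0]])
fused = fuse(points, np.zeros((2, 4)), np.ones((64, 64, 8)), K)
print(fused.shape)  # (2, 12)
```

Learned fusion modules replace the plain concatenation with attention or gating, but the projection-then-gather pattern is the common core.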

17 pages, 7933 KiB  
Article
3D Pose Recognition System of Dump Truck for Autonomous Excavator
by Ju-hwan Lee, Junesuk Lee and Soon-Yong Park
Appl. Sci. 2022, 12(7), 3471; https://doi.org/10.3390/app12073471 - 29 Mar 2022
Cited by 4 | Viewed by 3004
Abstract
The purpose of an excavator is to dig up materials and load them onto heavy-duty dump trucks. Typically, an excavator is positioned at the rear of the dump truck when loading. In order to automate this process, this paper proposes a system that employs a combined stereo camera and two LiDAR sensors to determine the three-dimensional (3D) position of the truck’s cargo box and to analyze its loading space. Sparse depth information acquired from the two LiDAR sensors is used to detect points on the door of the cargo box and establish the plane on its rear side. Dense depth information of the cargo box acquired from the stereo camera sensor is projected onto the plane of the box’s rear to estimate its initial 3D position. In the next step, the point cloud sampled along the long shaft of the edge of the cargo box is used as the input of the Iterative Closest Point algorithm to calculate a more accurate cargo box position. The data collected from the stereo camera are then used to determine the 3D position of the cargo box and provide an estimate of the volume of the load along with the 3D position of the loading space to the excavator. In order to demonstrate the efficiency of the proposed method, a mock-up of a heavy-duty truck cargo box was created, and the volume of the load in the cargo box was analyzed. Full article
(This article belongs to the Topic Applied Computer Vision and Pattern Recognition)
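The core step of the Iterative Closest Point algorithm used above is the closed-form rigid alignment of matched point pairs (the Kabsch/SVD solution); full ICP alternates this step with re-estimating the correspondences. A sketch with synthetic points:

```python
import numpy as np

# One ICP iteration's core: given matched point pairs, recover the rigid
# rotation R and translation t that best align them (least squares, via
# SVD). The point sets below are synthetic.

def best_rigid_transform(src, dst):
    """src, dst: (N, 3) matched points -> (R, t) with dst ≈ src @ R.T + t."""
    cs, cd = src.mean(axis=0), dst.mean(axis=0)
    H = (src - cs).T @ (dst - cd)         # cross-covariance of centred sets
    U, _, Vt = np.linalg.svd(H)
    R = Vt.T @ U.T
    if np.linalg.det(R) < 0:              # guard against reflections
        Vt[-1] *= -1
        R = Vt.T @ U.T
    t = cd - R @ cs
    return R, t

rng = np.random.default_rng(1)
src = rng.normal(size=(50, 3))
theta = 0.3
R_true = np.array([[np.cos(theta), -np.sin(theta), 0],
                   [np.sin(theta),  np.cos(theta), 0],
                   [0, 0, 1]])
dst = src @ R_true.T + np.array([0.5, -1.0, 2.0])
R, t = best_rigid_transform(src, dst)
print(np.allclose(src @ R.T + t, dst))  # True
```

In the paper's setting, src would be the sampled cargo-box edge points and dst the model of the box, with nearest-neighbor matching re-run between iterations.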

17 pages, 411 KiB  
Article
An Oversampling Method for Class Imbalance Problems on Large Datasets
by Fredy Rodríguez-Torres, José F. Martínez-Trinidad and Jesús A. Carrasco-Ochoa
Appl. Sci. 2022, 12(7), 3424; https://doi.org/10.3390/app12073424 - 28 Mar 2022
Cited by 23 | Viewed by 3234
Abstract
Several oversampling methods have been proposed for solving the class imbalance problem. However, most of them require searching the k-nearest neighbors to generate synthetic objects. This requirement makes them time-consuming and therefore unsuitable for large datasets. In this paper, an oversampling method for large class imbalance problems that does not require the k-nearest neighbors' search is proposed. According to our experiments on large datasets with different sizes of imbalance, the proposed method is at least twice as fast as the fastest method reported in the literature, while obtaining similar oversampling quality. Full article
(This article belongs to the Topic Applied Computer Vision and Pattern Recognition)
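One way to oversample without any k-nearest-neighbor search, in the spirit of the abstract (though not necessarily the paper's exact method), is to interpolate random pairs of minority samples:

```python
import numpy as np

# Sketch of k-NN-free oversampling: create synthetic minority samples by
# interpolating random pairs of minority objects. No neighbour search is
# needed, so cost is linear in the number of synthetic samples.

rng = np.random.default_rng(0)

def oversample(minority, n_new):
    """minority: (N, d) minority-class samples -> (n_new, d) synthetics."""
    i = rng.integers(0, len(minority), size=n_new)
    j = rng.integers(0, len(minority), size=n_new)
    gap = rng.random((n_new, 1))          # interpolation factor in [0, 1)
    return minority[i] + gap * (minority[j] - minority[i])

minority = rng.normal(size=(20, 5))
synthetic = oversample(minority, 100)
print(synthetic.shape)  # (100, 5)
```

Compared with SMOTE-style methods, this trades neighborhood awareness for speed, which is the trade-off the paper targets on large datasets.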

12 pages, 4466 KiB  
Article
BBRefinement: A Universal Scheme to Improve the Precision of Box Object Detectors
by Petr Hurtik, Marek Vajgl and David Hynar
Appl. Sci. 2022, 12(7), 3402; https://doi.org/10.3390/app12073402 - 27 Mar 2022
Viewed by 1905
Abstract
We present a conceptually simple yet powerful and general scheme for refining the predictions of bounding boxes produced by an arbitrary object detector. Our approach was trained separately on single objects extracted from ground truth labels. For inference, it can be coupled with an arbitrary object detector to improve its precision. The method, called BBRefinement, uses a mixture of data consisting of the image crop of an object and the object’s class and center. Because BBRefinement works in a restricted domain, it does not have to be concerned with multiscale detection, recognition of the object’s class, computing confidence, or multiple detections. Thus, the training is much more effective. It results in the ability to improve the performance of SOTA architectures by up to two mAP points on the COCO dataset in the benchmark. The refinement process is fast; it adds 50–80 ms overhead to a standard detector using RTX2080; therefore, it can run in real time on standard hardware. Finally, we show that BBRefinement can also be applied to COCO’s ground truth labels to create new, more precise labels. The link to the source code is provided in the contribution. Full article
(This article belongs to the Topic Applied Computer Vision and Pattern Recognition)

11 pages, 8924 KiB  
Article
Non-Maximum Suppression Performs Later in Multi-Object Tracking
by Hong Liang, Ting Wu, Qian Zhang and Hui Zhou
Appl. Sci. 2022, 12(7), 3334; https://doi.org/10.3390/app12073334 - 25 Mar 2022
Cited by 7 | Viewed by 3155
Abstract
Multi-object tracking aims to assign a uniform ID to the same target across consecutive frames and is widely used in autonomous driving, security monitoring, etc. In previous work, low-scoring boxes, which inevitably contain occluded targets, were filtered out by Non-Maximum Suppression (NMS) in the detection stage using a confidence threshold. In order to track occluded targets effectively, in this paper, we propose performing NMS later: NMS works in the tracking stage rather than the detection stage, so more candidate boxes that contain occluded targets are reserved for trajectory matching. In addition, unrelated boxes are discarded according to the Intersection over Union (IOU) between the predicted and detected boxes. Furthermore, an unsupervised pre-trained person re-identification (ReID) model is applied to improve domain adaptability, and bicubic interpolation is used to increase the resolution of low-scoring boxes. Extensive experiments on the MOT17 and MOT20 datasets have proven the effectiveness of the proposed method in tracking occluded targets, achieving an MOTA of 78.3%. Full article
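As context for where the proposed method moves NMS, a minimal NumPy sketch of classic greedy IOU-based NMS (the detection-stage baseline, not the authors’ tracking-stage variant) might look like:

```python
import numpy as np

def iou(box, boxes):
    """IoU between one box and an array of boxes, all in (x1, y1, x2, y2)."""
    x1 = np.maximum(box[0], boxes[:, 0])
    y1 = np.maximum(box[1], boxes[:, 1])
    x2 = np.minimum(box[2], boxes[:, 2])
    y2 = np.minimum(box[3], boxes[:, 3])
    inter = np.clip(x2 - x1, 0, None) * np.clip(y2 - y1, 0, None)
    area_a = (box[2] - box[0]) * (box[3] - box[1])
    area_b = (boxes[:, 2] - boxes[:, 0]) * (boxes[:, 3] - boxes[:, 1])
    return inter / (area_a + area_b - inter)

def nms(boxes, scores, iou_thresh=0.5):
    """Greedy NMS: keep the highest-scoring box, drop boxes overlapping it."""
    order = np.argsort(scores)[::-1]
    keep = []
    while order.size > 0:
        i = order[0]
        keep.append(i)
        rest = order[1:]
        order = rest[iou(boxes[i], boxes[rest]) < iou_thresh]
    return keep
```

Performing this step later, as the paper proposes, leaves low-scoring overlapping boxes available for trajectory matching instead of discarding them up front.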
(This article belongs to the Topic Applied Computer Vision and Pattern Recognition)

14 pages, 2100 KiB  
Article
Feature Mining: A Novel Training Strategy for Convolutional Neural Network
by Tianshu Xie, Jiali Deng, Xuan Cheng, Minghui Liu, Xiaomin Wang and Ming Liu
Appl. Sci. 2022, 12(7), 3318; https://doi.org/10.3390/app12073318 - 24 Mar 2022
Cited by 5 | Viewed by 2396
Abstract
In this paper, we propose a novel training strategy named Feature Mining for convolutional neural networks (CNNs) that aims to strengthen the network’s learning of local features. Through experiments, we found that different parts of a feature contain different semantics, while the network inevitably loses a large amount of local information during feedforward propagation. In order to enhance the learning of local features, Feature Mining divides the complete feature into two complementary parts and reuses the divided features to make the network capture different local information; we call these two steps Feature Segmentation and Feature Reusing. Feature Mining is a parameter-free method with a plug-and-play nature and can be applied to any CNN model. Extensive experiments demonstrated the wide applicability, versatility, and compatibility of our method. Full article
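The Feature Segmentation step can be illustrated with a toy sketch. The abstract does not specify how the feature is divided, so the complementary binary mask below is purely an illustrative assumption:

```python
import numpy as np

def feature_segmentation(feature, seed=None):
    """Split a (C, H, W) feature map into two complementary parts using a
    random binary spatial mask shared across channels, so each part carries
    different local information (an assumed division scheme)."""
    rng = np.random.default_rng(seed)
    mask = rng.integers(0, 2, size=feature.shape[1:]).astype(feature.dtype)
    return feature * mask, feature * (1.0 - mask)
```

Because the two masks are complementary, the parts sum back to the original feature, so no information is discarded overall; each branch just sees different local regions.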
(This article belongs to the Topic Applied Computer Vision and Pattern Recognition)

15 pages, 31420 KiB  
Article
Monocular Real Time Full Resolution Depth Estimation Arrangement with a Tunable Lens
by Ricardo Oliva-García, Sabato Ceruso, José G. Marichal-Hernández and José M. Rodriguez-Ramos
Appl. Sci. 2022, 12(6), 3141; https://doi.org/10.3390/app12063141 - 19 Mar 2022
Cited by 4 | Viewed by 4143
Abstract
This work introduces a real-time full-resolution depth estimation device, which allows integral displays to be fed with a real-time light-field. The core principle of the technique is a high-speed focal stack acquisition method combined with an efficient implementation of the depth estimation algorithm, allowing the generation of real-time, high-resolution depth maps. As the procedure does not depend on any custom hardware, if the requirements are met, the described method can turn any high-speed camera into a 3D camera with true depth output. The concept was tested with an experimental setup consisting of an electronically variable focus lens, a high-speed camera, and a GPU for processing, plus a control board for lens and image sensor synchronization. A comparison with other state-of-the-art algorithms shows our advantages in computational time and precision. Full article
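A toy version of depth-from-focal-stack estimation, using a simple Laplacian focus measure rather than the authors’ algorithm, could look like:

```python
import numpy as np

def depth_from_focal_stack(stack):
    """Per-pixel depth index from a focal stack of shape (N, H, W): for every
    pixel, pick the slice where local contrast (a Laplacian focus measure)
    is maximal. A classic sketch, not the paper's exact method."""
    lap = np.empty_like(stack)
    for i, img in enumerate(stack):
        p = np.pad(img, 1, mode="edge")
        # 4-neighbor discrete Laplacian as the sharpness measure
        lap[i] = np.abs(4 * img - p[:-2, 1:-1] - p[2:, 1:-1]
                        - p[1:-1, :-2] - p[1:-1, 2:])
    return np.argmax(lap, axis=0)
```

The slice index that maximizes sharpness at each pixel maps directly to the focus distance of the tunable lens at acquisition time.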
(This article belongs to the Topic Applied Computer Vision and Pattern Recognition)

15 pages, 3589 KiB  
Article
Discrete HMM for Visualizing Domiciliary Human Activity Perception and Comprehension
by Ta-Wen Kuan, Shih-Pang Tseng, Che-Wen Chen, Jhing-Fa Wang and Chieh-An Sun
Appl. Sci. 2022, 12(6), 3070; https://doi.org/10.3390/app12063070 - 17 Mar 2022
Cited by 1 | Viewed by 1616
Abstract
Advances in artificial intelligence-based autonomous applications have led to the advent of domestic robots for smart elderly care; a critical preliminary step for such robots is improving the comprehension of vision-based human activity recognition. In this paper, discrete hidden Markov models (D-HMMs) are used to investigate human activity recognition. Eleven daily home activities are recorded using a video camera with an RGB-D sensor to collect a dataset composed of 25 skeleton joints per frame, of which only 10 skeleton joints are utilized to efficiently perform human activity recognition. Features of the chosen ten skeleton joints are sequentially extracted as pose sequences for a specific human activity and then processed through coordinate transformation and vectorization into a codebook prior to the D-HMM, which estimates the maximal posterior probability to predict the target activity. In the experiments, the confusion matrix is evaluated on the eleven human activities; furthermore, an extension criterion of the confusion matrix is also examined to verify the robustness of the proposed work. The results indicate that D-HMM theory is promising not only for speech signal processing but also for visual signal processing and its applications. Full article
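The maximal-posterior decision over per-activity D-HMMs can be sketched with the standard forward algorithm; the tiny one-state models below are illustrative only, not the paper’s trained models:

```python
import numpy as np

def forward_log_likelihood(obs, pi, A, B):
    """Log-likelihood of a discrete observation sequence under an HMM with
    initial probabilities pi (S,), transition matrix A (S, S), and emission
    matrix B (S, K), via the forward algorithm with per-step normalization."""
    alpha = pi * B[:, obs[0]]
    log_like = np.log(alpha.sum())
    alpha /= alpha.sum()
    for o in obs[1:]:
        alpha = (alpha @ A) * B[:, o]
        log_like += np.log(alpha.sum())
        alpha /= alpha.sum()
    return log_like

def classify(obs, models):
    """Pick the activity model with maximal likelihood for a codeword sequence."""
    return max(models, key=lambda m: forward_log_likelihood(obs, *models[m]))
```

In the paper’s setting, `obs` would be the sequence of codebook indices produced from the vectorized skeleton-joint features, and `models` would hold one trained D-HMM per activity.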
(This article belongs to the Topic Applied Computer Vision and Pattern Recognition)

13 pages, 2590 KiB  
Article
Double Branch Attention Block for Discriminative Representation of Siamese Trackers
by Jiaqi Xi, Jin Yang, Xiaodong Chen, Yi Wang and Huaiyu Cai
Appl. Sci. 2022, 12(6), 2897; https://doi.org/10.3390/app12062897 - 11 Mar 2022
Cited by 1 | Viewed by 1670
Abstract
Siamese trackers have achieved a good balance between accuracy and efficiency in generic object tracking. However, background distractors degrade the discriminative representation of the target. To suppress the sensitivity of trackers to background distractors, we propose a Double Branch Attention (DBA) block and a Siamese tracker equipped with the DBA block, named DBA-Siam. First, the DBA block concatenates channels of multiple layers from the two branches of the Siamese framework to obtain a rich feature representation. Second, channel attention is applied to the two concatenated feature blocks to selectively enhance robust features, thus improving the ability to distinguish the target from the complex background. Finally, the DBA block collects the contextual relevance between the Siamese branches and adaptively encodes it into the feature weights of the detection branch for information compensation. Ablation experiments show that the proposed block can enhance the discriminative representation of the target and significantly improve tracking performance. Results on two popular benchmarks show that DBA-Siam performs favorably against its counterparts. Compared with the advanced algorithm CSTNet, DBA-Siam improves the EAO by 18.9% on VOT2016. Full article
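The exact channel attention used in the DBA block is not specified here; as a stand-in, a generic squeeze-and-excitation style gating over a feature map can be sketched as:

```python
import numpy as np

def channel_attention(feat, w1, w2):
    """Squeeze-and-excitation style channel attention on a (C, H, W) feature
    map: global average pool, two small dense layers (weights w1, w2), then
    sigmoid gating of each channel. A generic sketch, not the DBA block."""
    s = feat.mean(axis=(1, 2))                 # squeeze: per-channel mean, (C,)
    z = np.maximum(w1 @ s, 0.0)                # excitation, hidden layer (ReLU)
    gate = 1.0 / (1.0 + np.exp(-(w2 @ z)))     # per-channel weights in (0, 1)
    return feat * gate[:, None, None]
```

Each channel is rescaled by a learned weight in (0, 1), which is how attention can selectively enhance robust channels of the concatenated features.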
(This article belongs to the Topic Applied Computer Vision and Pattern Recognition)

11 pages, 4887 KiB  
Article
No-Reference Image Quality Assessment Based on Image Multi-Scale Contour Prediction
by Fan Wang, Jia Chen, Haonan Zhong, Yibo Ai and Weidong Zhang
Appl. Sci. 2022, 12(6), 2833; https://doi.org/10.3390/app12062833 - 10 Mar 2022
Cited by 4 | Viewed by 2265
Abstract
Accurately assessing image quality is a challenging task, especially without a reference image. Currently, most no-reference image quality assessment methods still require reference images in the training stage, but reference images are usually not available in real scenes. In this paper, we propose a model named MSIQA, inspired by biological vision and convolutional neural networks (CNNs), which does not require reference images in either the training or testing phase. The model contains two modules: a multi-scale contour prediction network that simulates the contour response of the human optic nerve to images at different distances, and a central attention peripheral inhibition module inspired by the receptive field mechanism of retinal ganglion cells. The training stage has two steps. In the first step, the multi-scale contour prediction network learns to predict the contour features of images at different scales, and in the second step, the model combines the central attention peripheral inhibition module to learn to predict the quality score of the image. In the experiments, our method achieved excellent performance: the Pearson linear correlation coefficient of the MSIQA model tested on the LIVE database reached 0.988. Full article
(This article belongs to the Topic Applied Computer Vision and Pattern Recognition)

26 pages, 1487 KiB  
Review
An Overview on Deep Learning Techniques for Video Compressive Sensing
by Wael Saideni, David Helbert, Fabien Courreges and Jean-Pierre Cances
Appl. Sci. 2022, 12(5), 2734; https://doi.org/10.3390/app12052734 - 7 Mar 2022
Cited by 12 | Viewed by 4428
Abstract
Compressive sensing has achieved impressive results in applications such as image and video processing and has become a promising direction of scientific research; it also provides extensive application value in optimizing video surveillance networks. In this paper, we introduce recent state-of-the-art video compressive sensing methods based on neural networks and group them into categories. We compare these approaches by analyzing their network architectures and then present their pros and cons. The general conclusion of the paper identifies open research challenges and points out future research directions. The goal of this paper is to review current approaches to image and video compressive sensing and demonstrate their powerful impact on computer vision when well-designed compressive sensing algorithms are used. Full article
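The basic sensing model underlying these methods, y = Phi x with m < n measurements, plus a naive least-squares decoder as a stand-in for the learned reconstruction networks surveyed, can be sketched as:

```python
import numpy as np

def sense(x, phi):
    """Compress a flattened signal x of length n to m < n measurements
    y = Phi x, the standard compressive sensing measurement model."""
    return phi @ x

def reconstruct_ls(y, phi):
    """Minimum-norm least-squares recovery via the pseudoinverse; deep
    decoders replace this step with learned, sparsity-aware mappings."""
    return np.linalg.pinv(phi) @ y
```

The least-squares solution reproduces the measurements exactly but is generally not the original signal; exploiting sparsity (classically) or learned priors (in the surveyed networks) is what makes faithful recovery possible.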
(This article belongs to the Topic Applied Computer Vision and Pattern Recognition)

27 pages, 29095 KiB  
Article
Anthropometric Ratios for Lower-Body Detection Based on Deep Learning and Traditional Methods
by Jermphiphut Jaruenpunyasak, Alba García Seco de Herrera and Rakkrit Duangsoithong
Appl. Sci. 2022, 12(5), 2678; https://doi.org/10.3390/app12052678 - 4 Mar 2022
Cited by 1 | Viewed by 4434
Abstract
Lower-body detection can be useful in many applications, such as the detection of falls and injuries during exercise. However, it can be challenging to detect the lower body, especially under various lighting and occlusion conditions. This paper presents a novel lower-body detection framework using proposed anthropometric ratios and compares the performance of deep learning (convolutional neural networks and OpenPose) and traditional detection methods. According to the results, the proposed framework helps to successfully detect the accurate boundaries of the lower body under various illumination and occlusion conditions for lower-limb monitoring. The proposed framework of anthropometric ratios combined with convolutional neural networks (A-CNNs) achieves high accuracy (90.14%), while the combination of anthropometric ratios and traditional techniques (A-Traditional) for lower-body detection shows satisfactory performance, with an average accuracy of 74.81%. Although the accuracy of OpenPose (95.82%) is higher than that of the A-CNNs for lower-body detection, the A-CNNs have lower complexity than OpenPose, which is advantageous for lower-body detection and implementation in monitoring systems. Full article
(This article belongs to the Topic Applied Computer Vision and Pattern Recognition)

19 pages, 7274 KiB  
Article
RoiSeg: An Effective Moving Object Segmentation Approach Based on Region-of-Interest with Unsupervised Learning
by Zeyang Zhang, Zhongcai Pei, Zhiyong Tang and Fei Gu
Appl. Sci. 2022, 12(5), 2674; https://doi.org/10.3390/app12052674 - 4 Mar 2022
Cited by 1 | Viewed by 1629
Abstract
Traditional video object segmentation often has low detection speed and inaccurate results due to the jitter caused by pan-and-tilt or hand-held devices. Deep neural networks (DNNs) have been widely adopted to address these problems; however, they rely on a large amount of annotated data and high-performance computing units. Therefore, DNNs are not suitable for some special scenarios (e.g., no prior knowledge or powerful computing ability). In this paper, we propose RoiSeg, an effective moving object segmentation approach based on Region-of-Interest (ROI), which utilizes an unsupervised learning method to achieve automatic segmentation of moving objects. Specifically, we first hypothesize that the central n × n pixels of an image act as the ROI to represent the features of the segmented moving object. Second, we pool the ROI to a central point of the foreground to simplify the segmentation problem into a classification problem based on the ROI. Third, we implement a trajectory-based classifier and an online updating mechanism to address the classification problem and compensate for class imbalance, respectively. We conduct extensive experiments to evaluate the performance of RoiSeg, and the experimental results demonstrate that RoiSeg is more accurate and faster than other segmentation algorithms. Moreover, RoiSeg not only effectively handles ambient lighting changes, fog, and salt-and-pepper noise, but also copes well with camera jitter and windy scenes. Full article
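The central-ROI idea can be sketched as follows; the mean-pooled descriptor is an illustrative simplification of the ROI-to-point step, not the paper’s exact pooling:

```python
import numpy as np

def central_roi(image, n):
    """Extract the central n-by-n region of an (H, W) image as the ROI."""
    h, w = image.shape[:2]
    top, left = (h - n) // 2, (w - n) // 2
    return image[top:top + n, left:left + n]

def roi_descriptor(image, n):
    """Pool the ROI to a single value (here: its mean), turning the
    segmentation task into a per-frame classification on that descriptor."""
    return central_roi(image, n).mean()
```

Reducing the ROI to one point per frame is what lets a lightweight trajectory-based classifier replace heavy per-pixel segmentation.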
(This article belongs to the Topic Applied Computer Vision and Pattern Recognition)

19 pages, 7920 KiB  
Article
LW-FIRE: A Lightweight Wildfire Image Classification with a Deep Convolutional Neural Network
by Amila Akagic and Emir Buza
Appl. Sci. 2022, 12(5), 2646; https://doi.org/10.3390/app12052646 - 4 Mar 2022
Cited by 16 | Viewed by 3199
Abstract
Analysis of reports published by the leading national centers for monitoring wildfires and other emergencies revealed that the devastation caused by wildfires has increased 2.96-fold compared to a decade earlier. The reports show that the total number of wildfires is declining; however, their impact on wildlife appears to be more devastating. In recent years, deep neural network models have demonstrated state-of-the-art accuracy on many computer vision tasks. In this paper, we describe the design and implementation of a lightweight wildfire image classification model (LW-FIRE) based on convolutional neural networks. We explore different ways of using the existing dataset to efficiently train a deep convolutional neural network. We also propose a new method of dataset transformation to increase the number of samples in the dataset and improve the accuracy and generalization of the deep learning model. Experimental results show that the proposed model outperforms state-of-the-art methods and is suitable for real-time classification of wildfire images. Full article
(This article belongs to the Topic Applied Computer Vision and Pattern Recognition)

16 pages, 8692 KiB  
Article
Data Extraction Method for Industrial Data Matrix Codes Based on Local Adjacent Modules Structure
by Licheng Liao, Jianmei Li and Changhou Lu
Appl. Sci. 2022, 12(5), 2291; https://doi.org/10.3390/app12052291 - 22 Feb 2022
Cited by 4 | Viewed by 4335
Abstract
A 2D barcode is a reliable way to provide lifetime traceability of parts that are exposed to harsh environments. However, there are considerable challenges in adopting mobile cameras to read symbols directly marked on metal surfaces. Images captured by mobile cameras are usually of low quality, with poor contrast due to the reflective surface of 2D barcode symbols. To deal with this problem, a novel deep-learning-based method for reading laser-marked Data Matrix symbols in images captured by mobile phones is proposed. Utilizing the barcode module features, we train different convolutional neural network (CNN) models to learn the colors of two adjacent modules of a Data Matrix symbol. Depending on whether the colors of the two adjacent modules are the same or not, an edge image is constructed on a square grid of the same size as the barcode. A correction method based on the KM algorithm is used to obtain a corrected edge image, which helps to reconstruct the final barcode image. Experiments are carried out on our database, and the results show that the proposed algorithm achieves high barcode recognition accuracy, outperforming existing methods. Full article
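Given predicted module colors, deriving an edge image from adjacent-module comparisons can be sketched as below (the CNN that predicts whether adjacent module colors match is omitted; this is only the grid-comparison step):

```python
import numpy as np

def edge_from_modules(modules):
    """Given an (R, C) grid of module colors (0 = white, 1 = black), build
    horizontal and vertical edge maps: an edge is present wherever two
    adjacent modules differ, mirroring the local-adjacent-modules idea."""
    h_edges = modules[:, :-1] != modules[:, 1:]   # between column neighbors
    v_edges = modules[:-1, :] != modules[1:, :]   # between row neighbors
    return h_edges, v_edges
```

In the paper the direction is reversed: the network predicts the same/different relation for each adjacent pair, and the module grid (the barcode) is reconstructed from the resulting edge image.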
(This article belongs to the Topic Applied Computer Vision and Pattern Recognition)

15 pages, 10361 KiB  
Article
DeepProfile: Accurate Under-the-Clothes Body Profile Estimation
by Shufang Lu, Funan Lu, Xufeng Shou and Shuaiyin Zhu
Appl. Sci. 2022, 12(4), 2220; https://doi.org/10.3390/app12042220 - 21 Feb 2022
Viewed by 4150
Abstract
Accurate human body profiles have many potential applications. Image-based human body profile estimation can be regarded as a fine-grained semantic segmentation problem, which is typically used to locate objects and boundaries in images. However, existing image segmentation methods, such as human parsing, require significant amounts of annotation, and their datasets consider clothes to be part of the human body profile. Therefore, the results they generate are not accurate when the human subject is dressed in loose-fitting clothing. In this paper, we created and labeled an under-the-clothes human body contour keypoint dataset; we utilized a convolutional neural network (CNN) to extract the contour keypoints, then combined them with a body profile database to generate under-the-clothes profiles. In order to improve the precision of keypoint detection, we propose a short-skip multi-scale dense (SMSD) block in the CNN to keep the details of the image and increase the information flow among different layers. Extensive experiments were conducted to show the effectiveness of our method. We demonstrate that our method achieved better results than state-of-the-art methods, especially when the person was dressed in loose-fitting clothes, with competitive quantitative performance, while requiring less annotation effort. We also extended our method to the applications of 3D human model reconstruction and body size measurement. Full article
(This article belongs to the Topic Applied Computer Vision and Pattern Recognition)

13 pages, 15516 KiB  
Article
Towards an Approach for Filtration Efficiency Estimation of Consumer-Grade Face Masks Using Thermography
by José Armando Fragoso-Mandujano, Madain Pérez-Patricio, Jorge Luis Camas-Anzueto, Hector Daniel Vázquez-Delgado, Eduardo Chandomí-Castellanos, Yair Gonzalez-Baldizón, Julio Alberto Guzman-Rabasa, Julio Cesar Martinez-Morgan and Luis Enrique Guillén-Ruíz
Appl. Sci. 2022, 12(4), 2071; https://doi.org/10.3390/app12042071 - 16 Feb 2022
Cited by 2 | Viewed by 2713
Abstract
Due to the increasing need for continuous use of face masks caused by COVID-19, it is essential to evaluate the filtration quality that each face mask provides. In this research, an estimation method based on thermal image processing was developed; the main objective was to evaluate the effectiveness of different face masks while they were worn during breathing. For the acquisition of heat distribution images, a thermographic imaging system was built; moreover, a deep learning model detected the leakage percentage of each face mask with a mAP of 0.9345, a recall of 0.842, and an F1-score of 0.82. The results obtained from this research revealed that the filtration effectiveness depended on heat loss through the manufacturing material; the proposed estimation method is simple, fast, and can be replicated and operated by people who are not experts in the computer field. Full article
(This article belongs to the Topic Applied Computer Vision and Pattern Recognition)

14 pages, 3199 KiB  
Article
Chainlet-Based Ear Recognition Using Image Multi-Banding and Support Vector Machine
by Matthew Martin Zarachoff, Akbar Sheikh-Akbari and Dorothy Monekosso
Appl. Sci. 2022, 12(4), 2033; https://doi.org/10.3390/app12042033 - 16 Feb 2022
Cited by 3 | Viewed by 1867
Abstract
This paper introduces the Chainlet-based Ear Recognition algorithm using Multi-Banding and Support Vector Machine (CERMB-SVM). The proposed technique splits the gray input image into several bands based on the intensity of its pixels, similar to a hyperspectral image. It performs Canny edge detection on each generated normalized band, extracting edges that correspond to the ear shape in each band. The generated binary edge maps are then combined, creating a single binary edge map. The resulting edge map is divided into non-overlapping cells, and the Freeman chain code for each group of connected edges within each cell is determined. A histogram of each group of four contiguous cells is computed, and the generated histograms are normalized and linked together to create a chainlet for the input image. The chainlet histogram vectors of the images in the dataset are then utilized for the training and testing of a pairwise Support Vector Machine (SVM). Results obtained using two benchmark ear image datasets demonstrate that the suggested CERMB-SVM method achieves considerably higher accuracy than principal component analysis-based techniques. Furthermore, the proposed CERMB-SVM method outperforms its anchor chainlet technique and state-of-the-art learning-based ear recognition techniques. Full article
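The Freeman chain code and per-cell histogram steps are standard; a minimal sketch (ignoring the multi-banding, cell grouping, and SVM stages) is:

```python
import numpy as np

# 8-connected Freeman chain code directions, counter-clockwise from east,
# as (row_delta, col_delta) steps between consecutive edge pixels
DIRS = [(0, 1), (-1, 1), (-1, 0), (-1, -1), (0, -1), (1, -1), (1, 0), (1, 1)]

def chain_code(path):
    """Freeman chain code of an edge path given as (row, col) pixel points."""
    codes = []
    for (r0, c0), (r1, c1) in zip(path, path[1:]):
        codes.append(DIRS.index((r1 - r0, c1 - c0)))
    return codes

def chain_histogram(codes):
    """Normalized 8-bin histogram of chain codes; chainlets concatenate such
    histograms over groups of neighboring cells."""
    hist = np.bincount(codes, minlength=8).astype(float)
    return hist / hist.sum()
```

Normalizing each histogram before concatenation is what makes the resulting chainlet descriptor comparable across cells with different edge lengths.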
(This article belongs to the Topic Applied Computer Vision and Pattern Recognition)

12 pages, 2532 KiB  
Article
A Serial Attention Frame for Multi-Label Waste Bottle Classification
by Jingyu Xiao, Jiayu Xu, Chunwei Tian, Peiyi Han, Lei You and Shichao Zhang
Appl. Sci. 2022, 12(3), 1742; https://doi.org/10.3390/app12031742 - 8 Feb 2022
Cited by 13 | Viewed by 2734
Abstract
The multi-label recognition of damaged waste bottles has important significance for environmental protection. However, most previous methods are known for their poor performance, especially in regards to damaged waste bottle classification. In this paper, we propose a serial attention frame (SAF) to overcome this drawback. The proposed network architecture includes the following three parts: a residual learning block (RB), a mixed attention block (MAB), and a self-attention block (SAB). The RB uses ResNet to pretrain the SAF to extract more detailed information. To address the effect of the complex background on waste bottle recognition, a serial attention mechanism containing the MAB and SAB is presented. The MAB is used to extract more salient category information via the simultaneous use of spatial attention and channel attention. The SAB exploits the obtained features and their parameters to enable diverse features to improve the classification results for waste bottles. The experimental results demonstrate that our proposed model exhibited good recognition performance on the collected waste bottle datasets, with eight labels across three classifications (i.e., the color, whether the bottle was damaged, and whether the wrapper had been removed), as well as on public image classification datasets. Full article
(This article belongs to the Topic Applied Computer Vision and Pattern Recognition)

10 pages, 1687 KiB  
Article
PTRNet: Global Feature and Local Feature Encoding for Point Cloud Registration
by Cuixia Li, Shanshan Yang, Li Shi, Yue Liu and Yinghao Li
Appl. Sci. 2022, 12(3), 1741; https://doi.org/10.3390/app12031741 - 8 Feb 2022
Cited by 4 | Viewed by 2795
Abstract
Existing end-to-end point cloud registration methods are often inefficient and susceptible to noise. We propose an end-to-end point cloud registration network model, Point Transformer for Registration Network (PTRNet), that considers local and global features to improve this behavior. Our model uses point clouds as inputs and applies a Transformer method to extract their global features. Using a K-Nearest Neighbor (K-NN) topology, our method then encodes the local features of a point cloud and integrates them with the global features to obtain strengthened global features of the point cloud. Comparative experiments using the ModelNet40 dataset show that our method offers better results than other methods, with a mean square error (MSE), root mean square error (RMSE), and mean absolute error (MAE) between the ground truth and predicted values lower than those of competing methods. In the case of multiple object classes without noise, the rotation average absolute error of PTRNet is reduced to 1.601 degrees and the translation average absolute error is reduced to 0.005 units. Compared to other recent end-to-end registration methods and traditional point cloud registration methods, the PTRNet method has less error, higher registration accuracy, and better robustness. Full article
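The K-NN topology used for local feature encoding can be illustrated with a brute-force sketch; the offset-stacking below is a common K-NN encoding of local geometry, not necessarily PTRNet’s exact formulation:

```python
import numpy as np

def knn_indices(points, k):
    """Indices of the k nearest neighbors of each point in an (N, 3) cloud,
    excluding the point itself (brute-force pairwise distances)."""
    d = np.linalg.norm(points[:, None, :] - points[None, :, :], axis=-1)
    np.fill_diagonal(d, np.inf)
    return np.argsort(d, axis=1)[:, :k]

def local_edge_features(points, k):
    """For each point, stack its neighbors' offsets, a common K-NN encoding
    of local geometry (cf. edge convolutions); shape (N, k, 3)."""
    idx = knn_indices(points, k)
    return points[idx] - points[:, None, :]
```

A learned network would then map these per-neighbor offsets to local feature vectors and fuse them with the Transformer’s global features.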
(This article belongs to the Topic Applied Computer Vision and Pattern Recognition)

24 pages, 30859 KiB  
Article
Feature Transformation Framework for Enhancing Compactness and Separability of Data Points in Feature Space for Small Datasets
by Mahmoud Maher ElMorshedy, Radwa Fathalla and Yasser El-Sonbaty
Appl. Sci. 2022, 12(3), 1713; https://doi.org/10.3390/app12031713 - 7 Feb 2022
Cited by 4 | Viewed by 3296
Abstract
Compactness and separability of data points are two important properties that contribute to the accuracy of machine learning tasks such as classification and clustering. We propose a framework that enhances the goodness criteria of these two properties by transforming the data points to a subspace of the same feature space, where data points of the same class are most similar to each other. Most related research on feature engineering in the input data point space relies on manually specified transformation functions. In contrast, our work utilizes a fully automated pipeline in which the transformation function is learnt via an autoencoder for extraction of the latent representation and multi-layer perceptron (MLP) regressors for the feature mapping. We tested our framework on both standard small datasets and benchmark-simulated small datasets by taking small fractions of their samples for training. Our framework consistently produced the best results in all semi-supervised clustering experiments based on K-means and different seeding techniques, with regard to clustering metrics and execution time. In addition, it enhances the performance of linear support vector machine (LSVM) and artificial neural network (ANN) classifiers when embedded as a preprocessing step before applying the classifiers. Full article
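A minimal sketch of seeded semi-supervised K-means, one plausible seeding setup for such experiments (the details here are assumptions, not the paper’s exact protocol), is:

```python
import numpy as np

def seeded_kmeans(X, y_seed, n_iter=20):
    """Semi-supervised K-means: initialize centroids from the few labeled
    samples (seeds), then run standard Lloyd iterations on all points.
    X is (N, D); y_seed is (N,) with -1 marking unlabeled points."""
    labels = np.unique(y_seed[y_seed >= 0])
    centers = np.stack([X[y_seed == c].mean(axis=0) for c in labels])
    for _ in range(n_iter):
        # assign every point to its nearest current centroid
        d = np.linalg.norm(X[:, None, :] - centers[None, :, :], axis=-1)
        assign = d.argmin(axis=1)
        # recompute centroids (assumes no cluster becomes empty)
        centers = np.stack([X[assign == j].mean(axis=0)
                            for j in range(len(labels))])
    return assign, centers
```

Seeding from labeled points fixes both the number of clusters and their initial positions, which is why compact, well-separated transformed features directly improve such clustering results.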

21 pages, 2895 KiB  
Article
Which Features Are More Correlated to Illuminant Estimation: A Composite Substitute
by Yunhui Luo, Xingguang Wang and Qing Wang
Appl. Sci. 2022, 12(3), 1175; https://doi.org/10.3390/app12031175 - 23 Jan 2022
Cited by 1 | Viewed by 1670
Abstract
Computational color constancy (CCC) aims to endow computers or cameras with the capability to remove the color bias caused by different scene illuminations. The first step of CCC is illuminant estimation, i.e., calculating the illuminant color for a given image scene. Recently, methods that directly map image features to an illuminant estimate have provided an effective and robust solution for this problem. Nevertheless, given the diversity of image features, it is unclear which features should be selected to model illuminant color. In this research, a series of artificial features woven into a mapping-based illuminant estimation framework is extensively investigated. This framework employs a multi-model structure and integrates kernel-based fuzzy c-means (KFCM) clustering, non-negative least squares regression (NLSR), and fuzzy weighting. By comparing the resulting performance of different features, the features most correlated with illuminant estimation are identified in the candidate feature set. Furthermore, composite features are designed to achieve outstanding illuminant estimation performance. Extensive experiments were performed on typical benchmark datasets, validating the effectiveness of the proposed method. The method makes illuminant estimation an explicit transformation of suitable image features with regressed and fuzzy weights, and has significant potential for both competitive performance and fast implementation compared with state-of-the-art methods.
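The NLSR component named above can be sketched in isolation. The following is a hypothetical illustration only: the features, the exact feature-to-illuminant mapping, and the training data are invented, and a simple projected-gradient loop stands in for whatever NNLS solver the paper uses. It regresses a non-negative mapping from image features to RGB illuminant color and scores it with the angular error commonly used in color constancy.

```python
import numpy as np

rng = np.random.default_rng(1)

# Synthetic training set: image feature vectors and their ground-truth
# illuminant colors (rows of L, in RGB). The feature choice is hypothetical.
F = rng.uniform(0.0, 1.0, (100, 6))                    # image features
M_true = rng.uniform(0.0, 1.0, (6, 3))                 # latent mapping
L = F @ M_true                                         # illuminant colors

# Non-negative least squares via projected gradient descent:
# minimise ||F M - L||^2 subject to M >= 0.
M = np.zeros((6, 3))
step = 1.0 / np.linalg.norm(F.T @ F, 2)                # 1 / Lipschitz const.
for _ in range(2000):
    M = np.maximum(0.0, M - step * (F.T @ (F @ M - L)))

est = F @ M                                            # estimated illuminants

# Angular error (degrees) between estimated and true illuminant colors.
cos = (est * L).sum(1) / (np.linalg.norm(est, axis=1) * np.linalg.norm(L, axis=1))
err = np.degrees(np.arccos(np.clip(cos, -1.0, 1.0)))
print(err.mean() < 1.0)
```

In the paper's framework one such regressor would be trained per KFCM cluster, and the per-cluster estimates combined with fuzzy weights.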

19 pages, 1465 KiB  
Article
Automated Recognition of Chemical Molecule Images Based on an Improved TNT Model
by Yanchi Li, Guanyu Chen and Xiang Li
Appl. Sci. 2022, 12(2), 680; https://doi.org/10.3390/app12020680 - 11 Jan 2022
Cited by 3 | Viewed by 2511
Abstract
The automated recognition of optical chemical structures, with the help of machine learning, could speed up research and development efforts. However, historical sources often have some level of image corruption, which reduces performance to near zero. To overcome this, we need a dependable algorithmic program to help chemists further expand their research. This paper reports the results of research conducted for the Bristol-Myers Squibb Molecular Translation competition, held on Kaggle, which invited participants to convert old chemical images to their underlying chemical structures, annotated as InChI text; we define this task as molecular translation. We propose a transformer-based model that can be utilized for molecular translation. To better capture the details of the chemical structure, the image features we extract need to be accurate at the pixel level. TNT is one of the existing transformer models that can meet this requirement. However, this model was originally used for image classification and is essentially a transformer encoder, which cannot be utilized for generation tasks. Moreover, we believe that TNT does not integrate the local information of images well, so we improve its core module, the TNT block, and propose a novel module, the Deep TNT block. We stack this module to form an encoder structure and then use the vanilla transformer decoder as the decoder, forming a chemical formula generation model based on the encoder–decoder structure. Since molecular translation is an image-captioning task, we named it the Image Captioning Model based on Deep TNT (ICMDT). A comparison with different models shows that our model has benefits in both convergence speed and final description accuracy. We also designed a complete process in the model inference and fusion phase to further enhance the final results.

12 pages, 2851 KiB  
Communication
A Novel, Automated, and Real-Time Method for the Analysis of Non-Human Primate Behavioral Patterns Using a Depth Image Sensor
by Sang Kuy Han, Keonwoo Kim, Yejoon Rim, Manhyung Han, Youngjeon Lee, Sung-Hyun Park, Won Seok Choi, Keyoung Jin Chun and Dong-Seok Lee
Appl. Sci. 2022, 12(1), 471; https://doi.org/10.3390/app12010471 - 4 Jan 2022
Cited by 1 | Viewed by 2143
Abstract
By virtue of their upright locomotion, similar to that of humans, motion analysis of non-human primates has been widely used to better understand musculoskeletal biomechanics and neuroscience problems. Given the difficulty of applying a marker-based infrared optical tracking system to the behavior analysis of primates, 2-dimensional (2D) video analysis has been used instead. Distinct from a conventional marker-based optical tracking system, a depth image sensor system provides 3D information on movement without any skin markers. The specific aim of this study was to develop a novel algorithm to analyze the behavioral patterns of non-human primates in a home cage using a depth image sensor. The behavioral patterns of nine monkeys in their home cages, including sitting, standing, and pacing, were captured using a depth image sensor and then analyzed both by observers' manual assessment and by the newly written automated program. We confirmed that the measurement results from the observers' manual assessments and from the automated program with depth image analysis were statistically identical.

11 pages, 2266 KiB  
Communication
An Algorithm for Obtaining 3D Egg Models from Visual Images
by Zlatin Zlatev, Mariya Georgieva-Nikolova and Hristo Lukanov
Appl. Sci. 2022, 12(1), 373; https://doi.org/10.3390/app12010373 - 31 Dec 2021
Cited by 2 | Viewed by 2189
Abstract
Mathematical models describing the shape of eggs find application in various fields of practice. This article proposes a method and tools for a detailed study of the shape and peripheral contours of digital images of eggs that are suitable for grouping and sorting. A scheme has been adapted to determine the morphological characteristics of eggs, on the basis of which an algorithm has been created for obtaining their 3D models from color digital image data. The deviation between the major- and minor-axis dimensions measured with a caliper and those obtained by the proposed algorithm is 0.5–1.5 mm. A model of a correction factor has been established by which the three-dimensional shape of eggs can be determined with sufficient accuracy. The results obtained in this work support the assumption that algorithms for determining egg shape depend strongly on the bird species studied. This is confirmed by the data for mallard eggs, which have a more elliptical shape and correspondingly lower values of the correction coefficient 'c' (c = 1.55–4.96), whereas sparrow (c = 9.55–11.19) and quail (c = 11.71–13.11) eggs tend toward an ovoid form. After testing the obtained model on eggs from three bird species (sparrow, mallard, and quail), the coefficient of determination of the proposed model was R2 = 0.96, with a standard error of SE = 0.08; all results show a model p-value less than α = 0.05. The proposed algorithm was then applied to create 3D egg shapes that were not used in the previous calculations; the resulting error was up to 9%, i.e., an accuracy of 91% in this test. An advantage of the algorithm proposed here is that the human operator does not need to select points in the image, as is the case with some algorithms developed by other authors. The proposed methods and tools for the three-dimensional transformation of egg images are applicable not only to poultry farming but also to ornithological research working with differently shaped varieties of eggs. Experimental results show that the proposed algorithm has sufficient accuracy.
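The general idea of building a 3D egg model from a 2D contour — rotating a profile about the long axis — can be sketched as below. Note that the profile equation here is a hypothetical ovoid (an ellipse skewed toward one end), not the paper's actual model; the asymmetry parameter `c` only loosely echoes the paper's correction coefficient 'c' (small c → near-elliptical mallard eggs, large c → ovoid quail eggs).

```python
import numpy as np

def egg_surface(length, width, c=5.0, n_u=64, n_v=32):
    """3-D egg model as a surface of revolution about the long axis.

    length, width: major/minor axis in mm. c: hypothetical ovoid-skew
    coefficient. Returns (x, y, z) point grids of shape (n_u, n_v).
    """
    z = np.linspace(-length / 2, length / 2, n_v)            # long axis
    # Elliptical radius, then skewed toward one end to make it ovoid.
    r = (width / 2) * np.sqrt(np.clip(1 - (2 * z / length) ** 2, 0, None))
    r = r * (1 - (c / 100.0) * (z / (length / 2)))           # asymmetry
    theta = np.linspace(0, 2 * np.pi, n_u)
    x = np.outer(np.cos(theta), r)                           # (n_u, n_v)
    y = np.outer(np.sin(theta), r)
    zz = np.tile(z, (n_u, 1))
    return x, y, zz

# Quail-like egg: 34 mm long, 26 mm wide, strongly ovoid (c ~ 12).
x, y, z = egg_surface(34.0, 26.0, c=12.0)
print(x.shape, float(np.hypot(x, y).max()) <= 13.0 * 1.13)
```

In the paper the profile instead comes from the detected peripheral contour of the egg image, so no operator-selected points are needed; this sketch only shows the revolution step.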

14 pages, 4865 KiB  
Article
A Self-Attention Augmented Graph Convolutional Clustering Networks for Skeleton-Based Video Anomaly Behavior Detection
by Chengming Liu, Ronghua Fu, Yinghao Li, Yufei Gao, Lei Shi and Weiwei Li
Appl. Sci. 2022, 12(1), 4; https://doi.org/10.3390/app12010004 - 21 Dec 2021
Cited by 16 | Viewed by 3795
Abstract
In this paper, we propose a new method for detecting abnormal human behavior based on skeleton features using self-attention-augmented graph convolution. Skeleton data have been proven robust to complex backgrounds, illumination changes, and dynamic camera scenes, and are naturally structured as a graph in non-Euclidean space. In particular, spatial temporal graph convolutional networks (ST-GCN) can effectively learn the spatio-temporal relationships of such non-Euclidean structured data; however, ST-GCN only operates on local neighborhood nodes and thereby lacks global information. We propose novel spatial temporal self-attention augmented graph convolutional networks (SAA-Graph) that combine an improved spatial graph convolution operator with a modified transformer self-attention operator to capture both local and global information about the joints. The spatial self-attention augmented module is used to understand the intra-frame relationships between human body parts. As far as we know, ours is the first group to utilize self-attention for video anomaly detection tasks by enhancing spatial temporal graph convolution. To validate the proposed model, we performed extensive experiments on two large-scale public standard datasets (the ShanghaiTech Campus and CUHK Avenue datasets), which show state-of-the-art performance for our approach compared to existing skeleton-based and graph convolution methods.
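The contrast drawn above — graph convolution aggregates only neighboring joints, while self-attention lets every joint attend to every other joint — is visible in a minimal scaled dot-product self-attention over one frame's skeleton. This is a generic sketch, not the paper's SAA-Graph operator; the 17-joint layout and random projection matrices are illustrative assumptions.

```python
import numpy as np

rng = np.random.default_rng(2)

def self_attention(X, Wq, Wk, Wv):
    """Scaled dot-product self-attention over skeleton joints.

    X: (n_joints, d) features of one frame's joints. Every joint attends
    to every other joint, supplying the global intra-frame context that a
    local graph convolution (fixed neighborhoods) lacks.
    """
    Q, K, V = X @ Wq, X @ Wk, X @ Wv
    scores = Q @ K.T / np.sqrt(K.shape[1])
    A = np.exp(scores - scores.max(axis=1, keepdims=True))
    A = A / A.sum(axis=1, keepdims=True)       # softmax over joints
    return A @ V, A

d = 8
X = rng.normal(size=(17, d))                   # 17 joints (COCO-style layout)
Wq, Wk, Wv = (rng.normal(size=(d, d)) for _ in range(3))
out, A = self_attention(X, Wq, Wk, Wv)
print(out.shape, np.allclose(A.sum(axis=1), 1.0))
```

In an augmented block, such an attention output would be combined with the graph-convolution output for the same frame, giving the network both local-neighborhood and global joint relationships.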

25 pages, 1242 KiB  
Article
Automated Event Detection and Classification in Soccer: The Potential of Using Multiple Modalities
by Olav Andre Nergård Rongved, Markus Stige, Steven Alexander Hicks, Vajira Lasantha Thambawita, Cise Midoglu, Evi Zouganeli, Dag Johansen, Michael Alexander Riegler and Pål Halvorsen
Mach. Learn. Knowl. Extr. 2021, 3(4), 1030-1054; https://doi.org/10.3390/make3040051 - 16 Dec 2021
Cited by 20 | Viewed by 6112
Abstract
Detecting events in videos is a complex task, and many different approaches, aimed at a large variety of use-cases, have been proposed in the literature. Most approaches, however, are unimodal and only consider the visual information in the videos. This paper presents and evaluates different approaches based on neural networks where we combine visual features with audio features to detect (spot) and classify events in soccer videos. We employ model fusion to combine different modalities such as video and audio, and test these combinations against different state-of-the-art models on the SoccerNet dataset. The results show that a multimodal approach is beneficial. We also analyze how the tolerance for delays in classification and spotting time, and the tolerance for prediction accuracy, influence the results. Our experiments show that using multiple modalities improves event detection performance for certain types of events.

12 pages, 634 KiB  
Article
Time Classification Algorithm Based on Windowed-Color Histogram Matching
by Hye-Jin Park, Jung-In Jang and Byung-Gyu Kim
Appl. Sci. 2021, 11(24), 11997; https://doi.org/10.3390/app112411997 - 16 Dec 2021
Cited by 2 | Viewed by 2723
Abstract
A web-based search system recommends and returns results such as customized image or video content using information such as user interests, search time, and place. Time information extracted from images can be used as important metadata in such a web search system. We present an efficient algorithm to classify the time period of a single input image containing a sky region into day, dawn, or night. We employ Mask R-CNN to extract the sky region. Based on extracted sky regions, reference color histograms are generated, which can be considered the ground truth. To compare histograms effectively, we design windowed color histograms (for the RGB bands) to compare each time period of the reference sky regions with that of the input image, and we use a weighting approach to obtain more separable features in the windowed color histogram. With the proposed windowed color histogram, we achieve about 91% recognition accuracy on the test data, and we verify that the proposed algorithm outperforms existing deep neural network models on the test dataset.

18 pages, 92045 KiB  
Article
Cooperative Visual Augmentation Algorithm of Intelligent Vehicle Based on Inter-Vehicle Image Fusion
by Wei Liu, Yun Ma, Mingqiang Gao, Shuaidong Duan and Longsheng Wei
Appl. Sci. 2021, 11(24), 11917; https://doi.org/10.3390/app112411917 - 15 Dec 2021
Cited by 4 | Viewed by 2407
Abstract
In a connected vehicle environment based on vehicle-to-vehicle (V2V) technology, images from the front and ego vehicles are fused to augment a driver's or autonomous system's visual field, which helps avoid road accidents by eliminating blind spots (objects occluded by vehicles), especially tailgating in urban areas. Multi-view image fusion is a tough problem when the relative location of the two sensors is unknown and the object to be fused is occluded in some views. We therefore propose an image geometric projection model and a new cooperative fusion method between neighboring vehicles. Based on a 3D inter-vehicle projection model, selected feature matching points are used to estimate the geometric transformation parameters. By adding depth information, our method also designs a new depth-affine transformation to fuse inter-vehicle images. Experimental results on the KITTI (Karlsruhe Institute of Technology and Toyota Technological Institute) dataset validate our algorithm; compared with previous work, our method improves the IoU index by a factor of 2–3. This algorithm can effectively enhance the visual perception of intelligent vehicles and will help promote the further development of computer vision technology in the field of cooperative perception.

19 pages, 4788 KiB  
Article
AI-Based Video Clipping of Soccer Events
by Joakim Olav Valand, Haris Kadragic, Steven Alexander Hicks, Vajira Lasantha Thambawita, Cise Midoglu, Tomas Kupka, Dag Johansen, Michael Alexander Riegler and Pål Halvorsen
Mach. Learn. Knowl. Extr. 2021, 3(4), 990-1008; https://doi.org/10.3390/make3040049 - 8 Dec 2021
Cited by 7 | Viewed by 6183
Abstract
The current gold standard for extracting highlight clips from soccer games is the use of manual annotations and clippings, where human operators define the start and end of an event and trim away the unwanted scenes. This is a tedious, time-consuming, and expensive task, to the extent of being rendered infeasible for use in lower league games. In this paper, we aim to automate the process of highlight generation using logo transition detection, scene boundary detection, and optional scene removal. We experiment with various approaches, using different neural network architectures on different datasets, and present two models that automatically find the appropriate time interval for extracting goal events. These models are evaluated both quantitatively and qualitatively, and the results show that we can detect logo and scene transitions with high accuracy and generate highlight clips that are highly acceptable for viewers. We conclude that there is considerable potential in automating the overall soccer video clipping process.

15 pages, 5987 KiB  
Article
Memory-Efficient AI Algorithm for Infant Sleeping Death Syndrome Detection in Smart Buildings
by Qian Huang, Chenghung Hsieh, Jiaen Hsieh and Chunchen Liu
AI 2021, 2(4), 705-719; https://doi.org/10.3390/ai2040042 - 8 Dec 2021
Cited by 4 | Viewed by 4734
Abstract
Artificial intelligence (AI) is fundamentally transforming smart buildings by increasing energy efficiency and operational productivity, improving life experience, and providing better healthcare services. Sudden Infant Death Syndrome (SIDS) is the unexpected and unexplained death of an infant under one year old. Previous research reports that sleeping on the back can significantly reduce the risk of SIDS. Existing sensor-based wearable or touchable monitors have serious drawbacks such as inconvenience and false alarms, so they are not attractive for monitoring infant sleeping postures. Several recent studies use a camera, portable electronics, and an AI algorithm to monitor infant sleep postures, but two major bottlenecks prevent AI from detecting potential baby sleeping hazards in smart buildings. To overcome these bottlenecks, we create a complete dataset containing 10,240 day and night vision samples and use post-training weight quantization to solve the huge memory demand problem. Experimental results verify the effectiveness and benefits of our proposed idea. Compared with state-of-the-art AI algorithms in the literature, the proposed method reduces the memory footprint by at least 89% while achieving a similarly high detection accuracy of about 90%. Our proposed AI algorithm requires only 6.4 MB of memory, while other existing AI algorithms for sleep posture detection require 58.2 MB to 275 MB; that is, memory is reduced by at least 9 times without sacrificing detection accuracy. Our memory-efficient AI algorithm therefore has great potential to be deployed and run on edge devices, such as micro-controllers and the Raspberry Pi, which have a low memory footprint, a limited power budget, and constrained computing resources.
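Post-training weight quantization, the memory-saving technique named above, maps each float32 weight tensor to 8-bit integers plus a scale and offset. The sketch below shows only this generic idea on a hypothetical weight matrix (the paper's 89% reduction also reflects architectural choices, and real toolchains such as TensorFlow Lite handle per-layer details); float32 → uint8 alone gives a 4x storage reduction.

```python
import numpy as np

rng = np.random.default_rng(4)

def quantize(w):
    """Affine post-training quantization of a float32 tensor to uint8."""
    lo, hi = float(w.min()), float(w.max())
    scale = (hi - lo) / 255.0 or 1.0              # guard all-equal tensors
    q = np.round((w - lo) / scale).astype(np.uint8)
    return q, scale, lo

def dequantize(q, scale, lo):
    """Recover approximate float weights for inference."""
    return q.astype(np.float32) * scale + lo

w = rng.normal(0.0, 0.1, (256, 128)).astype(np.float32)   # a weight matrix
q, scale, lo = quantize(w)

print(w.nbytes // q.nbytes)                 # 4x smaller storage
print(float(np.abs(dequantize(q, scale, lo) - w).max()) <= scale)
```

The reconstruction error is bounded by the quantization step `scale`, which is why detection accuracy can stay near the float baseline while the on-device memory footprint shrinks.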

16 pages, 616 KiB  
Article
EnCaps: Clothing Image Classification Based on Enhanced Capsule Network
by Feng Yu, Chenghu Du, Ailing Hua, Minghua Jiang, Xiong Wei, Tao Peng and Xinrong Hu
Appl. Sci. 2021, 11(22), 11024; https://doi.org/10.3390/app112211024 - 21 Nov 2021
Cited by 5 | Viewed by 2385
Abstract
Clothing image classification is increasingly important in the development of online clothing shopping. Clothing category marking, clothing commodity retrieval, and similar-clothing recommendation are popular applications in current clothing shopping, all of which rely on accurate clothing image classification. The wide variety and diverse styles of clothing make accurate classification very difficult, and traditional neural networks cannot obtain the spatial structure information of clothing images, which leads to poor classification accuracy. To reach high accuracy, the enhanced capsule (EnCaps) network is proposed, exploiting both image features and spatial structure features. First, a spatial structure extraction model is proposed to obtain the clothing structure feature based on the EnCaps network. Second, an enhanced feature extraction model is proposed to extract more robust clothing features based on a deeper network structure and an attention mechanism. Third, parameter optimization based on an inception mechanism is used to reduce the computation in the proposed network. Experimental results indicate that the proposed EnCaps network achieves high performance in terms of classification accuracy and computational efficiency.

18 pages, 1417 KiB  
Article
Spatio-Temporal Deep Learning-Based Methods for Defect Detection: An Industrial Application Study Case
by Lucas A. da Silva, Eulanda M. dos Santos, Leo Araújo, Natalia S. Freire, Max Vasconcelos, Rafael Giusti, David Ferreira, Anderson S. Jesus, Agemilson Pimentel, Caio F. S. Cruz, Ruan J. S. Belem, André S. Costa and Osmar A. da Silva
Appl. Sci. 2021, 11(22), 10861; https://doi.org/10.3390/app112210861 - 17 Nov 2021
Cited by 2 | Viewed by 2730
Abstract
Data-driven methods—particularly machine learning techniques—are expected to play a key role in the headway of Industry 4.0. One increasingly popular application in this context is when anomaly detection is employed to test manufactured goods in assembly lines. In this work, we compare supervised, semi/weakly-supervised, and unsupervised strategies to detect anomalous sequences in video samples which may be indicative of defective televisions assembled in a factory. We compare 3D autoencoders, convolutional neural networks, and generative adversarial networks (GANs) with data collected in a laboratory. Our methodology to simulate anomalies commonly found in TV devices is discussed in this paper. We also propose an approach to generate anomalous sequences similar to those produced by a defective device as part of our GAN approach. Our results show that autoencoders perform poorly when trained with only non-anomalous data—which is important because class imbalance in industrial applications is typically skewed towards the non-anomalous class. However, we show that fine-tuning the GAN is a feasible approach to overcome this problem, achieving results comparable to those of supervised methods.

12 pages, 41848 KiB  
Article
Active Sonar Target Classification Method Based on Fisher’s Dictionary Learning
by Tongjing Sun, Jiwei Jin, Tong Liu and Jun Zhang
Appl. Sci. 2021, 11(22), 10635; https://doi.org/10.3390/app112210635 - 11 Nov 2021
Cited by 8 | Viewed by 1807
Abstract
The marine environment is complex and changeable, and interference from noise and reverberation seriously affects the classification performance of active sonar equipment. In particular, when the targets to be measured have similar characteristics, underwater classification becomes even more complex, so a robust recognition algorithm that can handle targets with similar features in a reverberant environment needs to be developed. This paper combines Fisher's discriminant criterion with a dictionary-learning-based sparse representation classification algorithm and proposes an active sonar target classification method based on Fisher discriminant dictionary learning (FDDL). Based on the learned dictionaries, the proposed method introduces the Fisher restriction criterion to constrain the sparse coefficients, thereby obtaining a more discriminative dictionary; finally, it determines the category according to the reconstruction errors between the reconstructed signals and the signal to be measured. The classification performance is compared with existing methods such as SVM (support vector machine), SRC (sparse representation-based classification), D-KSVD (discriminative K-singular value decomposition), and LC-KSVD (label-consistent K-SVD), and the experimental results show that FDDL outperforms these existing classification methods.
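The final decision rule described above — assign the class whose dictionary reconstructs the signal with the smallest error — can be sketched on its own. This is only the SRC-style decision step under invented data: the dictionaries here are random and the coding is plain least squares, whereas FDDL learns the dictionaries under a Fisher criterion and uses sparse coding.

```python
import numpy as np

rng = np.random.default_rng(5)

# One dictionary (columns = atoms) per target class, as in SRC/FDDL.
d, n_atoms = 32, 10
dicts = {c: rng.normal(size=(d, n_atoms)) for c in ("target_A", "target_B")}

def classify(x, dicts):
    """Assign the class whose dictionary reconstructs x with least error."""
    errs = {}
    for c, D in dicts.items():
        a, *_ = np.linalg.lstsq(D, x, rcond=None)   # coding coefficients
        errs[c] = np.linalg.norm(x - D @ a)          # reconstruction error
    return min(errs, key=errs.get)

# An echo that truly lies in class A's span is attributed to class A.
x = dicts["target_A"] @ rng.normal(size=n_atoms)
print(classify(x, dicts))
```

The Fisher constraint on the coding coefficients is what makes the learned dictionaries more discriminative than this random-dictionary baseline when the two target classes have similar characteristics.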

15 pages, 4882 KiB  
Article
Assembly Quality Detection Based on Class-Imbalanced Semi-Supervised Learning
by Zichen Lu, Jiabin Jiang, Pin Cao and Yongying Yang
Appl. Sci. 2021, 11(21), 10373; https://doi.org/10.3390/app112110373 - 4 Nov 2021
Cited by 4 | Viewed by 2090
Abstract
Due to the imperfect assembly process, an unqualified assembly with a missing gasket or lead seal will affect the product's performance and possibly cause safety accidents. Machine vision methods based on deep learning have been widely used in quality inspection, and semi-supervised learning (SSL) has been applied in training deep learning models to reduce the burden of data annotation. The dataset obtained from the production line tends to be class-imbalanced because the assemblies are qualified in most cases; however, most SSL methods suffer from lower performance on class-imbalanced datasets. Therefore, we propose a new semi-supervised algorithm that achieves high classification accuracy on the class-imbalanced assembly dataset with limited labeled data. Based on the mean teacher algorithm, the proposed algorithm uses certainty to dynamically select reliable teacher predictions for student learning, and the loss functions are modified to improve the model's robustness against class imbalance. Results show that when only 10% of the total data are labeled and the imbalance rate is 5.3, the proposed method improves the accuracy from 85.34% to 93.67% compared to supervised learning; when the amount of annotated data reaches 20%, the accuracy can reach 98.83%.
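The certainty-based selection step mentioned above can be sketched generically: score each teacher prediction on an unlabeled batch by its softmax maximum and keep only the confident ones as pseudo-labels for the student. This is a simplified stand-in — the fixed threshold and two-class logits below are hypothetical, and the paper selects dynamically within the mean teacher framework.

```python
import numpy as np

def select_confident(teacher_logits, threshold=0.9):
    """Keep teacher predictions whose certainty clears a threshold.

    Certainty here is the softmax maximum per sample; the returned mask
    marks which unlabeled samples the student may learn from.
    """
    z = teacher_logits - teacher_logits.max(axis=1, keepdims=True)
    p = np.exp(z) / np.exp(z).sum(axis=1, keepdims=True)   # stable softmax
    certainty = p.max(axis=1)
    return certainty >= threshold, p.argmax(axis=1)

# Unlabeled batch: two confident predictions, one ambiguous one.
logits = np.array([[8.0, 0.0], [0.2, 0.1], [0.0, 7.0]])
mask, pseudo = select_confident(logits)
print(mask.tolist(), pseudo.tolist())  # [True, False, True] [0, 0, 1]
```

Filtering out the ambiguous middle sample keeps noisy teacher targets from reinforcing the majority (qualified) class, which is one way to soften the class-imbalance problem the paper targets.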

21 pages, 7862 KiB  
Article
Illuminant Estimation Using Adaptive Neuro-Fuzzy Inference System
by Yunhui Luo, Xingguang Wang, Qing Wang and Yehong Chen
Appl. Sci. 2021, 11(21), 9936; https://doi.org/10.3390/app11219936 - 25 Oct 2021
Cited by 1 | Viewed by 2363
Abstract
Computational color constancy (CCC) is a fundamental prerequisite for many computer vision tasks. The key to CCC is estimating the illuminant color so that the image of a scene under varying illumination can be normalized to an image under the canonical illumination. As one type of solution, combination algorithms generally try to reach a better illuminant estimate by weighting other unitary algorithms for a given image. However, due to the diversity of image features, applying the same weighting combination strategy to different images may result in unsound illuminant estimates. To address this problem, this study provides an effective option. A two-step strategy is first employed to cluster the training images; then, for each cluster, ANFIS (adaptive neuro-fuzzy inference system) models are trained to map image features to illuminant color. Given a test image, fuzzy weights measuring the degree to which the image belongs to each cluster are calculated, and a reliable illuminant estimate is obtained by weighting all ANFIS predictions. The proposed method allows the illuminant estimate to be a dynamic combination of initial illumination estimates from several unitary algorithms, relying on the powerful learning and reasoning capabilities of ANFIS. Extensive experiments on typical benchmark datasets demonstrate the effectiveness of the proposed approach. Although some learning-based methods outperform even the most carefully designed and tested combination approaches, the proposed method is good practice for illuminant estimation, as fuzzy inference is easy to implement in imaging signal processors with if-then rules and low computational effort.
(This article belongs to the Topic Applied Computer Vision and Pattern Recognition)

16 pages, 5611 KiB  
Article
How Character-Centric Game Icon Design Affects the Perception of Gameplay
by Xiaoxiao Cao, Makoto Watanabe and Kenta Ono
Appl. Sci. 2021, 11(21), 9911; https://doi.org/10.3390/app11219911 - 23 Oct 2021
Cited by 7 | Viewed by 3865
Abstract
Mobile games are developing rapidly as an important part of the economy. Gameplay is an important attribute of a game, and a game’s icon often determines the user’s initial impression. Whether the user can accurately perceive gameplay and affective quality through the icon is therefore critical. In this article, a two-stage perceptual matching procedure is used to evaluate the perceptual quality of icons from six categories of games that include characters as design elements. First, 60 highly visually matching icons were selected as second-stage objects through classification tasks. Second, the affective matching quality of these icons was measured through the semantic differential method and correlation analysis. Finally, a series of icon samples was determined, and element analysis was carried out. Several methods are proposed for improving the perceptual quality of game icons. Studying this perceptual matching relationship can enhance the interaction between designers, developers, and users. Full article
(This article belongs to the Topic Applied Computer Vision and Pattern Recognition)

15 pages, 4612 KiB  
Article
Unsupervised Anomaly Approach to Pedestrian Age Classification from Surveillance Cameras Using an Adversarial Model with Skip-Connections
by Husnu Baris Baydargil, Jangsik Park and Ibrahim Furkan Ince
Appl. Sci. 2021, 11(21), 9904; https://doi.org/10.3390/app11219904 - 23 Oct 2021
Cited by 1 | Viewed by 2194
Abstract
Anomaly detection is an active research area within the machine learning and scene understanding fields. Despite its ambiguous definition, anomaly detection is generally treated as outlier detection in given data under normality constraints. The biggest problem in real-world anomaly detection applications is the high bias of the available data due to class imbalance: only a limited number of all possible anomalous samples is available, which makes supervised learning difficult. This paper introduces an unsupervised, adversarially trained anomaly model with a unique encoder–decoder structure to address this issue. The proposed model distinguishes different age groups of people, namely children, adults, and the elderly, in surveillance camera data from Busan, Republic of Korea. The model has three major parts: a parallel-pipeline encoder, a decoder with partial skip-connections, and a discriminator. The encoder pairs a conventional convolutional neural network with a dilated convolutional neural network, and the latent-space vectors created at the end of both networks are concatenated. While the convolutional pipeline extracts local features, the dilated convolutional pipeline extracts global features from the same input image. The concatenated features are sent as input to the decoder, which has partial skip-connection elements from both pipelines; together with the concatenated feature vector, this improves feature diversity. The input image is reconstructed from the feature vector through stacked transpose-convolution layers. Afterward, both the original input image and the corresponding reconstructed image are sent to the discriminator, which distinguishes them as real or fake. The image reconstruction loss, the corresponding latent-space loss, and the adversarial Wasserstein loss are used to train the model. Only images of the designated normal class are used during training. The hypothesis is that if the model is trained only on normal-class images, the reconstruction loss during inference will be minimal for normal inputs, whereas untrained anomalous-class images will produce a very high reconstruction loss. This method is applied to distinguish different age clusters of people using unsupervised training. The proposed model outperforms the benchmark models in both qualitative and quantitative measurements. Full article
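The train-on-normal, threshold-at-inference idea can be illustrated independently of the network details. Below, a trivial "model" that memorises a per-pixel mean template stands in for the encoder–decoder (an assumption made purely for illustration); what carries over is the decision rule: reconstruction error stays low for normal inputs and rises for unseen anomalous ones.

```python
def fit_normal_template(normal_images):
    """Stand-in for training on normal data only: this toy 'model' just
    memorises the per-pixel mean of the normal class (flattened images)."""
    n = len(normal_images)
    size = len(normal_images[0])
    return [sum(img[i] for img in normal_images) / n for i in range(size)]

def reconstruction_loss(template, image):
    """Mean squared error between the input and its 'reconstruction'."""
    return sum((p - t) ** 2 for p, t in zip(image, template)) / len(image)

def is_anomalous(template, image, threshold):
    """Flag inputs the normal-only model reconstructs poorly."""
    return reconstruction_loss(template, image) > threshold
```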
(This article belongs to the Topic Applied Computer Vision and Pattern Recognition)

19 pages, 3776 KiB  
Article
The Influence of Commodity Presentation Mode on Online Shopping Decision Preference Induced by the Serial Position Effect
by Zhiman Zhu, Ningyue Peng, Yafeng Niu, Haiyan Wang and Chengqi Xue
Appl. Sci. 2021, 11(20), 9671; https://doi.org/10.3390/app11209671 - 17 Oct 2021
Viewed by 2799
Abstract
The information cluster that supports the final decision in a decision task is usually presented as a series of items. According to the serial position effect, the decision result is easily affected by the presentation order of the information. In this study, we investigate how the presentation mode of commodities and the informativeness of a shopping website influence online shopping decisions. To this end, we conducted two experiments in a virtual online shopping environment. The first experiment suggests that the serial position effect can induce decision-making bias in human-computer interaction: user decisions in separate evaluation mode are more prone to the recency effect, whereas user decisions in joint evaluation mode are more prone to the primacy effect. The second experiment confirms the influence of explicit and implicit details of information on this decision bias. The results can be applied to the design and development of shopping websites, or further to the interaction design of complex information systems, to alleviate user decision-making biases and encourage more rational decisions. Full article
(This article belongs to the Topic Applied Computer Vision and Pattern Recognition)

21 pages, 459 KiB  
Review
A Review on Machine and Deep Learning for Semiconductor Defect Classification in Scanning Electron Microscope Images
by Francisco López de la Rosa, Roberto Sánchez-Reolid, José L. Gómez-Sirvent, Rafael Morales and Antonio Fernández-Caballero
Appl. Sci. 2021, 11(20), 9508; https://doi.org/10.3390/app11209508 - 13 Oct 2021
Cited by 30 | Viewed by 8705
Abstract
Continued advances in machine learning (ML) and deep learning (DL) present new opportunities for use in a wide range of applications. One prominent application of these technologies is defect detection and classification in the manufacturing industry, in order to minimise costs and ensure customer satisfaction. Inspection operations have traditionally been carried out by specialised personnel in charge of visually judging the images obtained with a scanning electron microscope (SEM). This scoping review focuses on inspection operations in the semiconductor manufacturing industry, where different ML and DL techniques and configurations have been used to detect and classify defects in SEM images. We also include the performance results of the different techniques and configurations described in the articles found. A thorough comparison of these results helps identify the best solutions for future research on the subject. Full article
(This article belongs to the Topic Applied Computer Vision and Pattern Recognition)

15 pages, 42126 KiB  
Article
Bag of Features (BoF) Based Deep Learning Framework for Bleached Corals Detection
by Sonain Jamil, MuhibUr Rahman and Amir Haider
Big Data Cogn. Comput. 2021, 5(4), 53; https://doi.org/10.3390/bdcc5040053 - 8 Oct 2021
Cited by 21 | Viewed by 6514
Abstract
Coral reefs are sub-aqueous calcium carbonate structures built by invertebrates known as corals. The charm and beauty of coral reefs attract tourists, and the reefs play a vital role in preserving biodiversity, slowing coastal erosion, and promoting trade; compounds derived from coral reefs are also used in treatments for human immunodeficiency virus (HIV) and heart disease. However, reefs are declining because of over-exploitation, damaging fishing practices, marine pollution, and global climate change. The corals of Australia’s Great Barrier Reef have started bleaching due to ocean acidification and global warming, an alarming threat to the earth’s ecosystem. Many techniques have been developed to address such issues, but each has limitations due to low image resolution, diverse weather conditions, and similar factors. In this paper, we propose a bag-of-features (BoF) based approach that can detect and localize bleached corals before safety measures are applied. The dataset contains images of bleached and unbleached corals, and various kernels are used with a support vector machine to classify the extracted features. The accuracy of handcrafted descriptors and deep convolutional neural networks is analyzed and reported in detail, with comparison to current methods. Handcrafted descriptors such as the local binary pattern, histogram of oriented gradients, locally encoded transform feature histogram, gray-level co-occurrence matrix, and completed joint-scale local binary pattern are used for feature extraction, as are deep convolutional neural networks such as AlexNet, GoogLeNet, VGG-19, ResNet-50, Inception v3, and CoralNet. Experimental analysis shows that the proposed technique outperforms the current state-of-the-art methods, achieving 99.08% accuracy with a classification error of 0.92%. A novel bleached-coral positioning algorithm is also proposed to locate bleached corals in coral reef images. Full article
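The bag-of-features representation itself is compact enough to sketch. Assuming a visual-word codebook has already been learned (e.g. by k-means over training descriptors), each image becomes a normalised histogram of codeword assignments, which is what the SVM kernels then operate on; all names below are illustrative, not the paper's code.

```python
import math

def nearest_codeword(desc, codebook):
    """Index of the visual word closest to a local descriptor."""
    return min(range(len(codebook)), key=lambda k: math.dist(desc, codebook[k]))

def bof_histogram(descriptors, codebook):
    """Quantise each local descriptor to its nearest visual word and return
    an L1-normalised histogram: the fixed-length image representation."""
    hist = [0.0] * len(codebook)
    for d in descriptors:
        hist[nearest_codeword(d, codebook)] += 1.0
    total = sum(hist) or 1.0
    return [h / total for h in hist]
```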
(This article belongs to the Topic Applied Computer Vision and Pattern Recognition)

13 pages, 1794 KiB  
Article
A Shallow–Deep Feature Fusion Method for Pedestrian Detection
by Daxue Liu, Kai Zang and Jifeng Shen
Appl. Sci. 2021, 11(19), 9202; https://doi.org/10.3390/app11199202 - 3 Oct 2021
Cited by 1 | Viewed by 1841
Abstract
In this paper, a shallow–deep feature fusion (SDFF) method is developed for pedestrian detection. First, we propose a shallow feature-based method under the ACF framework for pedestrian detection. More precisely, improved Haar-like templates with local FDA learning are used to filter the channel maps of ACF so that these Haar-like features improve the discriminative power and therefore enhance detection performance. The proposed shallow feature, referred to as the weighted subset-Haar-like feature, is efficient for pedestrian detection, with a high recall rate and precise localization. Second, the proposed shallow feature-based detection method operates as a region proposal stage: a classifier equipped with ResNet then refines the region proposals, judging whether each region contains a pedestrian. Extensive experiments on the INRIA, Caltech, and TUD-Brussels datasets show that SDFF is an effective and efficient method for pedestrian detection. Full article
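The two-stage structure (a cheap, high-recall shallow detector proposing regions, refined by a slower deep classifier) can be sketched generically. The scoring functions below are placeholders for the weighted subset-Haar-like detector and the ResNet classifier, and the thresholds are illustrative only.

```python
def cascade_detect(windows, cheap_score, refine_score,
                   proposal_thresh=0.3, final_thresh=0.7):
    """Two-stage cascade: a fast shallow scorer keeps high-recall proposals,
    then a slower deep scorer makes the final decision on those only."""
    proposals = [w for w in windows if cheap_score(w) >= proposal_thresh]
    return [w for w in proposals if refine_score(w) >= final_thresh]
```

The proposal threshold is deliberately low so that recall is preserved at the first stage; precision is recovered by the second-stage classifier, which only ever sees the surviving windows.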
(This article belongs to the Topic Applied Computer Vision and Pattern Recognition)

18 pages, 11112 KiB  
Article
Recovery of Natural Scenery Image by Content Using Wiener-Granger Causality: A Self-Organizing Methodology
by Cesar Benavides-Alvarez, Carlos Aviles-Cruz, Eduardo Rodriguez-Martinez, Andrés Ferreyra-Ramírez and Arturo Zúñiga-López
Appl. Sci. 2021, 11(19), 8795; https://doi.org/10.3390/app11198795 - 22 Sep 2021
Viewed by 2130
Abstract
One of the most important applications of data science and data mining is organizing, classifying, and retrieving digital images on the Internet. A current focus of researchers is developing methods for the content-based exploration of natural scenery images. In this paper, a self-organizing method for natural scene images based on Wiener-Granger causality theory is proposed. It is achieved by introducing a feature-extraction stage at random points within the image, arranging the extracted features in time-series form, and computing Wiener-Granger causality between them. Once the causal relationships are obtained, the k-means algorithm is applied to self-organize these attributes. For classification, the kNN distance-based algorithm is used to find the most similar images, i.e., those that share the causal relationships between the elements of the scenes. The proposed methodology is validated on three public image databases, obtaining 100% recovery results. Full article
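The retrieval step can be sketched as a plain kNN search over the organised feature vectors. This is a generic illustration rather than the paper's code: the causal-relationship features are assumed to be precomputed, one vector per database image.

```python
import math

def knn_retrieve(query_feature, database, k=3):
    """Return the names of the k database entries whose feature vectors are
    closest (Euclidean distance) to the query feature vector."""
    ranked = sorted(database, key=lambda item: math.dist(query_feature, item[1]))
    return [name for name, _ in ranked[:k]]
```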
(This article belongs to the Topic Applied Computer Vision and Pattern Recognition)

13 pages, 4515 KiB  
Article
An Attention Enhanced Spatial–Temporal Graph Convolutional LSTM Network for Action Recognition in Karate
by Jianping Guo, Hong Liu, Xi Li, Dahong Xu and Yihan Zhang
Appl. Sci. 2021, 11(18), 8641; https://doi.org/10.3390/app11188641 - 17 Sep 2021
Cited by 19 | Viewed by 2438
Abstract
With the increasing popularity of artificial intelligence applications, artificial intelligence technology has begun to be applied in competitive sports. These applications have promoted improvements in athletes’ competitive ability, as well as in public fitness. Human action recognition technology based on deep learning has gradually been applied to the analysis of the technical actions of competitive athletes, as well as to the analysis of tactics. In this paper, a new graph convolution model is proposed. Delaunay partitioning is used to construct a new spatiotemporal topology that can effectively capture the structural information and spatiotemporal features of athletes’ technical actions. At the same time, an attention mechanism is integrated into the model, assigning different weight coefficients to the joints, which significantly improves the accuracy of technical action recognition. First, a comparison with current state-of-the-art methods was undertaken on the general Kinetics and NTU RGB+D datasets, where the performance of the new model was slightly improved. Then, the performance of our algorithm was compared with spatial-temporal graph convolutional networks (ST-GCN) on a karate technique action dataset, where the accuracy of our algorithm was significantly improved. Full article
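A single spatial graph-convolution step with per-joint attention can be written out directly. The sketch below uses a row-normalised adjacency over the skeleton joints and a scalar attention weight per joint; it deliberately omits the temporal convolutions, LSTM, and learned feature transforms of the full model, so it is a conceptual fragment only.

```python
def graph_conv_with_attention(adjacency, features, attention):
    """One spatial graph-convolution step: aggregate each joint's neighbours
    through the row-normalised adjacency, then scale by a per-joint
    attention weight."""
    n = len(features)
    dim = len(features[0])
    out = []
    for i in range(n):
        row_sum = sum(adjacency[i]) or 1.0
        agg = [0.0] * dim
        for j in range(n):
            w = adjacency[i][j] / row_sum
            for d in range(dim):
                agg[d] += w * features[j][d]
        out.append([attention[i] * v for v in agg])
    return out
```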
(This article belongs to the Topic Applied Computer Vision and Pattern Recognition)

19 pages, 11346 KiB  
Article
Spatiotemporal Correlation-Based Accurate 3D Face Imaging Using Speckle Projection and Real-Time Improvement
by Wei Xiong, Hongyu Yang, Pei Zhou, Keren Fu and Jiangping Zhu
Appl. Sci. 2021, 11(18), 8588; https://doi.org/10.3390/app11188588 - 16 Sep 2021
Cited by 7 | Viewed by 3251
Abstract
The reconstruction of 3D face data is widely used in biometric recognition and virtual reality. However, rapid acquisition of 3D data with contemporary reconstruction technology is hampered by limited accuracy, slow speed, and complex scenes. To address this, an accurate 3D face-imaging framework based on coarse-to-fine spatiotemporal correlation is designed: the spatiotemporal correlation stereo matching process is improved and accelerated with a spatiotemporal box filter, and the reliability of the reconstruction parameters is verified in order to resolve the trade-off between measurement accuracy and time cost. A binocular 3D data acquisition device with a rotary speckle projector continuously and synchronously acquires an infrared speckle stereo image sequence for reconstructing an accurate 3D face model. Using face mask data obtained with a high-precision industrial 3D scanner, the relationship between the number of projected speckle patterns, the matching window size, the reconstruction accuracy, and the time cost is quantitatively analysed, and an optimal combination of parameters is chosen to balance reconstruction speed and accuracy. To overcome the long acquisition time caused by switching the rotary speckle pattern, a compact 3D face acquisition device using a fixed three-speckle projector is designed. Using the optimal parameters for the three speckles, a parallel pipeline strategy is adopted in each core processing unit to maximise system resource utilisation and data throughput, and the most time-consuming step, spatiotemporal correlation stereo matching, is accelerated on the graphics processing unit. The results show that the system achieves real-time image acquisition and 3D face reconstruction while maintaining acceptable precision. Full article
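The core matching primitive, correlating temporal intensity sequences rather than single-frame patches, can be sketched with zero-mean normalised cross-correlation (ZNCC). This is a simplified per-pixel, 1-D illustration under assumed names; the actual method correlates spatiotemporal windows across the speckle image sequence.

```python
import math

def zncc(a, b):
    """Zero-mean normalised cross-correlation of two equal-length vectors."""
    ma, mb = sum(a) / len(a), sum(b) / len(b)
    num = sum((x - ma) * (y - mb) for x, y in zip(a, b))
    da = math.sqrt(sum((x - ma) ** 2 for x in a))
    db = math.sqrt(sum((y - mb) ** 2 for y in b))
    return num / (da * db) if da and db else 0.0

def best_disparity(left_seq, right_seqs):
    """Pick the candidate disparity whose temporal intensity sequence in the
    right view correlates best with the left pixel's sequence."""
    return max(range(len(right_seqs)), key=lambda d: zncc(left_seq, right_seqs[d]))
```

Because the speckle patterns vary over time, even a single pixel's temporal sequence is discriminative, which is what lets the window stay small and the matching stay fast.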
(This article belongs to the Topic Applied Computer Vision and Pattern Recognition)

12 pages, 4012 KiB  
Article
Image Splicing Location Based on Illumination Maps and Cluster Region Proposal Network
by Ye Zhu, Xiaoqian Shen, Shikun Liu, Xiaoli Zhang and Gang Yan
Appl. Sci. 2021, 11(18), 8437; https://doi.org/10.3390/app11188437 - 11 Sep 2021
Cited by 3 | Viewed by 2123
Abstract
Splicing is the most common operation in image forgery, where tampered regions are imported from different images. Illumination maps are an inherent attribute of images and provide significant clues when searching for splicing locations. This paper proposes an end-to-end dual-stream network for splicing localization, in which the illumination stream, which includes Grey-Edge (GE) and Inverse-Intensity Chromaticity (IIC) maps, extracts the inconsistent illumination features, and the image stream extracts the globally unnatural tampering features. The dual-stream features are fused through a Multiple Feature Pyramid Network (MFPN), which captures richer context information. Finally, a Cluster Region Proposal Network (C-RPN) with spatial attention and adaptive cluster anchors is proposed to generate potential tampered regions while better retaining location information. Extensive experiments on the NIST16 and CASIA standard datasets show that the proposed algorithm is superior to several state-of-the-art algorithms: it localizes tampering accurately at the pixel level and is robust to post-processing operations such as noise, blur, and JPEG recompression. Full article
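As a point of reference, the Grey-Edge hypothesis used in the illumination stream estimates the illuminant colour from image derivatives: regions pasted in from a differently lit image tend to disagree with the estimate for the rest of the scene. A minimal first-order, one-dimensional sketch (not the paper's multi-scale implementation) looks like this:

```python
def grey_edge(scanline):
    """First-order Grey-Edge along one scanline: the illuminant colour is
    taken proportional to the average absolute derivative of each channel.
    `scanline` is a list of (r, g, b) pixel tuples."""
    sums = [0.0, 0.0, 0.0]
    for prev, cur in zip(scanline, scanline[1:]):
        for c in range(3):
            sums[c] += abs(cur[c] - prev[c])
    total = sum(sums) or 1.0
    return [s / total for s in sums]  # normalised illuminant estimate
```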
(This article belongs to the Topic Applied Computer Vision and Pattern Recognition)

13 pages, 2288 KiB  
Article
Learning Spatial–Temporal Background-Aware Based Tracking
by Peiting Gu, Peizhong Liu, Jianhua Deng and Zhi Chen
Appl. Sci. 2021, 11(18), 8427; https://doi.org/10.3390/app11188427 - 10 Sep 2021
Cited by 2 | Viewed by 1620
Abstract
Discriminative correlation filter (DCF) based tracking algorithms have achieved prominent speed and accuracy, which has attracted extensive attention and research. However, some unavoidable deficiencies still exist. For example, the circulant shifted sampling process introduces unrealistic periodic assumptions and causes boundary effects, which degrade the tracker’s discriminative performance, and the target is not easy to locate under complex appearance changes. In this paper, a spatial–temporal regularization module based on the BACF (background-aware correlation filter) framework is proposed, in which a temporal regularization term is introduced to deal effectively with the boundary-effect issue while improving the accuracy of target recognition. The model can be optimized efficiently with the alternating direction method of multipliers (ADMM), where each sub-problem has a closed-form solution. In addition, in terms of feature representation, we linearly combine traditional hand-crafted features with deep convolutional features to enhance the discriminative power of the filter. Considerable experiments on multiple well-known benchmarks show that the proposed algorithm performs favorably against many state-of-the-art trackers and achieves an AUC score of 64.4% on OTB-100. Full article
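The effect of a temporal regularization term can be seen in a stripped-down, per-element version of the filter update: each element minimises a data term plus a spatial penalty plus a penalty on drifting from the previous filter. The real BACF-style objective couples elements and is solved with ADMM, so the closed form below is only a conceptual sketch under simplifying assumptions.

```python
def update_filter(x, y, f_prev, lam_spatial=0.1, lam_temporal=1.0):
    """Per-element closed-form update minimising, for each element f:
        (x*f - y)^2 + lam_spatial*f^2 + lam_temporal*(f - f_prev)^2
    Setting the derivative to zero gives
        f = (x*y + lam_temporal*f_prev) / (x^2 + lam_spatial + lam_temporal)."""
    return [(xi * yi + lam_temporal * fp) /
            (xi * xi + lam_spatial + lam_temporal)
            for xi, yi, fp in zip(x, y, f_prev)]
```

With `lam_temporal` large, the update clings to the previous filter (robust to transient appearance changes); with it near zero, the update fits the current frame alone.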
(This article belongs to the Topic Applied Computer Vision and Pattern Recognition)

14 pages, 4120 KiB  
Article
Common Gabor Features for Image Watermarking Identification
by Ismail Taha Ahmed, Baraa Tareq Hammad and Norziana Jamil
Appl. Sci. 2021, 11(18), 8308; https://doi.org/10.3390/app11188308 - 8 Sep 2021
Cited by 15 | Viewed by 2346
Abstract
Image watermarking is one of many methods for preventing unauthorized alterations to digital images. The major goal of this research is to find and identify images that include a watermark, regardless of the method used to add the watermark or the shape of the watermark. To this end, this study advocates using the best Gabor features and classifiers to improve the accuracy of image watermarking identification. Discriminant analysis (DA) and random forests are used as classifiers, taking the mean squared energy feature, the mean amplitude feature, and a combined feature vector as inputs for classification. The performance of the classifiers is evaluated on a variety of feature sets, and the best results are reported. To assess the performance of the proposed method, we use the public VOC2008 database. The findings reveal that the DA classifier with the combined features had the greatest TPR, 93.71%, and the lowest FNR, 6.29%, showing that the performance of the proposed approach is consistent. The proposed method has the advantage of finding watermarked images in any database without requiring a specific type of watermark or embedding algorithm. Full article
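The two Gabor statistics named above, mean amplitude and mean squared energy, are straightforward to compute from filter responses. The sketch below uses a 1-D real Gabor kernel for brevity; the study's 2-D filter-bank parameters are not reproduced here, so sizes and frequencies are illustrative.

```python
import math

def gabor_kernel_1d(size=9, sigma=2.0, freq=0.25):
    """Real part of a 1-D Gabor filter: a cosine carrier under a Gaussian."""
    half = size // 2
    return [math.exp(-(t * t) / (2 * sigma * sigma)) * math.cos(2 * math.pi * freq * t)
            for t in range(-half, half + 1)]

def gabor_features(signal, kernel):
    """Mean amplitude and mean squared energy of the valid filter responses,
    the two Gabor statistics used as classifier inputs."""
    half = len(kernel) // 2
    responses = []
    for i in range(half, len(signal) - half):
        responses.append(sum(signal[i + t - half] * kernel[t]
                             for t in range(len(kernel))))
    mean_amplitude = sum(abs(r) for r in responses) / len(responses)
    mean_energy = sum(r * r for r in responses) / len(responses)
    return mean_amplitude, mean_energy
```

A signal oscillating at the kernel's tuned frequency produces much larger responses than a flat signal, which is exactly what makes these statistics discriminative.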
(This article belongs to the Topic Applied Computer Vision and Pattern Recognition)

17 pages, 2715 KiB  
Article
Automatic System for the Detection of Defects on Olive Fruits in an Oil Mill
by Pablo Cano Marchal, Silvia Satorres Martínez, Juan Gómez Ortega and Javier Gámez García
Appl. Sci. 2021, 11(17), 8167; https://doi.org/10.3390/app11178167 - 3 Sep 2021
Cited by 3 | Viewed by 2109
Abstract
The ripeness and sanitary state of olive fruits are key factors in the final quality of the virgin olive oil (VOO) obtained. Since even a small number of damaged fruits may significantly impact the final quality of the produced VOO, the olive inspection in the oil mill reception area or in the first stages of the productive process is of great interest. This paper proposes and validates an automatic defect detection system that utilizes infrared images, acquired under regular operating conditions of an olive oil mill, for the detection of defects on individual fruits. First, the image processing algorithm extracts the fruits based on the iterative application of the active contour technique assisted with mathematical morphology operations. Second, the defect detection is performed on the segmented olives using a decision tree based on region descriptors. The final assessment of the algorithm suggests that it works effectively with a high detection rate, which makes it suitable for the VOO industry. Full article
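The second stage, a decision tree over region descriptors, can be illustrated with a small hand-set tree. The descriptors and thresholds below are hypothetical, chosen only to show the shape of such a classifier, not the paper's learned tree.

```python
def classify_region(area, mean_intensity, circularity):
    """Toy two-level decision tree over region descriptors, in the spirit of
    labelling a segmented fruit region as sound or defective.
    All thresholds here are illustrative placeholders."""
    if area < 20:            # tiny region: likely segmentation noise
        return "sound"
    if mean_intensity < 80:  # large dark region on the fruit surface
        # irregular (low-circularity) dark blobs are treated as defects
        return "defect" if circularity < 0.6 else "sound"
    return "sound"
```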
(This article belongs to the Topic Applied Computer Vision and Pattern Recognition)