Computer Vision and Image Processing

Abstract submission deadline: closed (31 March 2023)
Manuscript submission deadline: 30 June 2023

Topic Information

Dear Colleagues,

Computer vision is a scientific discipline that aims to develop models for understanding our 3D environment using cameras. Image processing, in turn, can be understood as the whole body of techniques that extract useful information directly from images or process them for optimal subsequent analysis. In any case, computer vision and image processing are two closely related fields, relevant to almost any research that uses cameras or other image sensors to acquire information from scenes or working environments. Thus, the main aim of this Topic is to cover some of the relevant areas where computer vision/image processing is applied, including but not limited to:

  • Three-dimensional image acquisition, processing, and visualization
  • Scene understanding
  • Greyscale, color, and multispectral image processing
  • Multimodal sensor fusion
  • Industrial inspection
  • Robotics
  • Surveillance
  • Airborne and satellite on-board image acquisition platforms
  • Computational models of vision
  • Imaging psychophysics

Prof. Dr. Silvia Liberata Ullo
Topic Editor

Keywords

  • 3D acquisition, processing, and visualization
  • scene understanding
  • multimodal sensor processing and fusion
  • multispectral, color, and greyscale image processing
  • industrial quality inspection
  • computer vision for robotics
  • computer vision for surveillance
  • airborne and satellite on-board image acquisition platforms
  • computational models of vision
  • imaging psychophysics

Participating Journals

Journal Name | Abbreviation | Impact Factor | CiteScore | Launched Year | First Decision (median) | APC
Applied Sciences | applsci | 2.838 | 4.5 | 2011 | 14.9 days | 2300 CHF
Electronics | electronics | 2.690 | 4.7 | 2012 | 14.4 days | 2000 CHF
Modelling | modelling | - | - | 2020 | 21.5 days | 1000 CHF
Network | network | - | - | 2021 | 24.8 days | 1000 CHF
Journal of Imaging | jimaging | - | 4.4 | 2015 | 21.2 days | 1600 CHF

Preprints.org is a platform dedicated to making early versions of research outputs permanently available and citable. MDPI journals allow posting on preprint servers such as Preprints.org prior to publication. For more details about preprints, please visit https://www.preprints.org.

Published Papers (71 papers)

Article
SimoSet: A 3D Object Detection Dataset Collected from Vehicle Hybrid Solid-State LiDAR
Electronics 2023, 12(11), 2424; https://doi.org/10.3390/electronics12112424 - 26 May 2023
Abstract
Three-dimensional (3D) object detection based on point cloud data plays a critical role in the perception system of autonomous driving. However, this task presents a significant challenge in terms of its practical implementation due to the absence of point cloud data from automotive-grade hybrid solid-state LiDAR, as well as the limitations regarding the generalization ability of data-driven deep learning methods. In this paper, we introduce SimoSet, the first vehicle view 3D object detection dataset composed of automotive-grade hybrid solid-state LiDAR data. The dataset was collected from a university campus, contains 52 scenes, each of which is 8 s long, and provides three types of labels for typical traffic participants. We analyze the impact of the installation height and angle of the LiDAR on the scanning effect and provide a reference process for the collection, annotation, and format conversion of LiDAR data. Finally, we provide baselines for LiDAR-only 3D object detection.

Article
Combining CNNs and Markov-like Models for Facial Landmark Detection with Spatial Consistency Estimates
J. Imaging 2023, 9(5), 104; https://doi.org/10.3390/jimaging9050104 - 22 May 2023
Abstract
The accurate localization of facial landmarks is essential for several tasks, including face recognition, head pose estimation, facial region extraction, and emotion detection. Although the number of required landmarks is task-specific, models are typically trained on all available landmarks in the datasets, limiting efficiency. Furthermore, model performance is strongly influenced by scale-dependent local appearance information around landmarks and the global shape information generated by them. To account for this, we propose a lightweight hybrid model for facial landmark detection designed specifically for pupil region extraction. Our design combines a convolutional neural network (CNN) with a Markov random field (MRF)-like process trained on only 17 carefully selected landmarks. The advantage of our model is the ability to run different image scales on the same convolutional layers, resulting in a significant reduction in model size. In addition, we employ an approximation of the MRF that is run on a subset of landmarks to validate the spatial consistency of the generated shape. This validation process is performed against a learned conditional distribution, expressing the location of one landmark relative to its neighbor. Experimental results on popular facial landmark localization datasets such as 300W, WFLW, and HELEN demonstrate the accuracy of our proposed model. Furthermore, our model achieves state-of-the-art performance on a well-defined robustness metric. In conclusion, the results demonstrate the ability of our lightweight model to filter out spatially inconsistent predictions, even with significantly fewer training landmarks.
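
As a rough illustration of this validation step, the sketch below scores each neighboring-landmark offset under a learned 2D Gaussian and rejects shapes containing an implausible offset. All names, statistics, and the threshold are invented for the example; the paper's actual MRF approximation is more involved.

```python
import numpy as np

def offset_log_likelihood(offset, mean, cov):
    """Log-density of a 2D Gaussian modelling the conditional location
    of one landmark relative to its neighbor."""
    diff = offset - mean
    inv = np.linalg.inv(cov)
    logdet = np.log(np.linalg.det(cov))
    return -0.5 * (diff @ inv @ diff + logdet + 2.0 * np.log(2.0 * np.pi))

def spatially_consistent(landmarks, neighbor_pairs, stats, threshold=-12.0):
    """Accept a predicted shape only if every neighboring offset is
    plausible under its learned conditional distribution; `stats` maps
    (i, j) index pairs to (mean, cov) offset statistics fitted on
    training data (hypothetical here)."""
    for i, j in neighbor_pairs:
        mean, cov = stats[(i, j)]
        if offset_log_likelihood(landmarks[j] - landmarks[i], mean, cov) < threshold:
            return False
    return True
```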

Article
A Hierarchical Clustering Obstacle Detection Method Applied to RGB-D Cameras
Electronics 2023, 12(10), 2316; https://doi.org/10.3390/electronics12102316 - 21 May 2023
Abstract
Environment perception is a key part of robot self-controlled motion. When using vision to accomplish obstacle detection tasks, it is difficult for deep learning methods to detect all obstacles due to complex environments and vision limitations, and it is difficult for traditional methods to meet real-time requirements when applied to embedded platforms. In this paper, a fast obstacle-detection process applied to RGB-D cameras is proposed. The process has three main steps: feature point extraction, noise removal, and obstacle clustering. The Canny and Shi–Tomasi algorithms complete the pre-processing and feature point extraction; noise is filtered based on geometry; obstacles at different depths are grouped based on the basic principle that the feature points on the same object contour must be continuous or lie within the same depth in the view of an RGB-D camera; and further segmentation in the horizontal direction completes the obstacle clustering. The method omits the iterative computation process required by traditional methods and greatly reduces the memory and time overhead. After experimental verification, the proposed method has a comprehensive recognition accuracy of 82.41%, which is 4.13% and 19.34% higher than that of RSC and traditional methods, respectively, and a recognition accuracy of 91.72% under normal illumination, with a recognition speed of more than 20 FPS on the embedded platform; at the same time, all detections can be achieved within 1 m under normal illumination, and the detection error is no more than 2 cm within 3 m.
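
A minimal OpenCV sketch of the three steps (feature extraction on contours, geometric noise filtering, depth-continuity grouping) might look like the following. The parameter values and the greedy one-dimensional grouping are assumptions for illustration, not the paper's exact procedure.

```python
import cv2
import numpy as np

def cluster_obstacles(rgb, depth, depth_gap=0.15, max_corners=500):
    """Toy three-step pipeline: Shi-Tomasi features on Canny contours,
    depth-validity filtering, then greedy grouping by depth continuity."""
    gray = cv2.cvtColor(rgb, cv2.COLOR_BGR2GRAY)
    edges = cv2.Canny(gray, 100, 200)
    corners = cv2.goodFeaturesToTrack(gray, max_corners, qualityLevel=0.01,
                                      minDistance=5,
                                      mask=(edges > 0).astype(np.uint8))
    if corners is None:
        return []
    # Geometric noise filter: keep only points with a valid depth reading.
    pts = [tuple(p) for p in corners.reshape(-1, 2).astype(int)
           if depth[p[1], p[0]] > 0]
    if not pts:
        return []
    # Feature points on one object contour should have continuous depth,
    # so sort by depth and split wherever the gap exceeds depth_gap (metres).
    pts.sort(key=lambda p: depth[p[1], p[0]])
    clusters, current = [], [pts[0]]
    for p in pts[1:]:
        if depth[p[1], p[0]] - depth[current[-1][1], current[-1][0]] < depth_gap:
            current.append(p)
        else:
            clusters.append(current)
            current = [p]
    clusters.append(current)
    return clusters
```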

Article
An Infrared and Visible Image Fusion Algorithm Method Based on a Dual Bilateral Least Squares Hybrid Filter
Electronics 2023, 12(10), 2292; https://doi.org/10.3390/electronics12102292 - 18 May 2023
Abstract
Infrared and visible images of the same scene are fused to produce a fused image with richer information. However, most current image-fusion algorithms suffer from insufficient edge information retention, weak feature representation, and poor contrast, halos, and artifacts, and can only be applied to a single scene. To address these issues, we propose a novel infrared and visible image fusion algorithm based on a dual bilateral–least-squares hybrid filter (DBLSF) with the least-squares and bilateral filter hybrid model (BLF-LS). The proposed algorithm utilizes the residual network ResNet50 and the adaptive fusion strategy of the structure tensor to fuse the base and detail layers of the filter decomposition, respectively. Experiments on 32 sets of images from the TNO image-fusion dataset show that, although our fusion algorithm sacrifices overall time efficiency, the Combination 1 approach can better preserve image edge information and image integrity; reduce the loss of source image features; suppress artifacts and halos; and compare favorably with other algorithms in terms of structural similarity, feature similarity, multiscale structural similarity, root mean square error, peak signal-to-noise ratio, and correlation coefficient by at least 2.71%, 1.86%, 0.09%, 0.46%, 0.24%, and 0.07%, while the proposed Combination 2 can effectively improve the contrast and edge features of the fused image and enrich the image detail information, with an average improvement of 37.42%, 26.40%, and 26.60% in the three metrics of average gradient, edge intensity, and spatial frequency compared with other algorithms.
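
The two-scale base/detail decomposition at the heart of such fusion schemes can be sketched as follows, with an off-the-shelf bilateral filter standing in for the paper's BLF-LS hybrid model and simple average/max-absolute fusion rules standing in for the ResNet50- and structure-tensor-based strategies.

```python
import cv2
import numpy as np

def two_scale_fusion(ir, vis):
    """Generic two-scale fusion for single-channel images: a smoothing
    filter separates each input into base and detail layers; bases are
    averaged and details combined by maximum absolute value."""
    ir = ir.astype(np.float32)
    vis = vis.astype(np.float32)
    base_ir = cv2.bilateralFilter(ir, d=9, sigmaColor=75, sigmaSpace=75)
    base_vis = cv2.bilateralFilter(vis, d=9, sigmaColor=75, sigmaSpace=75)
    detail_ir, detail_vis = ir - base_ir, vis - base_vis
    base = 0.5 * (base_ir + base_vis)                 # simple base-layer rule
    detail = np.where(np.abs(detail_ir) >= np.abs(detail_vis),
                      detail_ir, detail_vis)          # max-abs detail rule
    return np.clip(base + detail, 0, 255).astype(np.uint8)
```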

Article
An Efficient Strategy for Catastrophic Forgetting Reduction in Incremental Learning
Electronics 2023, 12(10), 2265; https://doi.org/10.3390/electronics12102265 - 17 May 2023
Abstract
Deep neural networks (DNNs) have made outstanding achievements in a wide variety of domains. For deep learning tasks, large enough datasets are required for training efficient DNN models. However, big datasets are not always available, and they are costly to build. Therefore, balanced solutions for DNN model efficiency and training data size have caught the attention of researchers recently. Transfer learning techniques are the most common for this. In transfer learning, a DNN model is pre-trained on a large enough dataset and then applied to a new task with modest data. This fine-tuning process yields another challenge, named catastrophic forgetting. However, it can be reduced using a reasonable strategy for data augmentation in incremental learning. In this paper, we propose an efficient solution for the random selection of samples from the old task to be incrementally stored for learning a sequence of new tasks. In addition, a loss combination strategy is also proposed for optimizing incremental learning. The proposed solutions are evaluated on standard datasets with two scenarios of incremental fine-tuning: (1) New Class (NC) dataset; (2) New Class and new Instance (NCI) dataset. The experimental results show that our proposed solution achieves outstanding results compared with other SOTA rehearsal methods, as well as traditional fine-tuning solutions, ranging from 1% to 16% in recognition accuracy.
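
A minimal sketch of the rehearsal idea, assuming a fixed-capacity buffer of randomly retained old-task samples and a simple weighted loss combination (the buffer size, alpha, and loss choices are illustrative, not the paper's):

```python
import random
import torch
import torch.nn.functional as F

class RehearsalBuffer:
    """Randomly retained samples from old tasks, replayed while
    fine-tuning on a new task."""
    def __init__(self, capacity=2000):
        self.capacity, self.data = capacity, []

    def add(self, samples):
        self.data.extend(samples)
        if len(self.data) > self.capacity:
            self.data = random.sample(self.data, self.capacity)

    def sample(self, k):
        return random.sample(self.data, min(k, len(self.data)))

def combined_loss(model, new_batch, old_batch, alpha=0.5):
    """Weighted sum of the new-task loss and a replay loss on stored
    samples; a simple instance of a loss-combination strategy."""
    x_new, y_new = new_batch
    x_old, y_old = old_batch
    loss_new = F.cross_entropy(model(x_new), y_new)
    loss_old = F.cross_entropy(model(x_old), y_old)
    return alpha * loss_new + (1 - alpha) * loss_old
```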

Article
Big-Volume SliceGAN for Improving a Synthetic 3D Microstructure Image of Additive-Manufactured TYPE 316L Steel
J. Imaging 2023, 9(5), 90; https://doi.org/10.3390/jimaging9050090 - 29 Apr 2023
Abstract
A modified SliceGAN architecture was proposed to generate a high-quality synthetic three-dimensional (3D) microstructure image of TYPE 316L material manufactured through additive methods. The quality of the resulting 3D image was evaluated using an auto-correlation function, and it was discovered that maintaining a high resolution while doubling the training image size was crucial in creating a more realistic synthetic 3D image. To meet this requirement, a modified 3D image generator and critic architecture were developed within the SliceGAN framework.

Article
Overcoming Adverse Conditions in Rescue Scenarios: A Deep Learning and Image Processing Approach
Appl. Sci. 2023, 13(9), 5499; https://doi.org/10.3390/app13095499 - 28 Apr 2023
Abstract
This paper presents a Deep Learning (DL) and Image-Processing (IP) pipeline that addresses exposure recovery in challenging lighting conditions for enhancing First Responders' (FRs) Situational Awareness (SA) during rescue operations. The method aims to improve the quality of images captured by FRs, particularly in overexposed and underexposed environments, while providing a response time suitable for rescue scenarios. The paper describes the technical details of the pipeline, including exposure correction, segmentation, and fusion techniques. Our results demonstrate that the pipeline effectively recovers details in challenging lighting conditions, improves object detection, and is efficient in high-stress, fast-paced rescue situations.
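
A toy version of exposure recovery can be written as mask-guided gamma correction; the thresholds and gammas below are arbitrary, and the paper's DL-based correction, segmentation, and fusion stages are far richer:

```python
import cv2
import numpy as np

def recover_exposure(bgr, low=60, high=190):
    """Brighten under-exposed regions and darken over-exposed ones with
    opposite gamma curves, then blend the three versions by masks."""
    gray = cv2.cvtColor(bgr, cv2.COLOR_BGR2GRAY)
    img = bgr.astype(np.float32) / 255.0
    bright = np.power(img, 0.5)          # gamma < 1 lifts shadows
    dark = np.power(img, 2.0)            # gamma > 1 recovers highlights
    under = (gray < low)[..., None].astype(np.float32)
    over = (gray > high)[..., None].astype(np.float32)
    ok = 1.0 - np.clip(under + over, 0, 1)
    out = under * bright + over * dark + ok * img
    return (np.clip(out, 0, 1) * 255).astype(np.uint8)
```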

Article
Detection of Targets in Road Scene Images Enhanced Using Conditional GAN-Based Dehazing Model
Appl. Sci. 2023, 13(9), 5326; https://doi.org/10.3390/app13095326 - 24 Apr 2023
Abstract
Object detection is a classic image processing problem. For instance, in autonomous driving applications, targets such as cars and pedestrians are detected in the road scene video. Many image-based object detection methods utilizing hand-crafted features have been proposed. Recently, more research has adopted a deep learning approach. Object detectors rely on useful features, such as the object's boundary, which are extracted via analyzing the image pixels. However, the images captured, for instance, in an outdoor environment, may be degraded due to bad weather such as haze and fog. One possible remedy is to recover the image radiance through the use of a pre-processing method such as image dehazing. We propose a dehazing model for image enhancement. The framework was based on the conditional generative adversarial network (cGAN). Our proposed model was improved with two modifications. Various image dehazing datasets were employed for comparative analysis. Our proposed model outperformed other hand-crafted and deep learning-based image dehazing methods by 2 dB or more in PSNR. Moreover, we utilized the dehazed images for target detection using the object detector YOLO. In the experiments, images were degraded by two weather conditions: rain and fog. We demonstrated that the objects detected in images enhanced by our proposed dehazing model were significantly improved over those detected in the degraded images.

Article
Unsupervised Image Enhancement Method Based on Attention Map Network Guidance and Attention Mechanism
Electronics 2023, 12(8), 1887; https://doi.org/10.3390/electronics12081887 - 17 Apr 2023
Abstract
Low-light image enhancement is a crucial preprocessing task in complex vision tasks. It directly impacts object detection, image segmentation, and image recognition outcomes. In recent years, with the continuous development of deep learning techniques, an increasing number of image enhancement methods based on deep learning have emerged. However, due to the high cost of data collection and the limited content of supervised learning datasets, more and more scholars have shifted their focus to the field of unsupervised image enhancement. Unsupervised image enhancement methods do not require paired images of the same scene during the training process, which greatly reduces the threshold for network training. Nevertheless, current unsupervised methods still suffer from issues such as unstable enhancement effects and limited generalization ability. To address these problems, we propose an improved low-light image enhancement method. The proposed method employs the LSGAN as the training architecture and utilizes an attention map network to dynamically generate attention maps that best fit the network enhancement task, which can effectively improve the generalization ability and enhancement performance of the network. Additionally, we adopt an attention mechanism to enhance the subtle details of the image features. Regarding the network training, considering that the traditional convolutional neural network discriminator may not provide effective guidance to the generator in the early stages of training, we propose an improved discriminator structure. The experimental results demonstrate that our method can achieve good enhancement performance on different datasets and has practical value. Although our method has advantages in enhancing low-light images, it also has certain limitations, such as the network size not meeting the requirements for lightweight models and the potential for further improvement under extremely low-light conditions. We will strive to address these issues as comprehensively as possible in our future research.
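
For reference, the least-squares objectives that define an LSGAN (the training architecture named above) are small enough to state directly; a PyTorch sketch:

```python
import torch

def lsgan_d_loss(d_real, d_fake):
    """LSGAN discriminator objective: push real scores toward 1
    and fake scores toward 0."""
    return 0.5 * ((d_real - 1).pow(2).mean() + d_fake.pow(2).mean())

def lsgan_g_loss(d_fake):
    """LSGAN generator objective: push fake scores toward 1."""
    return 0.5 * (d_fake - 1).pow(2).mean()
```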

Article
Research on Identification and Location of Charging Ports of Multiple Electric Vehicles Based on SFLDLC-CBAM-YOLOV7-Tinp-CTMA
Electronics 2023, 12(8), 1855; https://doi.org/10.3390/electronics12081855 - 14 Apr 2023
Abstract
With the gradual maturity of autonomous driving and automatic parking technology, electric vehicle charging is moving towards automation. The charging port (CP) location is an important basis for realizing automatic charging. Existing CP identification algorithms are only suitable for a single vehicle model, with poor universality. Therefore, this paper proposes a set of methods that can identify the CPs of various vehicle types. The recognition process is divided into a rough positioning stage (RPS) and a precise positioning stage (PPS). In this study, the data sets corresponding to four types of vehicle CPs under different environments are established. In the RPS, the characteristic information of the CP is obtained based on the combination of the convolutional block attention module (CBAM) and YOLOV7-tinp, and its position information is calculated using the similar projection relationship. For the PPS, this paper proposes a data enhancement method based on similar feature location to determine the label category (SFLDLC). The CBAM-YOLOV7-tinp is used to identify the feature location information, the cluster template matching algorithm (CTMA) is used to obtain the accurate feature location and tag type, and the EPnP algorithm is used to calculate the location and posture (LP) information. The results of the LP solution are used to provide the position coordinates of the CP relative to the robot base. Finally, the AUBO-i10 robot is used to complete the experimental test. The corresponding results show that the average positioning errors (x, y, z, rx, ry, and rz) of the CP are 0.64 mm, 0.88 mm, 1.24 mm, 1.19 degrees, 1.00 degrees, and 0.57 degrees, respectively, and the integrated insertion success rate is 94.25%. Therefore, the algorithm proposed in this paper can efficiently and accurately identify and locate various types of CPs and meet the actual plugging requirements.
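
The final pose step maps known 3D feature locations on the port to their detected pixels; OpenCV exposes EPnP directly through cv2.solvePnP. All coordinate values and intrinsics below are placeholders, not data from the paper:

```python
import cv2
import numpy as np

# Hypothetical example: four known 3D feature points on the charging port
# (port frame, millimetres), their detected pixel locations, and an
# assumed pinhole intrinsic matrix.
object_pts = np.array([[0, 0, 0], [30, 0, 0], [30, 30, 0], [0, 30, 0]],
                      dtype=np.float32)
image_pts = np.array([[412, 305], [498, 301], [502, 388], [409, 392]],
                     dtype=np.float32)
K = np.array([[1200.0, 0.0, 640.0],
              [0.0, 1200.0, 360.0],
              [0.0, 0.0, 1.0]], dtype=np.float32)

ok, rvec, tvec = cv2.solvePnP(object_pts, image_pts, K, distCoeffs=None,
                              flags=cv2.SOLVEPNP_EPNP)
R, _ = cv2.Rodrigues(rvec)   # rotation matrix; (R, tvec) is the port pose
```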

Article
Thangka Sketch Colorization Based on Multi-Level Adaptive-Instance-Normalized Color Fusion and Skip Connection Attention
Electronics 2023, 12(7), 1745; https://doi.org/10.3390/electronics12071745 - 06 Apr 2023
Abstract
Thangka is an important intangible cultural heritage of Tibet. Due to the complexity and time-consuming nature of the Thangka painting technique, this technique is currently facing the risk of being lost. It is important to preserve the art of Thangka through digital painting methods. Machine learning-based auto-sketch colorization is one of the vital steps for digital Thangka painting. However, existing learning-based sketch colorization methods face two challenges in solving the problem of colorizing Thangka: (1) the extremely rich colors of the Thangka make it difficult to color accurately with existing algorithms, and (2) the line density of the Thangka brings extreme challenges for algorithms to define what semantic information the lines imply. To resolve these problems, we propose a Thangka sketch colorization method based on multi-level adaptive-instance-normalized color fusion (MACF) and skip connection attention (SCA). The proposed method consists of two parts: (1) a multi-level adaptive-instance-normalized color fusion (MACF) module to fuse sketch features and color features; and (2) a skip connection attention (SCA) mechanism to distinguish the semantic information implied by the sketch lines. Experiments on colorizing Thangka sketches show that our method works well on two small datasets—the Danbooru 2019 dataset and the Thangka dataset. Our approach can generate exquisite Thangka images.
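
Adaptive instance normalization, the operation underlying MACF's color fusion, re-normalizes one feature map to another's channel statistics; a standard PyTorch sketch (the multi-level fusion around it is the paper's contribution and is not shown):

```python
import torch

def adain(content, style, eps=1e-5):
    """Adaptive instance normalization: re-normalize the content (sketch)
    feature to the channel-wise mean/std of the style (color) feature.
    Shapes: (N, C, H, W)."""
    c_mean = content.mean(dim=(2, 3), keepdim=True)
    c_std = content.std(dim=(2, 3), keepdim=True) + eps
    s_mean = style.mean(dim=(2, 3), keepdim=True)
    s_std = style.std(dim=(2, 3), keepdim=True) + eps
    return s_std * (content - c_mean) / c_std + s_mean
```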

Article
Research on Improved Multi-Channel Image Stitching Technology Based on Fast Algorithms
Electronics 2023, 12(7), 1700; https://doi.org/10.3390/electronics12071700 - 03 Apr 2023
Abstract
The image registration and fusion process of image stitching algorithms entails significant computational costs, and the use of robust stitching algorithms with good performance is limited in real-time applications on PCs (personal computers) and embedded systems. Fast image registration and fusion algorithms suffer from problems such as ghosting and dashed lines, resulting in suboptimal display effects in the stitched result. Consequently, this study proposes a multi-channel image stitching approach based on fast image registration and fusion algorithms, which enhances the stitching effect on the basis of fast algorithms, thereby augmenting its potential for deployment in real-time applications. First, in the image registration stage, the gridded Binary Robust Invariant Scalable Keypoints (BRISK) method was used to improve the matching efficiency of feature points, and the Grid-based Motion Statistics (GMS) algorithm with a bidirectional rough matching method was used to improve the matching accuracy of feature points. Then, the optimal seam algorithm was used in the image fusion stage to obtain the seam line and construct the fusion area. The seam and transition areas were fused using the fade-in and fade-out weighting algorithm to obtain smooth and high-quality stitched images. The experimental results demonstrate the performance of our proposed method through an improvement in image registration and fusion metrics. We compared our approach with both the original algorithm and other existing methods and achieved significant improvements in eliminating stitching artifacts such as ghosting and discontinuities while maintaining the efficiency of fast algorithms.
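
A bare-bones version of the registration stage (BRISK features, Hamming matching, GMS filtering, homography estimation) can be sketched with OpenCV; the gridding, bidirectional rough matching, and seam-based fusion are omitted, and matchGMS requires opencv-contrib-python:

```python
import cv2

def register_pair(img1, img2):
    """BRISK features + brute-force Hamming matching, filtered by GMS,
    then a RANSAC homography estimate between two overlapping views."""
    brisk = cv2.BRISK_create()
    kp1, des1 = brisk.detectAndCompute(img1, None)
    kp2, des2 = brisk.detectAndCompute(img2, None)
    matcher = cv2.BFMatcher(cv2.NORM_HAMMING, crossCheck=True)
    matches = matcher.match(des1, des2)
    good = cv2.xfeatures2d.matchGMS(img1.shape[:2][::-1], img2.shape[:2][::-1],
                                    kp1, kp2, matches, withRotation=True)
    src = cv2.KeyPoint_convert(kp1, [m.queryIdx for m in good])
    dst = cv2.KeyPoint_convert(kp2, [m.trainIdx for m in good])
    H, _ = cv2.findHomography(src, dst, cv2.RANSAC, 3.0)
    return H
```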

Article
A Self-Supervised Tree-Structured Framework for Fine-Grained Classification
Appl. Sci. 2023, 13(7), 4453; https://doi.org/10.3390/app13074453 - 31 Mar 2023
Abstract
In computer vision, fine-grained classification has become an important issue in recognizing objects with slight visual differences. Usually, it is challenging to generate good performance when solving fine-grained classification problems using traditional convolutional neural networks. To improve the accuracy and training time of convolutional neural networks in solving fine-grained classification problems, this paper proposes a tree-structured framework by eliminating the effect of differences between clusters. The contributions of the proposed method include the following three aspects: (1) a self-supervised method that automatically creates a classification tree, eliminating the need for manual labeling; (2) a machine-learning matcher which determines the cluster to which an item belongs, minimizing the impact of inter-cluster variations on classification; and (3) a pruning criterion which filters the tree-structured classifier, retaining only the models with superior classification performance. The experimental evaluation of the proposed tree-structured framework demonstrates its effectiveness in reducing training time and improving the accuracy of fine-grained classification across various datasets in comparison with conventional convolutional neural network models. Specifically, for the CUB-200-2011, FGVC-Aircraft, and Stanford Cars datasets, the proposed method achieves a reduction in training time of 32.91%, 35.87%, and 14.48%, and improves the accuracy of fine-grained classification by 1.17%, 2.01%, and 0.59%, respectively.

Article
Pixel-Coordinate-Induced Human Pose High-Precision Estimation Method
Electronics 2023, 12(7), 1648; https://doi.org/10.3390/electronics12071648 - 31 Mar 2023
Abstract
Accurately estimating human pose is crucial for providing feedback during exercises or musical performances, but the complex and flexible nature of human joints makes it challenging. Additionally, traditional methods often neglect pixel coordinates, which are naturally present in high-resolution images of the human body. To address this issue, we propose a novel human pose estimation method that directly incorporates pixel coordinates. Our method adds a coordinate channel to the convolution process and embeds pixel coordinates into the feature map, while also using coordinate attention to capture position- and structure-sensitive features. We further reduce the network parameters and computational cost by using small-scale convolution kernels and a smooth activation function in residual blocks. We evaluate our model on the MPII Human Pose and COCO Keypoint Detection datasets and demonstrate improved accuracy, highlighting the importance of directly incorporating coordinate location information in position-sensitive tasks.
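
Appending normalized pixel coordinates as extra input channels before a convolution, the core idea here, takes only a few lines in PyTorch (a generic CoordConv-style sketch, not the paper's exact layer):

```python
import torch
import torch.nn as nn

class CoordConv2d(nn.Module):
    """Convolution with two extra input channels holding normalized
    x / y pixel coordinates, making position information explicit."""
    def __init__(self, in_ch, out_ch, **kw):
        super().__init__()
        self.conv = nn.Conv2d(in_ch + 2, out_ch, **kw)

    def forward(self, x):
        n, _, h, w = x.shape
        ys = torch.linspace(-1, 1, h, device=x.device).view(1, 1, h, 1).expand(n, 1, h, w)
        xs = torch.linspace(-1, 1, w, device=x.device).view(1, 1, 1, w).expand(n, 1, h, w)
        return self.conv(torch.cat([x, xs, ys], dim=1))

# Drop-in usage: CoordConv2d(64, 128, kernel_size=3, padding=1)
```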

Article
A Genetic Algorithm Based One Class Support Vector Machine Model for Arabic Skilled Forgery Signature Verification
J. Imaging 2023, 9(4), 79; https://doi.org/10.3390/jimaging9040079 - 29 Mar 2023
Abstract
Recently, signature verification systems have been widely adopted for verifying individuals based on their handwritten signatures, especially in forensic and commercial transactions. Generally, feature extraction and classification tremendously impact the accuracy of system authentication. Feature extraction is challenging for signature verification systems due to the diverse forms of signatures and sample circumstances. Current signature verification techniques demonstrate promising results in identifying genuine and forged signatures. However, the overall performance of skilled forgery detection still falls short of delivering high satisfaction. Furthermore, most current signature verification techniques demand a large number of learning samples to increase verification accuracy. This is the primary disadvantage of using deep learning, as the number of signature samples is usually restricted in the practical application of a signature verification system. In addition, the system inputs are scanned signatures that comprise noisy pixels, a complicated background, blurriness, and contrast decay. The main challenge has been attaining a balance between noise and data loss, since some essential information is lost during preprocessing, probably influencing the subsequent stages of the system. This paper tackles the aforementioned issues by presenting four main steps: preprocessing, multifeature fusion, discriminant feature selection using a genetic algorithm based on one-class support vector machine (OCSVM-GA), and a one-class learning strategy to address imbalanced signature data in the practical application of a signature verification system. The suggested method employs three databases of signatures: SID-Arabic handwritten signatures, CEDAR, and UTSIG. Experimental results depict that the proposed approach outperforms current systems in terms of false acceptance rate (FAR), false rejection rate (FRR), and equal error rate (EER).
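
The one-class learning strategy can be illustrated with scikit-learn's OneClassSVM: train on genuine-signature feature vectors only and reject outliers. The feature arrays below are random placeholders for the fused, GA-selected features described above:

```python
import numpy as np
from sklearn.svm import OneClassSVM

# One-class training: the model sees only genuine signatures; anything
# far from that distribution (e.g., a skilled forgery) is rejected.
genuine_features = np.random.rand(40, 64)     # 40 genuine samples, 64-D
query_features = np.random.rand(5, 64)        # signatures to verify

ocsvm = OneClassSVM(kernel="rbf", nu=0.05, gamma="scale")
ocsvm.fit(genuine_features)
verdicts = ocsvm.predict(query_features)      # +1 = genuine, -1 = forgery
```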

Article
DFEN: Dual Feature Enhancement Network for Remote Sensing Image Caption
Electronics 2023, 12(7), 1547; https://doi.org/10.3390/electronics12071547 - 25 Mar 2023
Abstract
Remote sensing image captioning can describe ground objects and the semantic relationships between different ground objects. Existing remote sensing image caption algorithms do not acquire enough ground object information from remote-sensing images, resulting in inaccurate captions. As a result, this paper proposes a codec-based Dual Feature Enhancement Network ("DFEN") to enhance ground object information at both the image and text levels. We build the Image-Enhancement module at the image level using the multiscale characteristics of remote sensing images. Furthermore, more discriminative image context features are obtained through the Image-Enhancement module. The hierarchical attention mechanism aggregates multi-level features and supplements the ground object information ignored due to large-scale differences. At the text level, we use the image's potential visual features to guide the Text-Enhance module, resulting in text guidance features that correctly focus on the information of the ground objects. Experiment results show that the DFEN model can enhance ground object information from images and text. Specifically, the BLEU-1 index increased by 8.6% on UCM-caption, 2.3% on Sydney-caption, and 5.1% on RSICD. The DFEN model has promoted the exploration of advanced semantics of remote sensing images and facilitated the development of remote sensing image captioning.

Article
Enhancing Image Encryption with the Kronecker xor Product, the Hill Cipher, and the Sigmoid Logistic Map
Appl. Sci. 2023, 13(6), 4034; https://doi.org/10.3390/app13064034 - 22 Mar 2023
Abstract
In today’s digital age, it is crucial to secure the flow of information to protect data and information from being hacked during transmission or storage. To address this need, we present a new image encryption technique that combines the Kronecker xor product, the Hill cipher, and the sigmoid logistic map. Our proposed algorithm begins by shifting the values in each row of the state matrix to the left by a predetermined number of positions, then encrypting the resulting image using the Hill cipher. The top value of each odd or even column is used to perform an xor operation with all values in the corresponding even or odd column, excluding the top value. The resulting image is then diffused using a sigmoid logistic map and subjected to the Kronecker xor product operation among the pixels to create a secure image. The image is then diffused again with other keys from the sigmoid logistic map for the final product. We compared our proposed method to recent work and found it to be safe and efficient in terms of performance after conducting statistical analysis, differential attack analysis, brute force attack analysis, and information entropy analysis. The results demonstrate that our proposed method is robust, lightweight, and fast in performance, meets the requirements for encryption and decryption, and is resistant to various attacks.
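
Chaotic diffusion of this kind is easy to sketch: a logistic-map keystream is xor-ed with the pixels, and applying the same xor again inverts it. The plain logistic map below is a stand-in for the paper's sigmoid variant, and the seed and parameter values are arbitrary:

```python
import numpy as np

def logistic_keystream(x0, r, n):
    """Keystream from the logistic map x -> r*x*(1-x), quantized to bytes."""
    xs = np.empty(n)
    x = x0
    for i in range(n):
        x = r * x * (1.0 - x)
        xs[i] = x
    return (xs * 255).astype(np.uint8)

def diffuse(img, x0=0.3141, r=3.99):
    """Xor diffusion of a uint8 image; calling diffuse twice with the
    same keys recovers the original image."""
    key = logistic_keystream(x0, r, img.size).reshape(img.shape)
    return img ^ key
```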

Article
Fish Detection and Classification for Automatic Sorting System with an Optimized YOLO Algorithm
Appl. Sci. 2023, 13(6), 3812; https://doi.org/10.3390/app13063812 - 16 Mar 2023
Abstract
Automatic fish recognition using deep learning and computer or machine vision is a key part of making the fish industry more productive through automation. An automatic sorting system will help to tackle the challenges of increasing food demand and the threat of food scarcity in the future due to the continuing growth of the world population and the impact of global warming and climate change. As far as the authors know, there has been no published work so far to detect and classify moving fish for the fish culture industry, especially for automatic sorting purposes based on the fish species using deep learning and machine vision. This paper proposes an approach based on the recognition algorithm YOLOv4, optimized with a unique labeling technique. The proposed method was tested with videos of real fish on a conveyor running at 505.08 m/h, with the fish placed in random positions and order, and obtained an accuracy of 98.15%. This study, with a simple but effective method, is expected to be a guide for automatically detecting, classifying, and sorting fish.

Article
Color Constancy Based on Local Reflectance Differences
Electronics 2023, 12(6), 1396; https://doi.org/10.3390/electronics12061396 - 15 Mar 2023
Abstract
Color constancy is used to determine the actual surface color of the scene affected by illumination so that the captured image is more in line with the characteristics of human perception. The well-known Gray-Edge hypothesis states that the average edge difference in a scene is achromatic. Inspired by the Gray-Edge hypothesis, we propose a new illumination estimation method. Specifically, after analyzing three public datasets containing rich illumination conditions and scenes, we found that the ratio of the global sum of reflectance differences to the global sum of locally normalized reflectance differences is achromatic. Based on this hypothesis, we also propose an accurate color constancy method. The method was tested on four test datasets containing various illumination conditions (three datasets in a single-light environment and one dataset in a multi-light environment). The results show that the proposed method outperforms the state-of-the-art color constancy methods. Furthermore, we propose a new framework that can incorporate current mainstream statistics-based color constancy methods (Gray-World, Max-RGB, Gray-Edge, etc.) into the proposed framework.
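
The statistics-based estimators that the proposed framework generalizes are compact enough to sketch; Gray-World averages pixels, while Gray-Edge averages gradient magnitudes (a minimal NumPy version, assuming a float RGB image in [0, 1]):

```python
import numpy as np

def gray_world(img):
    """Gray-World: the average reflectance is assumed achromatic, so the
    per-channel means estimate the illuminant (img is HxWx3 float RGB)."""
    e = img.mean(axis=(0, 1))
    return e / np.linalg.norm(e)

def gray_edge(img):
    """Gray-Edge: the average edge difference is assumed achromatic, so
    channel-wise gradient magnitudes estimate the illuminant."""
    gy, gx = np.gradient(img, axis=(0, 1))
    e = np.sqrt(gx**2 + gy**2).mean(axis=(0, 1))
    return e / np.linalg.norm(e)

def correct(img, e):
    """Von Kries correction: divide each channel by its illuminant term,
    scaled so a neutral illuminant leaves the image unchanged."""
    return np.clip(img / (e * np.sqrt(3)), 0, 1)
```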

Article
Visual Attention Adversarial Networks for Chinese Font Translation
Electronics 2023, 12(6), 1388; https://doi.org/10.3390/electronics12061388 - 14 Mar 2023
Abstract
Currently, many Chinese font translation models adopt the method of dividing character components to improve the quality of generated font images. However, character components require a large amount of manual annotation to decompose characters and determine the composition of each character as input for training. In this paper, we establish a Chinese font translation model based on a generative adversarial network without decomposition. First, we improve the method of image enhancement for Chinese character images. It helps the model learn the structural information of Chinese character strokes to generate font images with complete and accurate strokes. Second, we propose a visual attention adversarial network. By using a visual attention block, the network catches global and local features for constructing details of characters. Experiments demonstrate that our method generates high-quality Chinese character images with great style diversity, including calligraphy characters.

Article
A Real-Time Registration Algorithm of UAV Aerial Images Based on Feature Matching
J. Imaging 2023, 9(3), 67; https://doi.org/10.3390/jimaging9030067 - 11 Mar 2023
Abstract
This study aimed to achieve the accurate and real-time geographic positioning of UAV aerial image targets. We verified a method of registering UAV camera images on a map (with the geographic location) through feature matching. The UAV is usually in rapid motion, with changes in the camera head, and the map is high-resolution and has sparse features. These conditions make it difficult for current feature-matching algorithms to accurately register the two (camera image and map) in real time, producing a large number of mismatches. To solve this problem, we used the SuperGlue algorithm, which has better performance, to match the features. The layer and block strategy, combined with the prior data of the UAV, was introduced to improve the accuracy and speed of feature matching, and the matching information obtained between frames was introduced to solve the problem of uneven registration. Here, we propose the concept of updating map features with UAV image features to enhance the robustness and applicability of UAV aerial image and map registration. Numerous experiments proved that the proposed method is feasible and can adapt to changes in the camera head, environment, etc. The UAV aerial image is stably and accurately registered on the map, and the frame rate reaches 12 frames per second, which provides a basis for the geo-positioning of UAV aerial image targets.

Article
Night Vision Anti-Halation Algorithm Based on Different-Source Image Fusion Combining Visual Saliency with YUV-FNSCT
Electronics 2023, 12(6), 1303; https://doi.org/10.3390/electronics12061303 - 09 Mar 2023
Abstract
In order to address driver dazzle caused by the abuse of high beams when vehicles meet at night, a night vision anti-halation algorithm based on image fusion combining visual saliency with YUV-FNSCT is proposed. An improved frequency-tuned (FT) visual saliency detection method is proposed to quickly lock onto objects of interest, such as vehicles and pedestrians, so as to improve the salient features of fusion images. The high- and low-frequency sub-bands of infrared saliency images and visible luminance components can quickly be obtained using the fast non-subsampled contourlet transform (FNSCT), which has the characteristics of multi-direction, multi-scale, and shift-invariance. According to the halation degree in the visible image, the nonlinear adaptive fusion strategy of low-frequency weight reasonably eliminates halation while retaining useful information from the original image to the maximum extent. The statistical matching feature fusion strategy distinguishes the common and unique edge information from the high-frequency sub-bands by mutual matching so as to obtain more effective details of the original images, such as the edges and contours. Only the luminance Y decomposed by the YUV transform is involved in image fusion, which not only avoids color shift of the fusion image but also reduces the amount of computation. Considering the night driving environment and the degree of halation, visible and infrared images were collected for anti-halation fusion in six typical halation scenes on three types of roads covering most night driving conditions. The fused images obtained by the proposed algorithm demonstrate complete halation elimination, rich color details, and obvious salient features, and have the best comprehensive index in each halation scene. The experimental results and analysis show that the proposed algorithm has advantages in halation elimination and visual saliency, has good universality for different night vision halation scenes, and helps drivers to observe the road ahead, improving the safety of night driving. It also has certain applicability to rainy, foggy, smoggy, and other complex weather.

Article
DCTable: A Dilated CNN with Optimizing Anchors for Accurate Table Detection
J. Imaging 2023, 9(3), 62; https://doi.org/10.3390/jimaging9030062 - 07 Mar 2023
Abstract
With the widespread use of deep learning in leading systems, it has become the mainstream in the table detection field. Some tables are difficult to detect because of the likely figure layout or the small size. As a solution to this problem, we propose a novel method, called DCTable, to improve Faster R-CNN for table detection. DCTable extracts more discriminative features using a backbone with dilated convolutions in order to improve the quality of region proposals. Another main contribution of this paper is the anchor optimization using the Intersection over Union (IoU)-balanced loss to train the RPN and reduce the false positive rate. This is followed by an RoI Align layer, instead of RoI pooling, which eliminates coarse misalignment and introduces bilinear interpolation when mapping table proposal candidates, improving accuracy. Training and testing on public datasets showed the effectiveness of the algorithm and a considerable improvement of the F1-score on the ICDAR 2017-POD, ICDAR 2019, Marmot and RVL-CDIP datasets.

Article
Hybrid Classifiers for Spatio-Temporal Abnormal Behavior Detection, Tracking, and Recognition in Massive Hajj Crowds
Electronics 2023, 12(5), 1165; https://doi.org/10.3390/electronics12051165 - 28 Feb 2023
Abstract
Individual abnormal behaviors vary depending on crowd sizes, contexts, and scenes. Challenges such as partial occlusions, blurring, a large number of abnormal behaviors, and camera viewing occur in large-scale crowds when detecting, tracking, and recognizing individuals with abnormalities. In this paper, our contribution is two-fold. First, we introduce an annotated and labeled large-scale crowd abnormal behavior Hajj dataset, HAJJv2. Second, we propose two methods of hybrid convolutional neural networks (CNNs) and random forests (RFs) to detect and recognize spatio-temporal abnormal behaviors in small and large-scale crowd videos. In small-scale crowd videos, a ResNet-50 pre-trained CNN model is fine-tuned to verify whether every frame is normal or abnormal in the spatial domain. If anomalous behaviors are observed, a motion-based individual detection method based on the magnitudes and orientations of Horn–Schunck optical flow is proposed to locate and track individuals with abnormal behaviors. A Kalman filter is employed in large-scale crowd videos to predict and track the detected individuals in the subsequent frames. Then, means and variances as statistical features are computed and fed to the RF classifier to classify individuals with abnormal behaviors in the temporal domain. In large-scale crowds, we fine-tune the ResNet-50 model using a YOLOv2 object detection technique to detect individuals with abnormal behaviors in the spatial domain. The proposed method achieves average areas under the curve (AUCs) of 99.76% and 93.71% on two public benchmark small-scale crowd datasets, UMN and UCSD, respectively, while the large-scale crowd method achieves a 76.08% average AUC using the HAJJv2 dataset. Our method outperforms state-of-the-art methods on the small-scale crowd datasets with margins of 1.66%, 6.06%, and 2.85% on UMN, UCSD Ped1, and UCSD Ped2, respectively. It also produces an acceptable result in large-scale crowds.

Article
KD-PatchMatch: A Self-Supervised Training Learning-Based PatchMatch
Appl. Sci. 2023, 13(4), 2224; https://doi.org/10.3390/app13042224 - 09 Feb 2023
Abstract
Traditional learning-based multi-view stereo (MVS) methods usually need to find the correct depth value from a large number of depth candidates, which leads to huge memory consumption and slow inference. To address these problems, we propose a probabilistic depth sampling in the learning-based PatchMatch framework, i.e., sampling a small number of depth candidates from a single-view probability distribution, which achieves the purpose of saving computational resources. Furthermore, to overcome the difficulty of obtaining ground-truth depth for outdoor large-scale scenes, we also propose a self-supervised training pipeline based on knowledge distillation, which involves self-supervised teacher training and student training based on knowledge distillation. Extensive experiments show that our approach outperforms other recent learning-based MVS methods on the DTU, Tanks and Temples, and ETH3D datasets.

Article
Human Pose Estimation via Dynamic Information Transfer
Electronics 2023, 12(3), 695; https://doi.org/10.3390/electronics12030695 - 30 Jan 2023
Abstract
This paper presents a multi-task learning framework, called the dynamic information transfer network (DITN). We mainly focused on improving the pose estimation with the spatial relationship of the adjacent joints. To benefit from the explicit structural knowledge, we constructed two branches with a shared backbone to localize the human joints and bones, respectively. Since related tasks share a high-level representation, we leveraged the bone information to refine the joint localization via dynamic information transfer. In detail, we extracted the dynamic parameters from the bone branch and used them to make the network learn constraint relationships via dynamic convolution. Moreover, attention blocks were added after the information transfer to balance the information across different granularity levels and induce the network to focus on the informative regions. The experimental results demonstrated the effectiveness of the DITN, which achieved 90.8% PCKh@0.5 on MPII and 75.0% AP on COCO. The qualitative results on the MPII and COCO datasets showed that the DITN achieved better performance, especially on heavily occluded or easily confusable joint localization.

Article
Dynamic Multi-Attention Dehazing Network with Adaptive Feature Fusion
Electronics 2023, 12(3), 529; https://doi.org/10.3390/electronics12030529 - 19 Jan 2023
Abstract
This paper proposes a Dynamic Multi-Attention Dehazing Network (DMADN) for single image dehazing. The proposed network consists of two key components, the Dynamic Feature Attention (DFA) module, and the Adaptive Feature Fusion (AFF) module. The DFA module provides pixel-wise weights and channel-wise weights for input features, considering that the haze distribution is always uneven in a degenerated image and the value in each channel is different. We propose an AFF module based on the adaptive mixup operation to restore the missing spatial information from high-resolution layers. Most previous works have concentrated on increasing the scale of the model to improve dehazing performance, which makes it difficult to apply in edge devices. We introduce contrastive learning in our training processing, which leverages both positive and negative samples to optimize our network. The contrastive learning strategy could effectively improve the quality of output while not increasing the model’s complexity and inference time in the testing phase. Extensive experimental results on the synthetic and real-world hazy images demonstrate that DMADN achieves state-of-the-art dehazing performance with a competitive number of parameters. Full article
(This article belongs to the Topic Computer Vision and Image Processing)
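A minimal sketch of the combined channel-wise plus pixel-wise weighting idea behind the DFA module might look as follows in PyTorch; the layer sizes and layout here are assumptions, not the paper's exact design.

```python
import torch
import torch.nn as nn

class DualAttention(nn.Module):
    """Minimal sketch of combined channel-wise and pixel-wise attention,
    in the spirit of the DFA module; the exact layout is an assumption."""
    def __init__(self, channels: int):
        super().__init__()
        self.channel_att = nn.Sequential(      # one weight per channel
            nn.AdaptiveAvgPool2d(1),
            nn.Conv2d(channels, channels // 4, 1), nn.ReLU(inplace=True),
            nn.Conv2d(channels // 4, channels, 1), nn.Sigmoid(),
        )
        self.pixel_att = nn.Sequential(        # one weight per pixel
            nn.Conv2d(channels, channels // 4, 1), nn.ReLU(inplace=True),
            nn.Conv2d(channels // 4, 1, 1), nn.Sigmoid(),
        )

    def forward(self, x):
        x = x * self.channel_att(x)   # re-weight channels (values differ per channel)
        return x * self.pixel_att(x)  # re-weight pixels (haze is spatially uneven)

print(DualAttention(32)(torch.randn(1, 32, 128, 128)).shape)
```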

Article
Wildlife Object Detection Method Applying Segmentation Gradient Flow and Feature Dimensionality Reduction
Electronics 2023, 12(2), 377; https://doi.org/10.3390/electronics12020377 - 11 Jan 2023
Cited by 3 | Viewed by 1140 | Correction
Abstract
This work proposes an enhanced animal detection algorithm for natural environments based on YOLOv5s, addressing the low detection accuracy and slow detection speed encountered when automatically detecting and classifying large animals in the wild. To increase detection speed, the algorithm first enhances the SPP module by replacing the parallel connection of the original maximum pooling layers with a serial connection, and it expands the model's receptive field on the dataset used in this paper by stacking the feature pyramid network structure to enhance feature fusion. Secondly, it introduces the GSConv module, which combines standard convolution, depthwise separable convolution, and channel mixing to reduce network parameters and computation, making the model lightweight and easier to deploy on endpoint devices. At the same time, the GS bottleneck replaces the Bottleneck module in C3: the input feature map is split into two channel groups that are assigned different weights and then concatenated along the channel dimension, which enhances the model's ability to express non-linear functions and mitigates the vanishing-gradient problem. Wildlife images were obtained from the OpenImages public dataset and real-life shots. The experimental results show that the improved YOLOv5s algorithm reduces the computational effort of the model compared to the original algorithm while improving both detection accuracy and speed, and it can be readily applied to the real-time detection of animals in natural environments. Full article
(This article belongs to the Topic Computer Vision and Image Processing)
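The parallel-to-serial SPP rewrite is easy to verify in isolation: three chained 5×5 max pools reproduce the 5/9/13 receptive fields of the parallel version exactly while reusing intermediate results. A small PyTorch check, independent of the paper's code:

```python
import torch
import torch.nn as nn

def spp_parallel(x):
    """Original SPP: independent 5x5, 9x9, 13x13 max pools, then concat."""
    pools = [nn.functional.max_pool2d(x, k, stride=1, padding=k // 2)
             for k in (5, 9, 13)]
    return torch.cat([x] + pools, dim=1)

def spp_serial(x):
    """SPPF-style serial variant: three chained 5x5 pools give the same
    receptive fields (5, 9, 13) while re-using intermediate results."""
    p1 = nn.functional.max_pool2d(x, 5, stride=1, padding=2)
    p2 = nn.functional.max_pool2d(p1, 5, stride=1, padding=2)
    p3 = nn.functional.max_pool2d(p2, 5, stride=1, padding=2)
    return torch.cat([x, p1, p2, p3], dim=1)

x = torch.randn(1, 8, 32, 32)
# The two variants are numerically identical; the serial one is faster.
print(torch.allclose(spp_parallel(x), spp_serial(x)))  # True
```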

Article
Automatic Method for Vickers Hardness Estimation by Image Processing
J. Imaging 2023, 9(1), 8; https://doi.org/10.3390/jimaging9010008 - 30 Dec 2022
Viewed by 1033
Abstract
Hardness is one of the most important mechanical properties of materials, since it is used to estimate their quality and determine their suitability for a particular application. One way of determining quality is the Vickers hardness test, in which the resistance to plastic deformation at the surface of the material is measured after applying force with an indenter. The hardness is measured from an image of the sample, a tedious and time-consuming procedure that is prone to human error. Therefore, in this work, a new automatic method based on image processing techniques is proposed, which obtains results quickly and more accurately, even with highly irregular indentation marks. For the development and validation of the method, a set of microscopy images was used, comprising samples indented with applied forces of 5 N and 10 N on AISI D2 steel with and without quenching and tempering heat treatment, as well as samples coated with titanium niobium nitride (TiNbN). The proposed method was implemented as a plugin for the ImageJ program and yields reproducible Vickers hardness results in an average time of 2.05 s, with an accuracy of 98.3% and a maximum error of 4.5% with respect to the manually obtained values used as the gold standard. Full article
(This article belongs to the Topic Computer Vision and Image Processing)
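The underlying measurement reduces to the standard Vickers relation HV = 1.8544·F/d², with the load F in kgf and the mean indentation diagonal d in mm; the image-processing step essentially supplies the diagonals. A minimal sketch in Python (rather than the paper's ImageJ/Java plugin; the diagonal values are hypothetical):

```python
def vickers_hardness(force_newton: float, d1_um: float, d2_um: float) -> float:
    """Standard Vickers formula: HV = 1.8544 * F / d^2, with the load F in
    kgf and the mean indentation diagonal d in mm. The image-processing
    stage of such a method reduces to measuring d1 and d2 on the image."""
    f_kgf = force_newton / 9.80665          # N -> kgf
    d_mm = (d1_um + d2_um) / 2.0 / 1000.0   # mean diagonal, um -> mm
    return 1.8544 * f_kgf / d_mm ** 2

# Hypothetical diagonals (in micrometers) measured from an indentation image:
print(round(vickers_hardness(10.0, 96.0, 98.0), 1))  # ~201
```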

Article
Prototype-Based Self-Adaptive Distribution Calibration for Few-Shot Image Classification
Electronics 2023, 12(1), 134; https://doi.org/10.3390/electronics12010134 - 28 Dec 2022
Viewed by 1024
Abstract
Deep learning has flourished in large-scale supervised tasks. However, in many practical settings, abundant labeled data are a luxury. Thus, few-shot learning (FSL), which learns new classes from only a few labeled samples, has recently received growing interest and achieved significant progress. The advanced distribution calibration approach estimates the ground-truth distribution of few-shot classes by reusing the statistics of auxiliary data. However, there is still a significant discrepancy between the estimated and ground-truth distributions, and manually set hyperparameters cannot adapt to different application scenarios (i.e., datasets). This paper proposes a prototype-based self-adaptive distribution calibration framework for accurately estimating the ground-truth distribution, with self-adaptive hyperparameter optimization for different application scenarios. The proposed method consists of two components. The prototype-based representative mechanism obtains and exploits more global information about few-shot classes to improve classification performance. The self-adaptive hyperparameter optimization algorithm searches for robust hyperparameters for distribution calibration in different application scenarios. Ablation studies verify the effectiveness of the various components of the proposed framework. Extensive experiments are conducted on three standard benchmarks: miniImageNet, CUB-200-2011, and CIFAR-FS. The competitive results and compelling visualizations indicate that the proposed framework achieves state-of-the-art performance. Full article
(This article belongs to the Topic Computer Vision and Image Processing)
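As a rough illustration of statistics-transfer distribution calibration (the line of work this paper builds on), the sketch below blends the statistics of the nearest base classes with a novel-class prototype; the fixed k and alpha here stand in for the hyperparameters the paper tunes self-adaptively.

```python
import numpy as np

def calibrate_distribution(support, base_means, base_covs, k=2, alpha=0.2):
    """Sketch of statistics-transfer distribution calibration; the paper
    replaces the fixed k and alpha with a prototype-based, self-adaptive
    choice. support: (n_shot, d) features of a novel class."""
    prototype = support.mean(axis=0)
    # Pick the k base classes whose means are closest to the prototype.
    dists = np.linalg.norm(base_means - prototype, axis=1)
    nearest = np.argsort(dists)[:k]
    # Calibrated statistics: blend base statistics with the support set.
    mean = (base_means[nearest].sum(axis=0) + prototype) / (k + 1)
    cov = base_covs[nearest].mean(axis=0) + alpha * np.eye(support.shape[1])
    return mean, cov  # then sample virtual features ~ N(mean, cov)

d, n_base = 16, 10
rng = np.random.default_rng(0)
mean, cov = calibrate_distribution(rng.normal(size=(5, d)),
                                   rng.normal(size=(n_base, d)),
                                   np.stack([np.eye(d)] * n_base))
print(mean.shape, cov.shape)  # (16,) (16, 16)
```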

Article
A Nonlinear Diffusion Model with Smoothed Background Estimation to Enhance Degraded Images for Defect Detection
Appl. Sci. 2023, 13(1), 211; https://doi.org/10.3390/app13010211 - 24 Dec 2022
Viewed by 655
Abstract
It is important to detect product defects efficiently in modern industrial manufacturing, and image processing is a common route to successful defect detection. To process images degraded by noise and low contrast in some scenes, this paper presents a new energy functional with background fitting and derives a novel model that estimates the smoothed background and performs nonlinear diffusion on the residual image. Noise removal and background correction are both achieved while the defect features are preserved. Finally, the proposed method and several comparative methods are evaluated in experiments on classical degraded images. The numerical results and quantitative evaluation show the efficiency and advantages of the proposed method. Full article
(This article belongs to the Topic Computer Vision and Image Processing)
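For orientation, classical nonlinear (Perona–Malik) diffusion, the building block that the proposed model extends with background estimation, can be sketched as follows; the edge-stopping function and parameter values are illustrative, and the paper's model additionally diffuses the residual after background fitting.

```python
import numpy as np

def perona_malik(img, n_iter=20, kappa=0.1, dt=0.2):
    """Classical Perona-Malik diffusion: smooths flat regions while
    preserving strong edges (here with periodic boundaries via np.roll)."""
    u = img.astype(float).copy()
    for _ in range(n_iter):
        # Forward differences toward the four neighbors.
        dn = np.roll(u, -1, axis=0) - u
        ds = np.roll(u, 1, axis=0) - u
        de = np.roll(u, -1, axis=1) - u
        dw = np.roll(u, 1, axis=1) - u
        # Edge-stopping function g(|du|) = exp(-(|du|/kappa)^2).
        g = lambda d: np.exp(-((d / kappa) ** 2))
        u = u + dt * (g(dn) * dn + g(ds) * ds + g(de) * de + g(dw) * dw)
    return u

rng = np.random.default_rng(0)
noisy = rng.normal(0.0, 0.05, (64, 64))
noisy[16:48, 16:48] += 0.8  # a bright "defect" region on a noisy background
print(perona_malik(noisy).std() < noisy.std())  # noise smoothed: True
```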

Article
Driver Emotion and Fatigue State Detection Based on Time Series Fusion
Electronics 2023, 12(1), 26; https://doi.org/10.3390/electronics12010026 - 21 Dec 2022
Cited by 1 | Viewed by 1362
Abstract
Studies have shown that driver fatigue and unpleasant emotions significantly increase driving risk. Detecting driver emotions and fatigue states and providing timely warnings can effectively reduce the incidence of traffic accidents. However, existing models rarely combine driver emotion and fatigue detection, and there is room to improve recognition accuracy. In this paper, we propose a non-invasive and efficient detection method for driver fatigue and emotional state which, to our knowledge, is the first to combine the two in driver state detection. Firstly, the captured video image sequences are preprocessed, and Dlib (an open-source image processing library) is used to locate face regions and mark key points. Secondly, facial features are extracted, and fatigue indicators such as the percentage of eyelid closure (PERCLOS) and the yawn frequency are calculated using a dual-threshold method and fused mathematically. Thirdly, an improved lightweight RM-Xception convolutional neural network is introduced to identify the driver's emotional state. Finally, the two indicators are fused on a time-series basis to obtain a comprehensive score for evaluating the driver's state. The results show that the proposed fatigue detection algorithm has high accuracy, and the emotion recognition network reaches an accuracy of 73.32% on the Fer2013 dataset. The composite score calculated from the time-series fusion can comprehensively and accurately reflect the driver's state in different environments and contributes to future research in the field of assisted safe driving. Full article
(This article belongs to the Topic Computer Vision and Image Processing)
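The fatigue indicators can be made concrete: the eye aspect ratio (EAR) computed from the six Dlib eye landmarks falls toward zero when the eye closes, and PERCLOS is the fraction of frames below a closed-eye threshold. A minimal sketch, where the 0.2 threshold is a common heuristic rather than necessarily the paper's value:

```python
import numpy as np

def eye_aspect_ratio(eye):
    """EAR from the six Dlib eye landmarks p1..p6:
    EAR = (|p2-p6| + |p3-p5|) / (2 * |p1-p4|); it falls toward 0 when closed."""
    v1 = np.linalg.norm(eye[1] - eye[5])
    v2 = np.linalg.norm(eye[2] - eye[4])
    h = np.linalg.norm(eye[0] - eye[3])
    return (v1 + v2) / (2.0 * h)

def perclos(ear_series, closed_thresh=0.2):
    """PERCLOS over a window: the fraction of frames whose EAR is below the
    closed-eye threshold (0.2 is a common heuristic, an assumption here)."""
    ears = np.asarray(ear_series)
    return float((ears < closed_thresh).mean())

# Hypothetical open-eye landmarks and an EAR trace containing a long blink:
eye = np.array([[0, 0], [2, 1], [4, 1], [6, 0], [4, -1], [2, -1]], dtype=float)
trace = [0.31, 0.30, 0.12, 0.08, 0.09, 0.15, 0.29, 0.32]
print(round(eye_aspect_ratio(eye), 2), perclos(trace))  # 0.33 0.5
```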

Article
Construction of a Character Dataset for Historical Uchen Tibetan Documents under Low-Resource Conditions
Electronics 2022, 11(23), 3919; https://doi.org/10.3390/electronics11233919 - 27 Nov 2022
Viewed by 617
Abstract
The construction of a character dataset is an important part of research on document analysis and recognition of historical Tibetan documents. The character segmentation results from the previous stage of our research are presented by coloring the characters with different color values. On this basis, the characters are annotated, and the character images corresponding to the annotations are extracted to construct a character dataset. The construction proceeds as follows: (1) text annotation of the segmented characters is performed; (2) each character image is extracted from its character block based on the true position information; (3) according to the class of the annotated text, the extracted character images are classified to construct a preliminary character dataset; (4) data augmentation is used to address the class and sample imbalance in the preliminary dataset; (5) character recognition research based on the constructed dataset is performed. The experimental results show that, under low-resource conditions, this paper resolves the challenges in the construction of a historical Uchen Tibetan document character dataset and constructs a 610-class character dataset. This dataset lays the foundation for character recognition of historical Tibetan documents and provides a reference for the construction of related document datasets. Full article
(This article belongs to the Topic Computer Vision and Image Processing)

Article
Real-Time Detection of Mango Based on Improved YOLOv4
Electronics 2022, 11(23), 3853; https://doi.org/10.3390/electronics11233853 - 23 Nov 2022
Cited by 3 | Viewed by 931
Abstract
Agricultural mechanization occupies a key position in modern agriculture. Aiming at the fruit recognition and target detection component of a picking robot, a mango recognition method based on an improved YOLOv4 network structure is proposed that can quickly and accurately identify and locate mangoes. The method first adjusts the network width to improve recognition accuracy, then reduces the ResNet (Residual Network) modules in the neck network to improve prediction speed, and finally adds a CBAM (Convolutional Block Attention Module) to improve prediction accuracy. The newly improved network model is named YOLOv4-LightC-CBAM. The training results show that the mAP (mean Average Precision) obtained by YOLOv4-LightC-CBAM is 95.12%, which is 3.93% higher than that of YOLOv4. Regarding detection speed, YOLOv4-LightC-CBAM reaches up to 45.4 frames per second, 85.3% faster than YOLOv4. The results show that the modified network recognizes mangoes better, faster, and more accurately. Full article
(This article belongs to the Topic Computer Vision and Image Processing)
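CBAM itself is a published, well-defined module (Woo et al., 2018), so the attention step added to the network can be sketched directly; the reduction ratio below is an assumption.

```python
import torch
import torch.nn as nn

class CBAM(nn.Module):
    """Compact CBAM: channel attention (avg+max pooled shared MLP) followed
    by spatial attention (7x7 conv over pooled maps), after Woo et al. 2018."""
    def __init__(self, channels: int, reduction: int = 8):
        super().__init__()
        self.mlp = nn.Sequential(
            nn.Conv2d(channels, channels // reduction, 1), nn.ReLU(inplace=True),
            nn.Conv2d(channels // reduction, channels, 1),
        )
        self.spatial = nn.Conv2d(2, 1, kernel_size=7, padding=3)

    def forward(self, x):
        # Channel attention: shared MLP on average- and max-pooled features.
        avg = self.mlp(x.mean(dim=(2, 3), keepdim=True))
        mx = self.mlp(x.amax(dim=(2, 3), keepdim=True))
        x = x * torch.sigmoid(avg + mx)
        # Spatial attention: 7x7 conv over channel-pooled descriptors.
        desc = torch.cat([x.mean(1, keepdim=True), x.amax(1, keepdim=True)], 1)
        return x * torch.sigmoid(self.spatial(desc))

print(CBAM(64)(torch.randn(1, 64, 40, 40)).shape)  # torch.Size([1, 64, 40, 40])
```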

Article
Separate Syntax and Semantics: Part-of-Speech-Guided Transformer for Image Captioning
Appl. Sci. 2022, 12(23), 11875; https://doi.org/10.3390/app122311875 - 22 Nov 2022
Viewed by 630
Abstract
Transformer-based image captioning models have recently achieved remarkable performance by using new fully attentive paradigms. However, existing models generally follow the conventional language-modeling scheme of predicting the next word conditioned on the visual features and partially generated words. They treat the predictions of visual and nonvisual words equally and usually tend to produce generic captions. To address these issues, we propose a novel part-of-speech-guided transformer (PoS-Transformer) framework for image captioning. Specifically, a self-attention part-of-speech prediction network is first presented to model the part-of-speech tag sequences for the corresponding image captions. Then, different attention mechanisms are constructed for the decoder to guide caption generation using the part-of-speech information. Benefiting from the part-of-speech guiding mechanisms, the proposed framework not only adaptively adjusts the weights between visual features and language signals for word prediction, but also facilitates the generation of more fine-grained and grounded captions. Finally, multitask learning is introduced to train the whole PoS-Transformer network in an end-to-end manner. Our model was trained and tested on the MSCOCO and Flickr30k datasets, achieving CIDEr scores of 1.299 and 0.612, respectively. The qualitative experimental results indicated that the captions generated by our method conformed better to grammatical rules. Full article
(This article belongs to the Topic Computer Vision and Image Processing)

Review
A Review of Synthetic Image Data and Its Use in Computer Vision
J. Imaging 2022, 8(11), 310; https://doi.org/10.3390/jimaging8110310 - 21 Nov 2022
Cited by 1 | Viewed by 2226
Abstract
The development of computer vision algorithms using convolutional neural networks and deep learning has necessitated ever greater amounts of annotated and labelled data to produce high-performance models. Large, public datasets have been instrumental in pushing forward computer vision by providing the data necessary for training. However, many computer vision applications cannot rely on the general image data provided in available public datasets to train models, and instead require labelled image data that are not readily available in the public domain on a large scale. At the same time, acquiring such data from the real world can be difficult, costly, and labour-intensive to label in large quantities. Because of this, synthetic image data have been pushed to the forefront as a potentially faster and cheaper alternative to collecting and annotating real data. This review provides a general overview of the types of synthetic image data, as categorised by synthesised output; common methods of synthesising different types of image data; existing applications and logical extensions; the performance of synthetic image data in different applications and the associated difficulties in assessing data performance; and areas for further research. Full article
(This article belongs to the Topic Computer Vision and Image Processing)

Article
Computer Vision-Based Approach for Automatic Detection of Dairy Cow Breed
Electronics 2022, 11(22), 3791; https://doi.org/10.3390/electronics11223791 - 18 Nov 2022
Cited by 1 | Viewed by 1213
Abstract
Purpose: The identification of individual cow breeds may offer various farming opportunities for disease detection, disease prevention and treatment, fertility and feeding, and welfare monitoring. However, due to the large population of cows, the hundreds of breeds, and their almost identical visible appearance, exact identification and detection is a tedious task. Therefore, the automatic detection of cow breeds would benefit the dairy industry. This study presents a computer-vision-based approach for identifying the breed of individual cattle. Methods: In this study, eight breeds of cows are considered to verify the classification process: Afrikaner, Brown Swiss, Gyr, Holstein Friesian, Limousin, Marchigiana, White Park, and Simmental cattle. A custom dataset was developed using web-mining techniques, comprising 1835 images grouped into 238, 223, 220, 212, 253, 185, 257, and 247 images for the individual breeds. YOLOv4, a deep learning approach, was employed for breed classification and localization, and its performance was evaluated by training the model on different sets of training parameters. Results: A comprehensive analysis of the experimental results reveals that the proposed approach achieves an accuracy of 81.07%, with a maximum kappa of 0.78, obtained at an image size of 608 × 608 and an intersection over union (IoU) threshold of 0.75 on the test dataset. Conclusions: The YOLOv4-based model outperformed the other compared models, placing the proposed model among the top-ranked cow breed detection models. For future work, it would be beneficial to incorporate simple tracking techniques between video frames to check the efficiency of this approach. Full article
(This article belongs to the Topic Computer Vision and Image Processing)

Article
Research on the Correlation Filter Tracking Model Based on the Deep-Pruned Feature Network
Appl. Sci. 2022, 12(22), 11490; https://doi.org/10.3390/app122211490 - 12 Nov 2022
Viewed by 664
Abstract
Visual tracking is one of the key research fields in computer vision. Built on the combination of the correlation filter tracking (CFT) model and deep convolutional neural networks (DCNNs), deep correlation filter tracking (DCFT) has recently become a critical topic in visual tracking because of the speed of CFT and the richer feature representations of DCNNs. However, DCNNs are often structurally complex, which typically creates a conflict between the speed and accuracy of DCFT. To reduce this conflict, this paper proposes a model comprising the following: (1) based on a pre-pruned network obtained from feature channel importance, an optimal global tracking pruning rate (GTPR) is determined in terms of the contribution of filter channels to the tracking response; (2) based on the GTPR, an alternative convolutional kernel is defined to replace the kernels of non-important channels, which leads to further pruning of the feature network; (3) an online updating scheme for the pruned feature network, based on a structural similarity index, is employed to adapt the model to tracking scene changes. (4) The proposed model was evaluated on OTB2013; the experimental results demonstrate that the model increases speed by 45% while maintaining tracking accuracy, and improves tracking accuracy by 4% when tracking scene changes occur. Full article
(This article belongs to the Topic Computer Vision and Image Processing)
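The mechanics of filter pruning can be sketched independently of the paper's GTPR criterion: score each output channel, keep the top fraction, and copy the surviving kernels into a smaller layer. The L1-norm importance below is a common stand-in for the tracking-response contribution the paper actually uses.

```python
import torch
import torch.nn as nn

def prune_conv_channels(conv: nn.Conv2d, keep_ratio: float) -> nn.Conv2d:
    """Sketch of importance-based filter pruning: rank output channels by the
    L1 norm of their kernels and keep the top fraction. The paper derives its
    pruning rate from each channel's contribution to the tracking response;
    the mechanics of dropping filters are the same."""
    importance = conv.weight.detach().abs().sum(dim=(1, 2, 3))  # score per filter
    n_keep = max(1, int(conv.out_channels * keep_ratio))
    keep = torch.argsort(importance, descending=True)[:n_keep]
    pruned = nn.Conv2d(conv.in_channels, n_keep, conv.kernel_size,
                       conv.stride, conv.padding, bias=conv.bias is not None)
    with torch.no_grad():
        pruned.weight.copy_(conv.weight[keep])
        if conv.bias is not None:
            pruned.bias.copy_(conv.bias[keep])
    return pruned

conv = nn.Conv2d(3, 64, 3, padding=1)
print(prune_conv_channels(conv, 0.5).out_channels)  # 32
```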

Article
A Novel Separable Scheme for Encryption and Reversible Data Hiding
Electronics 2022, 11(21), 3505; https://doi.org/10.3390/electronics11213505 - 28 Oct 2022
Cited by 1 | Viewed by 696
Abstract
With the increasing emphasis on security and privacy, video in the cloud sometimes needs to be stored and processed in an encrypted format. To facilitate the indexing and tamper detection of encrypted videos, data hiding is performed in the encrypted domain. This paper proposes a novel separable scheme for encryption and reversible data hiding. Regarding the encryption method, the intra-prediction mode and motion vector difference are encrypted by XOR encryption, and the quantized discrete cosine transform block is permuted based on a logistic chaotic map. Regarding the reversible data hiding algorithm, difference expansion is applied in encrypted video for the first time in this paper. The encryption method and the data hiding algorithm are separable, and the embedded information can be accurately extracted from both the encrypted and the decrypted video bitstream. The experimental results show that the proposed encryption method can resist sketch attacks and offers higher security than other schemes while keeping the bit rate unchanged. The embedding algorithm provides higher capacity in videos with a lower quantization parameter and good visual quality of the labeled decrypted video, while maintaining low bit rate variation. The video encryption and the reversible data hiding are separable, and the scheme can be applied in a wide range of scenarios. Full article
(This article belongs to the Topic Computer Vision and Image Processing)
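Difference expansion (Tian's scheme) on a single pixel pair is compact enough to show in full; the overflow/underflow handling that a practical codec needs is omitted here, and applying it inside an encrypted video bitstream is of course the paper's contribution, not this sketch's.

```python
def de_embed(x: int, y: int, bit: int):
    """Tian's difference expansion on one pixel pair: expand the difference
    d to 2d + bit while keeping the average l unchanged."""
    l, d = (x + y) // 2, x - y
    d2 = 2 * d + bit
    return l + (d2 + 1) // 2, l - d2 // 2

def de_extract(x2: int, y2: int):
    """Recover the hidden bit and the original pair from the marked pair."""
    l, d2 = (x2 + y2) // 2, x2 - y2
    bit, d = d2 & 1, d2 >> 1
    return (l + (d + 1) // 2, l - d // 2), bit

pair = (105, 102)
marked = de_embed(*pair, bit=1)
print(marked, de_extract(*marked))  # (107, 100) ((105, 102), 1)
```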

Article
Intracranial Hemorrhages Segmentation and Features Selection Applying Cuckoo Search Algorithm with Gated Recurrent Unit
Appl. Sci. 2022, 12(21), 10851; https://doi.org/10.3390/app122110851 - 26 Oct 2022
Viewed by 820
Abstract
Traumatic and aneurysmal brain injuries generally cause intracranial hemorrhages, a severe condition that can result in death if not diagnosed and treated properly at an early stage. Compared to other imaging techniques, Computed Tomography (CT) images are extensively utilized by clinicians for locating and identifying intracranial hemorrhage regions. However, this is a time-consuming and complex task that depends heavily on professional clinicians. To address this problem, a novel model is developed for the automatic detection of intracranial hemorrhages. After collecting 3D CT scans from the Radiological Society of North America (RSNA) 2019 brain CT hemorrhage database, image segmentation is carried out using the Fuzzy C-Means (FCM) clustering algorithm. Then, hybrid feature extraction is performed on the segmented regions using the Histogram of Oriented Gradients (HOG), Local Ternary Pattern (LTP), and Local Binary Pattern (LBP) to extract discriminative features. Furthermore, the Cuckoo Search Optimization (CSO) algorithm and the Optimized Gated Recurrent Unit (OGRU) classifier are integrated for feature selection and sub-type classification of intracranial hemorrhages. In the experiments, the proposed OGRU-CSO model obtained a classification accuracy of 99.36%, which is higher than that of the other considered classifiers. Full article
(This article belongs to the Topic Computer Vision and Image Processing)

Article
4-Band Multispectral Images Demosaicking Combining LMMSE and Adaptive Kernel Regression Methods
J. Imaging 2022, 8(11), 295; https://doi.org/10.3390/jimaging8110295 - 25 Oct 2022
Viewed by 1000
Abstract
In recent years, multispectral imaging systems have expanded considerably, along with a variety of multispectral demosaicking algorithms. The most crucial task is setting up an optimal multispectral demosaicking algorithm that reconstructs the image with minimal error from the raw image of a single sensor. In this paper, we present a four-band multispectral filter array (MSFA) with a dominant blue band and a multispectral demosaicking algorithm that combines the linear minimum mean square error (LMMSE) and adaptive kernel regression methods. To estimate the missing blue bands, we use the LMMSE algorithm; for the other spectral bands, we use a directional gradient method that relies on the estimated blue bands. Adaptive kernel regression is then applied to each spectral band to update it without persistent artifacts. The experimental results demonstrate that our proposed method outperforms other existing approaches both visually and quantitatively in terms of the peak signal-to-noise ratio (PSNR), structural similarity index (SSIM), and root mean square error (RMSE). Full article
(This article belongs to the Topic Computer Vision and Image Processing)

Article
The Effect of Data Augmentation Methods on Pedestrian Object Detection
Electronics 2022, 11(19), 3185; https://doi.org/10.3390/electronics11193185 - 4 Oct 2022
Cited by 2 | Viewed by 1074
Abstract
Night scenes are a key concern in monitoring and security because the information captured on camera at night is not comprehensive, and data augmentation extracts the most value from such limited datasets. Considering night driving and dangerous events, better detection of people at night is important. This paper studies the impact of different data augmentation methods on target detection. For image data collected at night under limited conditions, three different types of enhancement methods are used to verify whether they can promote pedestrian detection. This paper mainly explores supervised and unsupervised data augmentation methods with certain improvements, including multi-sample augmentation, unsupervised Generative Adversarial Network (GAN) augmentation, and single-sample augmentation. We conclude that a dataset produced by the heterogeneous multi-sample augmentation method can optimize the target detection model, raising the mean average precision (mAP) for night images to 0.76; that the improved Residual Convolutional GAN, an unsupervised training model, can generate new samples in the same style and thereby greatly expand the dataset, raising the mean average precision to 0.854; and that single-sample de-illumination enhancement can greatly improve image clarity, improving the precision value by 0.116. Full article
(This article belongs to the Topic Computer Vision and Image Processing)

Article
An Interference-Resistant and Low-Consumption Lip Recognition Method
Electronics 2022, 11(19), 3066; https://doi.org/10.3390/electronics11193066 - 26 Sep 2022
Cited by 1 | Viewed by 740
Abstract
Lip movements contain essential linguistic information and are an important medium for studying the content of a dialogue. At present, there are many studies on how to improve the accuracy of lip recognition models, but few on their robustness and generalization performance under various disturbances. Our experiments show that the accuracy of current state-of-the-art lip recognition models drops significantly when disturbed, and that the models are particularly sensitive to adversarial examples. This paper substantially alleviates this problem by using Mixup training. Taking a model subjected to adversarial attacks generated by FGSM as an example, the model in this paper achieves 85.0% and 40.2% accuracy on the English dataset LRW and the Mandarin dataset LRW-1000, respectively, improving the correct recognition rates by 9.8% and 8.3% compared with current advanced lip recognition models. This demonstrates the positive impact of Mixup training on the robustness and generalization of lip recognition models. In addition, the performance of a lip recognition model depends heavily on the training parameters, which increases the computational cost. The InvNet-18 network in this paper reduces GPU resource consumption and training time while improving model accuracy: compared with the standard ResNet-18 network used in mainstream lip recognition models, it consumes less than one-third of the GPU resources and has 32% fewer parameters. Detailed analysis and comparison demonstrate that the proposed model effectively improves anti-interference ability and reduces training resource consumption, while its accuracy remains comparable with current state-of-the-art results. Full article
(This article belongs to the Topic Computer Vision and Image Processing)
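Mixup training itself is standard (Zhang et al., 2018) and easy to reproduce; a minimal batch-level sketch in PyTorch, independent of the paper's lip-reading pipeline:

```python
import numpy as np
import torch

def mixup_batch(x: torch.Tensor, y: torch.Tensor, alpha: float = 0.2):
    """Standard Mixup: convex combinations of input pairs and their labels.
    The paper applies this style of training to lip recognition to improve
    robustness to perturbations such as FGSM adversarial examples."""
    lam = np.random.beta(alpha, alpha)
    idx = torch.randperm(x.size(0))
    x_mix = lam * x + (1.0 - lam) * x[idx]
    # Train with lam * loss(pred, y) + (1 - lam) * loss(pred, y[idx]).
    return x_mix, y, y[idx], lam

x, y = torch.randn(8, 3, 88, 88), torch.randint(0, 500, (8,))
x_mix, y_a, y_b, lam = mixup_batch(x, y)
print(x_mix.shape, 0.0 <= lam <= 1.0)  # torch.Size([8, 3, 88, 88]) True
```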

Article
Facial Action Unit Recognition by Prior and Adaptive Attention
Electronics 2022, 11(19), 3047; https://doi.org/10.3390/electronics11193047 - 24 Sep 2022
Cited by 1 | Viewed by 852
Abstract
Facial action unit (AU) recognition remains a challenging task due to the subtlety and non-rigidity of AUs. A typical solution is to localize the correlated regions of each AU. Current works often predefine the region of interest (ROI) of each AU via prior knowledge, or try to capture the ROI only through the supervision of AU recognition during training. However, predefinition often neglects important regions, while the supervision alone is insufficient to precisely localize ROIs. In this paper, we propose a novel AU recognition method with prior and adaptive attention. Specifically, we predefine a mask for each AU in which locations farther away from the AU centers specified by prior knowledge have lower weights, and a learnable parameter controls the importance of different locations. Then, we element-wise multiply the mask by a learnable attention map and use the new attention map to extract the AU-related feature, so that AU recognition can supervise the adaptive learning of the new attention map. Experimental results show that our method (i) outperforms state-of-the-art AU recognition approaches on challenging benchmark datasets, and (ii) can accurately infer the regional attention distribution of each AU by combining the advantages of both predefinition and supervision. Full article
(This article belongs to the Topic Computer Vision and Image Processing)
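The prior mask can be sketched directly from its description: weights decay with distance from landmark-derived AU centers, with a parameter controlling the decay. The Gaussian form and values below are assumptions; in the paper, the decay parameter is learnable and the mask is multiplied element-wise by a learned attention map.

```python
import torch

def prior_au_mask(h, w, centers, sigma=8.0):
    """Sketch of a prior mask for one AU: weights decay with distance from
    the AU centers given by facial landmarks; sigma (a Gaussian-style decay
    here) plays the role of the learnable importance parameter."""
    ys, xs = torch.meshgrid(torch.arange(h, dtype=torch.float32),
                            torch.arange(w, dtype=torch.float32),
                            indexing="ij")
    mask = torch.zeros(h, w)
    for cy, cx in centers:
        d2 = (ys - cy) ** 2 + (xs - cx) ** 2
        mask = torch.maximum(mask, torch.exp(-d2 / (2 * sigma ** 2)))
    return mask  # later multiplied element-wise by a learned attention map

mask = prior_au_mask(64, 64, centers=[(20.0, 24.0), (20.0, 40.0)])
print(mask.shape, float(mask.max()))  # torch.Size([64, 64]) 1.0
```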

Article
DS6, Deformation-Aware Semi-Supervised Learning: Application to Small Vessel Segmentation with Noisy Training Data
J. Imaging 2022, 8(10), 259; https://doi.org/10.3390/jimaging8100259 - 22 Sep 2022
Cited by 1 | Viewed by 2259
Abstract
Blood vessels of the brain provide the human brain with the required nutrients and oxygen. As a vulnerable part of the cerebral blood supply, pathology of small vessels can cause serious problems such as Cerebral Small Vessel Diseases (CSVD). It has also been shown that CSVD is related to neurodegeneration, as in Alzheimer's disease. With the advancement of 7 Tesla MRI systems, higher spatial image resolution can be achieved, enabling the depiction of very small vessels in the brain. Non-deep-learning approaches to vessel segmentation, e.g., Frangi's vessel enhancement with subsequent thresholding, are capable of segmenting medium to large vessels but often fail on small vessels. Their sensitivity to small vessels can be increased by extensive parameter tuning or manual corrections, albeit making them time-consuming, laborious, and infeasible for larger datasets. This paper proposes a deep learning architecture to automatically segment small vessels in 7 Tesla 3D Time-of-Flight (ToF) Magnetic Resonance Angiography (MRA) data. The algorithm was trained and evaluated on a small, imperfect, semi-automatically segmented dataset of only 11 subjects: six for training, two for validation, and three for testing. The deep learning model, based on U-Net Multi-Scale Supervision, was trained on the training subset and made equivariant to elastic deformations in a self-supervised manner using deformation-aware learning to improve generalisation performance. The proposed technique was evaluated quantitatively and qualitatively on the test set and achieved a Dice score of 80.44 ± 0.83. Furthermore, the result of the proposed method was compared against a selected manually segmented region (resulting in a Dice of 62.07) and showed a considerable improvement (18.98%) with deformation-aware learning. Full article
(This article belongs to the Topic Computer Vision and Image Processing)

Article
Attentive SOLO for Sonar Target Segmentation
Electronics 2022, 11(18), 2904; https://doi.org/10.3390/electronics11182904 - 13 Sep 2022
Viewed by 902
Abstract
Imaging sonar systems play an important role in underwater target detection and location. Due to the influence of reverberation noise on imaging sonar systems, sonar target segmentation is a challenging problem. In order to segment different types of targets in sonar images accurately, we proposed the gated fusion-pyramid segmentation attention (GF-PSA) module. Specifically, inspired by gated full fusion, we improved the pyramid segmentation attention (PSA) module by using gated fusion to reduce noise interference during feature fusion and improve segmentation accuracy. We then improved the SOLOv2 (Segmenting Objects by Locations v2) algorithm with the proposed GF-PSA and named the improved algorithm Attentive SOLO. In addition, we constructed a sonar target segmentation dataset, named STSD, which contains 4000 real sonar images covering eight object categories with a total of 7077 target annotations. The experimental results show that the segmentation accuracy of Attentive SOLO on STSD reaches 74.1%, which is 3.7% higher than that of SOLOv2. Full article
(This article belongs to the Topic Computer Vision and Image Processing)
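The gating idea, letting a learned sigmoid map decide per location how much of each input feature to pass, can be sketched in a few lines; this illustrates the principle behind GF-PSA rather than its exact design.

```python
import torch
import torch.nn as nn

class GatedFusion(nn.Module):
    """Minimal gated feature fusion: a learned sigmoid gate decides, per
    pixel, how much of each of two feature maps to pass, suppressing noisy
    responses. An illustration of the gating idea, not GF-PSA itself."""
    def __init__(self, channels: int):
        super().__init__()
        self.gate = nn.Sequential(
            nn.Conv2d(2 * channels, channels, 3, padding=1), nn.Sigmoid())

    def forward(self, feat_a, feat_b):
        g = self.gate(torch.cat([feat_a, feat_b], dim=1))  # values in [0, 1]
        return g * feat_a + (1.0 - g) * feat_b

a, b = torch.randn(1, 32, 80, 80), torch.randn(1, 32, 80, 80)
print(GatedFusion(32)(a, b).shape)  # torch.Size([1, 32, 80, 80])
```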

Article
Privacy-Preserving Semantic Segmentation Using Vision Transformer
J. Imaging 2022, 8(9), 233; https://doi.org/10.3390/jimaging8090233 - 30 Aug 2022
Cited by 2 | Viewed by 1395
Abstract
In this paper, we propose a privacy-preserving semantic segmentation method that uses encrypted images and models based on the vision transformer (ViT), namely the segmentation transformer (SETR). The combined use of encrypted images and SETR allows us not only to apply images without sensitive visual information to SETR as query images but also to maintain the same accuracy as that achieved with plain images. Previous privacy-preserving methods with encrypted images for deep neural networks have focused on image classification tasks, and they result in lower accuracy than models trained with plain images due to the influence of image encryption. To overcome these issues, a novel method for privacy-preserving semantic segmentation is proposed that, for the first time, exploits the embedding structure of the ViT. In experiments, the proposed privacy-preserving semantic segmentation was demonstrated to achieve the same accuracy with encrypted images as that obtained with plain images. Full article
(This article belongs to the Topic Computer Vision and Image Processing)
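A common ingredient of such schemes, and a useful mental model here, is block-wise perceptual encryption whose block size matches the ViT patch size, so that patch embedding can be made consistent with the encryption. The keyed block shuffle below is an illustrative stand-in, not the paper's exact cipher.

```python
import numpy as np

def encrypt_blocks(img: np.ndarray, block: int, key: int) -> np.ndarray:
    """Sketch of block-wise perceptual encryption: shuffle fixed-size blocks
    with a secret keyed permutation. Choosing `block` equal to the ViT patch
    size is what makes patch-level processing compatible with encryption."""
    h, w, c = img.shape
    gh, gw = h // block, w // block
    # Split into (gh*gw) blocks, permute them with a keyed RNG, reassemble.
    blocks = (img.reshape(gh, block, gw, block, c)
                 .transpose(0, 2, 1, 3, 4)
                 .reshape(gh * gw, block, block, c))
    perm = np.random.default_rng(key).permutation(gh * gw)
    shuffled = blocks[perm]
    return (shuffled.reshape(gh, gw, block, block, c)
                    .transpose(0, 2, 1, 3, 4)
                    .reshape(h, w, c))

img = (np.arange(32 * 32 * 3) % 256).astype(np.uint8).reshape(32, 32, 3)
enc = encrypt_blocks(img, block=16, key=42)
# Same pixels, rearranged: the multiset of values is preserved.
print(enc.shape, np.array_equal(np.sort(enc, axis=None), np.sort(img, axis=None)))
```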

Article
Temporal Context Modeling Network with Local-Global Complementary Architecture for Temporal Proposal Generation
Electronics 2022, 11(17), 2674; https://doi.org/10.3390/electronics11172674 - 26 Aug 2022
Cited by 1 | Viewed by 864
Abstract
Temporal Action Proposal Generation (TAPG) is a promising but challenging task with a wide range of practical applications. Although state-of-the-art methods have made significant progress in TAPG, most ignore the impact of the temporal scales of action and lack the exploitation of effective boundary contexts. In this paper, we propose a simple but effective unified framework named Temporal Context Modeling Network (TCMNet) that generates temporal action proposals. TCMNet innovatively uses convolutional filters with different dilation rates to address the temporal scale issue. Specifically, TCMNet contains a BaseNet with dilated convolutions (DBNet), an Action Completeness Module (ACM), and a Temporal Boundary Generator (TBG). The DBNet aims to model temporal information. It handles input video features through different dilated convolutional layers and outputs a feature sequence as the input of ACM and TBG. The ACM aims to evaluate the confidence scores of densely distributed proposals. The TBG is designed to enrich the boundary context of an action instance. The TBG can generate action boundaries with high precision and high recall through a local–global complementary structure. We conduct comprehensive evaluations on two challenging video benchmarks: ActivityNet-1.3 and THUMOS14. Extensive experiments demonstrate the effectiveness of the proposed TCMNet on tasks of temporal action proposal generation and temporal action detection. Full article
(This article belongs to the Topic Computer Vision and Image Processing)

Article
Color Point Defect Detection Method Based on Color Salient Features
Electronics 2022, 11(17), 2665; https://doi.org/10.3390/electronics11172665 - 25 Aug 2022
Viewed by 1000
Abstract
Color point defect detection for displays is an important link in the display quality inspection process. To improve the detection accuracy of color point defects, a detection method based on color salient features is proposed, taking the color point defects perceived by human vision as the key detection target. First, a human visual perception constraint coefficient is used to correct the RGB three-channel image, yielding the color-channel-transformed image. Then, a local contrast method is used to extract the point features of each color channel, achieving point-defect enhancement together with noise and background suppression. Finally, the means and standard deviations of the defect feature maps of the R, G, and B channels are calculated, and the maximum mean and maximum standard deviation are selected as thresholds under the maximum fusion criterion to binarize the defect feature maps of the three channels. An OR operation is then performed on the segmented images to combine the point defect segmentation results. The experimental results show that the average detection accuracy and recall of the algorithm are higher than 94%, a significant improvement over mainstream detection methods that meets the needs of industrial production. Full article
(This article belongs to the Topic Computer Vision and Image Processing)
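The binarization-and-fusion step is simple enough to sketch as described: one threshold built from the maximum channel mean and maximum channel standard deviation, applied to all three channels, followed by an OR fusion. In the sketch below, the factor k on the standard deviation and the mean+k·std combination are assumptions; the feature maps themselves would come from the local-contrast stage.

```python
import numpy as np

def segment_point_defects(feature_maps, k=3.0):
    """Binarize R, G, B defect feature maps with a single threshold taken
    from the maximum channel mean plus k times the maximum channel std
    (maximum fusion criterion), then OR the per-channel masks."""
    maps = np.atleast_3d(np.asarray(feature_maps, dtype=float))  # (H, W, C)
    means = maps.mean(axis=(0, 1))
    stds = maps.std(axis=(0, 1))
    thresh = means.max() + k * stds.max()
    return (maps > thresh).any(axis=2)  # OR fusion across channels

rng = np.random.default_rng(1)
feat = rng.normal(0.0, 1.0, (64, 64, 3))
feat[10, 20, 0] = 12.0  # a synthetic point defect in the R channel
mask = segment_point_defects(feat)
print(mask.shape, bool(mask[10, 20]))  # (64, 64) True
```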

Article
Multiple Mechanisms to Strengthen the Ability of YOLOv5s for Real-Time Identification of Vehicle Type
Electronics 2022, 11(16), 2586; https://doi.org/10.3390/electronics11162586 - 18 Aug 2022
Cited by 4 | Viewed by 1046
Abstract
Identifying the type of vehicle on the road is a challenging task, especially in a natural environment with all its complexities, where traditional object detection architectures require an excessively large amount of computation. Lightweight networks such as MobileNet are fast but cannot satisfy the performance requirements of this task; improving the detection performance of small networks is thus an outstanding challenge. In this paper, we use YOLOv5s as the backbone network and propose a large-scale convolutional fusion module called the ghost cross-stage partial network (G_CSP), which integrates large-scale information from different feature maps to identify vehicles on the road. We use the convolutional triplet attention (C_TA) module to extract attention-based information from different dimensions. We also optimize the original spatial pyramid pooling fast (SPPF) module, using dilated convolution to increase the network's capacity to extract information; the optimized module is called DSPPF. The results of extensive experiments on the BDD100K, VOC2012 + 2007, and VOC2019 datasets showed that the improved YOLOv5s network performs well and can be used on mobile devices in real time. Full article
(This article belongs to the Topic Computer Vision and Image Processing)

Article
A Multi-Domain Embedding Framework for Robust Reversible Data Hiding Scheme in Encrypted Videos
Electronics 2022, 11(16), 2552; https://doi.org/10.3390/electronics11162552 - 15 Aug 2022
Cited by 1 | Viewed by 893
Abstract
For easier cloud management, reversible data hiding is performed in the encrypted domain to embed label information. However, existing schemes are not robust and may lose label information during transmission. Enhancing robustness while maintaining reversibility in data hiding is a challenge. In this paper, a multi-domain embedding framework for encrypted videos is proposed to achieve both robustness and reversibility. The framework makes full use of the multi-domain characteristics of encrypted video. The element for robust embedding, marked as element-I, is encrypted through logistic chaotic scrambling. To further improve robustness, the label information is encoded with the Bose–Chaudhuri–Hocquenghem (BCH) code and then robustly embedded into element-I by modulating its amplitude, during which auxiliary information is generated for the lossless recovery of element-I. The element for reversible embedding, marked as element-II, has its sign encrypted by a stream cipher. The auxiliary information is reversibly embedded into element-II through traditional histogram shifting. To verify the feasibility of the framework, an anti-recompression RDH-EV scheme based on the framework is proposed. The experimental results show that the proposed scheme outperforms current representative schemes in terms of robustness while achieving reversibility. In the proposed scheme, video encryption and data hiding are commutative, and the original video bitstream can be fully recovered, demonstrating the feasibility of the multi-domain embedding framework for encrypted videos. Full article
(This article belongs to the Topic Computer Vision and Image Processing)

Article
Small Sample Hyperspectral Image Classification Method Based on Dual-Channel Spectral Enhancement Network
Electronics 2022, 11(16), 2540; https://doi.org/10.3390/electronics11162540 - 13 Aug 2022
Cited by 2 | Viewed by 1294
Abstract
Deep learning has achieved significant success in the field of hyperspectral image (HSI) classification, but challenges remain when the number of training samples is small. Feature fusion approaches based on multi-channel and multi-scale feature extraction are attractive for HSI classification when few samples are available. In this paper, based on feature fusion, we propose a simple yet effective CNN-based Dual-channel Spectral Enhancement Network (DSEN) to fully exploit the features of small labeled HSI samples for HSI classification. We start from the observation that, in many HSI classification models, most of the incorrectly classified pixels lie at the borders between different classes, which is caused by feature obfuscation. Hence, in DSEN, we specially designed a spectral feature extraction channel to enhance the spectral feature representation of specific pixels. Moreover, a spatial–spectral channel was designed using small convolution kernels to extract the spatial–spectral features of the HSI. By adjusting the fusion proportion of the features extracted from the two channels, the expression of spectral features is enhanced in the fused features, enabling better HSI classification. The experimental results demonstrate that the overall accuracy (OA) of HSI classification using the proposed DSEN reaches 69.47%, 80.54%, and 93.24% when only five training samples per class are selected from the Indian Pines (IP), University of Pavia (UP), and Salinas Scene (SA) datasets, respectively. The performance improves as the number of training samples increases. Compared with several related methods, DSEN demonstrates superior performance in HSI classification. Full article
(This article belongs to the Topic Computer Vision and Image Processing)