Convolutional Neural Networks and Vision Applications - Volume III

A special issue of Electronics (ISSN 2079-9292). This special issue belongs to the section "Artificial Intelligence".

Deadline for manuscript submissions: 19 September 2024 | Viewed by 16565

Special Issue Editors

School of Electronics and Information Technology, Sun Yat-Sen University, Guangzhou 510006, China
Interests: computer vision and pattern recognition

Special Issue Information

Dear Colleagues,

Processing speed is critical for visual inspection automation and mobile visual computing applications. Many powerful and sophisticated computer vision algorithms generate accurate results but require high computational power or resources and are not entirely suitable for real-time vision applications. On the other hand, there are vision algorithms and convolutional neural networks that run at camera frame rates with moderately reduced accuracy and are arguably better suited to real-time vision applications. This Special Issue is for research related to the design, optimization, and implementation of machine-learning-based vision algorithms or convolutional neural networks that are suitable for real-time vision applications.

General topics covered in this Special Issue include but are not limited to:

  • Optimization of software-based vision algorithms;
  • CNN architecture optimizations for real-time performance;
  • CNN acceleration through approximate computing;
  • CNN applications that require real-time performance;
  • Tradeoff analysis between speed and accuracy in CNNs;
  • GPU-based implementations for real-time CNN performance;
  • FPGA-based implementations for real-time CNN performance;
  • Embedded vision systems for applications that require real-time performance;
  • Machine vision applications that require real-time performance.

Prof. Dr. D. J. Lee
Dr. Dong Zhang
Guest Editors

Manuscript Submission Information

Manuscripts should be submitted online at www.mdpi.com by registering and logging in to this website. Once you are registered, go to the submission form. Manuscripts can be submitted until the deadline. All submissions that pass pre-check are peer-reviewed. Accepted papers will be published continuously in the journal (as soon as accepted) and will be listed together on the special issue website. Research articles, review articles, and short communications are invited. For planned papers, a title and short abstract (about 100 words) can be sent to the Editorial Office for announcement on this website.

Submitted manuscripts should not have been published previously, nor be under consideration for publication elsewhere (except conference proceedings papers). All manuscripts are thoroughly refereed through a single-blind peer-review process. A guide for authors and other relevant information for submission of manuscripts is available on the Instructions for Authors page. Electronics is an international peer-reviewed open access semimonthly journal published by MDPI.

Please visit the Instructions for Authors page before submitting a manuscript. The Article Processing Charge (APC) for publication in this open access journal is 2400 CHF (Swiss Francs). Submitted papers should be well formatted and use good English. Authors may use MDPI's English editing service prior to publication or during author revisions.

Keywords

  • convolutional neural networks
  • vision applications

Published Papers (13 papers)

Research

18 pages, 31918 KiB  
Article
Deep Transfer Learning for Image Classification of Phosphorus Nutrition States in Individual Maize Leaves
by Manuela Ramos-Ospina, Luis Gomez, Carlos Trujillo and Alejandro Marulanda-Tobón
Electronics 2024, 13(1), 16; https://doi.org/10.3390/electronics13010016 - 19 Dec 2023
Viewed by 778
Abstract
Computer vision is a powerful technology that has enabled solutions in various fields by analyzing visual attributes of images. One field that has taken advantage of computer vision is agricultural automation, which promotes high-quality crop production. The nutritional status of a crop is a crucial factor for determining its productivity. This status is mediated by approximately 14 chemical elements acquired by the plant, and their determination plays a pivotal role in farm management. To address the timely identification of nutritional disorders, this study focuses on the classification of three levels of phosphorus deficiencies through individual leaf analysis. The methodological steps include: (1) using different capture devices to generate a database of images composed of laboratory-grown maize plants that were induced to either total phosphorus deficiency, medium deficiency, or total nutrition; (2) processing the images with state-of-the-art transfer learning architectures (i.e., VGG16, ResNet50, GoogLeNet, DenseNet201, and MobileNetV2); and (3) evaluating the classification performance of the models using the created database. The results show that the DenseNet201 model achieves superior performance, with 96% classification accuracy. However, the other studied architectures also demonstrate competitive performance and are considered state-of-the-art automatic leaf nutrition deficiency detection tools. The proposed method can be a starting point to fine-tune machine-vision-based solutions tailored for real-time monitoring of crop nutritional status. Full article
(This article belongs to the Special Issue Convolutional Neural Networks and Vision Applications - Volume III)

24 pages, 8317 KiB  
Article
Rulers2023: An Annotated Dataset of Synthetic and Real Images for Ruler Detection Using Deep Learning
by Dalius Matuzevičius
Electronics 2023, 12(24), 4924; https://doi.org/10.3390/electronics12244924 - 07 Dec 2023
Viewed by 1129
Abstract
This research investigates the usefulness and efficacy of synthetic ruler images for the development of a deep learning-based ruler detection algorithm. Synthetic images offer a compelling alternative to real-world images as data sources in the development and advancement of computer vision systems. This research aims to answer whether using a synthetic dataset of ruler images is sufficient for training an effective ruler detector and to what extent such a detector could benefit from including synthetic images as a data source. The article presents the procedural method for generating synthetic ruler images, describes the methodology for evaluating the synthetic dataset using trained convolutional neural network (CNN)-based ruler detectors, and shares the compiled synthetic and real ruler image datasets. It was found that the synthetic dataset yielded superior results in training the ruler detectors compared with the real image dataset. The results support the utility of synthetic datasets as a viable and advantageous approach to training deep learning models, especially when real-world data collection presents significant logistical challenges. The evidence presented here strongly supports the idea that when carefully generated and used, synthetic data can effectively replace real images in the development of CNN-based detection systems. Full article
(This article belongs to the Special Issue Convolutional Neural Networks and Vision Applications - Volume III)

20 pages, 13132 KiB  
Article
FPGA-Based CNN for Eye Detection in an Iris Recognition at a Distance System
by Camilo A. Ruiz-Beltrán, Adrián Romero-Garcés, Martín González-García, Rebeca Marfil and Antonio Bandera
Electronics 2023, 12(22), 4713; https://doi.org/10.3390/electronics12224713 - 20 Nov 2023
Cited by 1 | Viewed by 1219
Abstract
Neural networks are the state-of-the-art solution to image-processing tasks. Some of these neural networks are relatively simple, but the popular convolutional neural networks (CNNs) can consist of hundreds of layers. Unfortunately, the excellent recognition accuracy of CNNs comes at the cost of very high computational complexity, and one of the current challenges is managing the power, delay and physical size limitations of hardware solutions dedicated to accelerating their inference process. In this paper, we describe the embedding of an eye detection system on a Zynq XCZU4EV UltraScale+ multiprocessor system-on-chip (MPSoC). This eye detector is used in the application framework of a remote iris recognition system, which requires high resolution images captured at high speed as input. Given the high rate of eye regions detected per second, it is also important that the detector outputs only images of eyes that are in focus, discarding all those seriously affected by defocus blur. In this proposal, the network will be trained only with correctly focused eye images to assess whether it can differentiate this pattern from that associated with the out-of-focus eye image. Exploiting the neural network’s advantage of being able to work with multi-channel input, the inputs to the CNN will be the grey level image and a high-pass filtered version, typically used to determine whether the iris is in focus or not. The complete system synthesises other cores and implements the CNN using the so-called Deep Learning Processor Unit (DPU), the intellectual property (IP) block released by AMD/Xilinx. Compared to previous hardware designs for implementing FPGA-based CNNs, the DPU IP supports extensive deep learning core functions, and developers can leverage DPUs to conveniently accelerate CNN inference.
Experimental validation has been successfully addressed in a real-world scenario working with walking subjects, demonstrating that it is possible to detect only eye images that are in focus. This prototype module includes a CMOS digital image sensor that provides 16 Mpixel images, and outputs a stream of detected eyes as 640 × 480 images. The module correctly discards up to 95% of the eyes present in the input images as not being correctly focused. Full article
(This article belongs to the Special Issue Convolutional Neural Networks and Vision Applications - Volume III)

16 pages, 11919 KiB  
Article
Real-Time Defect Detection Model in Industrial Environment Based on Lightweight Deep Learning Network
by Jiaqi Lu and Soo-Hong Lee
Electronics 2023, 12(21), 4388; https://doi.org/10.3390/electronics12214388 - 24 Oct 2023
Cited by 1 | Viewed by 1204
Abstract
Surface defect detection in industrial environments is crucial for quality management and has significant research value. General detection networks, such as the YOLO series, have proven effective in various dataset detections. However, due to the complex and varied surface defects of industrial products, many defects occupy a small proportion of the surface and fall into the category of typical small target detection problems. Moreover, the complexity of general detection network architectures relies on high-tech hardware, making it difficult to deploy on devices without GPUs or on edge computing and mobile devices. To meet the practical needs of industrial product defect inspection applications, this paper proposes a lightweight network specifically designed for defect detection in industrial fields. This network is composed of four parts: a backbone network, a multiscale feature aggregation network, a residual enhancement network, and an attention enhancement network. The network includes a backbone network that integrates attention layers for feature extraction, a multiscale feature aggregation network for semantic information, a residual enhancement network for spatial focus, and an attention enhancement network for global–local feature interaction. These components enhance detection performance for diverse defects while maintaining low hardware requirements. Experimental results show that this network outperforms the latest and most popular YOLOv5n and YOLOv8n models in the five indicators P, R, F1, mAP@0.5, and GFLOPS when used on four public datasets. It even approaches or surpasses the YOLOv8s and YOLOv5s models with several times the GFLOPS computation. It balances the requirements of lightweight real-time and accuracy in the scenario of industrial product surface defect detection. Full article
(This article belongs to the Special Issue Convolutional Neural Networks and Vision Applications - Volume III)

17 pages, 6034 KiB  
Article
A Fast Adaptive Binarization Method for QR Code Images Based on Dynamic Illumination Equalization
by Rongjun Chen, Yue Huang, Kailin Lan, Jiawen Li, Yongqi Ren, Xianglei Hu, Leijun Wang, Huimin Zhao and Xu Lu
Electronics 2023, 12(19), 4134; https://doi.org/10.3390/electronics12194134 - 04 Oct 2023
Viewed by 1077
Abstract
The advancement of the Internet of Things (IoT) has enhanced the extensive usage of QR code images in various computer vision applications. Nonetheless, this has also brought forth several technical challenges. In particular, the logistics sorting system often encounters issues such as a low recognition rate and slow processing speed when dealing with QR code images under complex lighting conditions like uneven illumination. To address these difficulties, a method that focuses on achieving a fast adaptive binarization of QR code images through dynamic illumination equalization was proposed. First, an algorithm based on edge enhancement to obtain the position detection patterns within QR code images was applied, which enabled the acquisition of structural features in uneven illumination. Subsequently, QR code images with complex lighting conditions can achieve a fast adaptive binarization through dynamic illumination equalization. As for method validation, the experiments were performed on the two datasets that include QR code images influenced by strong light, weak light, and different shadow degrees. The results disclosed the benefits of the proposed method compared to the previous approaches; it produced superior recognition rates of 78.26–98.75% in various cases through commonly used decoders (WeChat and ZXing), with a faster processing speed of 0.0164 s/image, making it a proper method to satisfy real-time requirements in practical applications, such as a logistics sorting system. Full article
(This article belongs to the Special Issue Convolutional Neural Networks and Vision Applications - Volume III)

26 pages, 818 KiB  
Article
Enhancing CNNs Performance on Object Recognition Tasks with Gabor Initialization
by Pablo Rivas and Mehang Rai
Electronics 2023, 12(19), 4072; https://doi.org/10.3390/electronics12194072 - 28 Sep 2023
Viewed by 760
Abstract
The use of Gabor filters in image processing has been well-established, and these filters are recognized for their exceptional feature extraction capabilities. These filters are usually applied through convolution. While convolutional neural networks (CNNs) are designed to learn optimal filters, little research exists regarding any advantages of initializing CNNs with Gabor filters. In this study, the performance of CNNs initialized with Gabor filters is compared to traditional CNNs with random initialization on six object recognition datasets. The results indicated that the Gabor-initialized CNNs outperformed the traditional CNNs in terms of accuracy, area under the curve, minimum loss, and convergence speed. A statistical analysis was performed to validate the performance of the classifiers, and the results showed that the Gabor classifiers outperformed the baseline classifiers. The findings of this study provide robust evidence in favor of using Gabor-based methods for initializing the receptive fields of CNN architectures. Full article
(This article belongs to the Special Issue Convolutional Neural Networks and Vision Applications - Volume III)
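
As a rough illustration of the idea summarized in the abstract above, the following minimal sketch builds a bank of Gabor kernels that could be copied into the first convolutional layer of a CNN in place of random weights. It is not the authors' implementation: the function names `gabor_kernel` and `gabor_bank`, the parameter defaults (`sigma`, `lambd`, `gamma`, `psi`), the 7 × 7 kernel size, and the choice of evenly spaced orientations are all assumptions made for illustration.

```python
import numpy as np


def gabor_kernel(size, theta, sigma=2.0, lambd=4.0, gamma=0.5, psi=0.0):
    """Real part of a 2-D Gabor filter on a size x size grid (size should be odd).

    All parameter defaults are illustrative assumptions, not values from the paper.
    """
    half = size // 2
    # Integer pixel coordinates centred on the middle of the kernel.
    y, x = np.mgrid[-half:half + 1, -half:half + 1].astype(np.float64)
    # Rotate the coordinate frame by the filter orientation theta.
    x_r = x * np.cos(theta) + y * np.sin(theta)
    y_r = -x * np.sin(theta) + y * np.cos(theta)
    # Gaussian envelope multiplied by a cosine carrier.
    envelope = np.exp(-(x_r ** 2 + (gamma * y_r) ** 2) / (2.0 * sigma ** 2))
    carrier = np.cos(2.0 * np.pi * x_r / lambd + psi)
    return envelope * carrier


def gabor_bank(n_filters, size=7, in_channels=3):
    """Stack n_filters kernels with evenly spaced orientations in [0, pi).

    Returns an array of shape (n_filters, in_channels, size, size), the
    layout most deep learning frameworks use for convolution weights.
    """
    thetas = np.linspace(0.0, np.pi, n_filters, endpoint=False)
    bank = np.stack([gabor_kernel(size, t) for t in thetas])
    # Repeat each kernel across the input channels.
    bank = np.repeat(bank[:, None, :, :], in_channels, axis=1)
    return bank.astype(np.float32)


weights = gabor_bank(n_filters=8)
print(weights.shape)
```

The resulting array can be assigned to the weight tensor of a first-layer convolution with matching shape; the remaining layers would keep their usual random initialization and all weights would still be trained.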

18 pages, 6334 KiB  
Article
XSC—An eXplainable Image Segmentation and Classification Framework: A Case Study on Skin Cancer
by Emmanuel Pintelas and Ioannis E. Livieris
Electronics 2023, 12(17), 3551; https://doi.org/10.3390/electronics12173551 - 22 Aug 2023
Cited by 1 | Viewed by 1101
Abstract
Within the field of computer vision, image segmentation and classification serve as crucial tasks, involving the automatic categorization of images into predefined groups or classes, respectively. In this work, we propose a framework designed for simultaneously addressing segmentation and classification tasks in image-processing contexts. The proposed framework is composed of three main modules and focuses on providing transparency, interpretability, and explainability in its operations. The first two modules are used to partition the input image into regions of interest, allowing the automatic and interpretable identification of segmentation regions using clustering techniques. These segmentation regions are then analyzed to select those considered valuable by the user for addressing the classification task. The third module focuses on classification, using an explainable classifier, which relies on hand-crafted transparent features extracted from the selected segmentation regions. By leveraging only the selected informative regions, the classification model is made more reliable and less susceptible to misleading information. The proposed framework’s effectiveness was evaluated in a case study on skin-cancer-segmentation and -classification benchmarks. The experimental analysis highlighted that the proposed framework exhibited comparable performance with the state-of-the-art deep-learning approaches, which implies its efficiency, considering the fact that the proposed approach is also interpretable and explainable. Full article
(This article belongs to the Special Issue Convolutional Neural Networks and Vision Applications - Volume III)

20 pages, 150759 KiB  
Article
Using Haze Level Estimation in Data Cleaning for Supervised Deep Image Dehazing Models
by Cheng-Hsiung Hsieh and Ze-Yu Chen
Electronics 2023, 12(16), 3485; https://doi.org/10.3390/electronics12163485 - 17 Aug 2023
Viewed by 584
Abstract
Recently, supervised deep learning methods have been widely used for image haze removal. These methods rely on training data that are assumed to be appropriate. However, this assumption may not always be true. We observe that some data may contain hazy ground truth (GT) images. This can lead to supervised deep image dehazing (SDID) models learning inappropriate mapping between hazy images and GT images, which negatively affects the dehazing performance. To address this problem, two difficulties must be solved. One is to estimate the haze level in an image, and the other is to develop a haze level indicator to discriminate clear and hazy images. To this end, we proposed a haze level estimation (HLE) scheme based on dark channel prior and a haze level indicator accordingly for training data cleaning, i.e., to exclude image pairs with hazy GT images in the data set. With the data cleaning by the HLE, we introduced an SDID framework to avoid inappropriate learning and thus improve the dehazing performance. To verify the framework, using the RESIDE data set, experiments were conducted with three types of SDID models, i.e., GCAN, REFN and cGAN. The results show that our method can significantly improve the dehazing performance of the three SDID models. Subjectively, the proposed method generally provides better visual quality. Objectively, our method, using fewer training image pairs, was capable of improving PSNR in the GCAN, REFN, and cGAN models by 3.10 dB, 5.74 dB, and 6.44 dB, respectively. Furthermore, our method was evaluated using a real-world data set, KeDeMa. The results indicate that the better visual quality of the dehazed images is generally for models with the proposed data cleaning scheme. The results demonstrate that the proposed method effectively and efficiently enhances the dehazing performance in the given examples. 
The practical significance of this research is to provide an easy but effective way, that is, the proposed data cleaning scheme, to improve the performance of SDID models. Full article
(This article belongs to the Special Issue Convolutional Neural Networks and Vision Applications - Volume III)
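
To make the data cleaning idea in the abstract above more concrete, here is a minimal sketch of a dark-channel-based haze score used to filter training pairs. The abstract does not give the HLE formula, so this is not the authors' scheme: using the mean of the dark channel as the score, the function names `dark_channel`, `haze_score`, and `keep_pair`, the 15-pixel patch, and the 0.3 threshold are all assumptions. Images are assumed to be float arrays of shape (H, W, 3) with values in [0, 1].

```python
import numpy as np
from numpy.lib.stride_tricks import sliding_window_view


def dark_channel(image, patch=15):
    """Dark channel: per-pixel minimum over colour channels, then over a local patch.

    `patch` should be odd so the output keeps the input's height and width.
    """
    min_rgb = image.min(axis=2)
    pad = patch // 2
    # Replicate border pixels so every location has a full patch.
    padded = np.pad(min_rgb, pad, mode="edge")
    windows = sliding_window_view(padded, (patch, patch))
    return windows.min(axis=(2, 3))


def haze_score(image, patch=15):
    """Hypothetical haze level: mean dark-channel value (higher means hazier)."""
    return float(dark_channel(image, patch).mean())


def keep_pair(gt_image, threshold=0.3, patch=15):
    """Keep a training pair only if its ground-truth image looks haze-free.

    The threshold is an illustrative assumption, not a value from the paper.
    """
    return haze_score(gt_image, patch) < threshold


# A saturated red image has a zero dark channel; a flat grey one does not.
clear = np.zeros((20, 20, 3))
clear[..., 0] = 0.9
hazy = np.full((20, 20, 3), 0.8)
print(keep_pair(clear), keep_pair(hazy))
```

The intuition follows the dark channel prior: in haze-free outdoor images most local patches contain at least one very dark value in some colour channel, so a bright dark channel suggests residual haze in a supposedly clean ground-truth image.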

21 pages, 6058 KiB  
Article
A Fine-Tuned Hybrid Stacked CNN to Improve Bengali Handwritten Digit Recognition
by Ruhul Amin, Md. Shamim Reza, Yuichi Okuyama, Yoichi Tomioka and Jungpil Shin
Electronics 2023, 12(15), 3337; https://doi.org/10.3390/electronics12153337 - 04 Aug 2023
Cited by 1 | Viewed by 1112
Abstract
Recognition of Bengali handwritten digits has several unique challenges, including the variation in writing styles, the different shapes and sizes of digits, the varying levels of noise, and the distortion in the images. Despite significant improvements, there is still room for further improvement in the recognition rate. By building datasets and developing models, researchers can advance state-of-the-art support, which can have important implications for various domains. In this paper, we introduce a new dataset of 5440 handwritten Bengali digit images acquired from a Bangladeshi University that is now publicly available. Both conventional machine learning and CNN models were used to evaluate the task. To begin, we scrutinized the results of the ML model used after integrating three image feature descriptors, namely Local Binary Pattern (LBP), Complete Local Binary Pattern (CLBP), and Histogram of Oriented Gradients (HOG), using principal component analysis (PCA), which explained 95% of the variation in these descriptors. Then, via a fine-tuning approach, we designed three customized CNN models and their stack to recognize Bengali handwritten digits. On handcrafted image features, the XGBoost classifier achieved the best accuracy at 85.29%, an ROC AUC score of 98.67%, and precision, recall, and F1 scores ranging from 85.08% to 85.18%, indicating that there was still room for improvement. On our own data, the proposed customized CNN models and their stack model surpassed all other models, reaching a 99.66% training accuracy and a 97.57% testing accuracy. In addition, to robustify our proposed CNN model, we used another dataset of Bengali handwritten digits obtained from the Kaggle repository. Our stack CNN model provided remarkable performance. It obtained a training accuracy of 99.26% and an almost equally remarkable testing accuracy of 96.14%.
Without any rigorous image preprocessing, fewer epochs, and less computation time, our proposed CNN model performed the best and proved the most resilient throughout all of the datasets, which solidified its position at the forefront of the field. Full article
(This article belongs to the Special Issue Convolutional Neural Networks and Vision Applications - Volume III)

14 pages, 3464 KiB  
Article
Image Data Extraction and Driving Behavior Analysis Based on Geographic Information and Driving Data
by Huei-Yung Lin, Jun-Zhi Zhang and Chin-Chen Chang
Electronics 2023, 12(13), 2989; https://doi.org/10.3390/electronics12132989 - 07 Jul 2023
Viewed by 980
Abstract
Driving behavior analysis has become crucial for traffic safety. In addition, more abundant driving data are needed to analyze driving behavior more comprehensively and thus improve traffic safety. This paper proposes an approach to image data extraction and driving behavior analysis that uses geographic information and driving data. Information derived from geographic and global positioning systems was used for image data extraction. In addition, we used an onboard diagnostic II and a controller area network bus logger to record driving data for driving behavior analysis. Driving behavior was analyzed using sparse automatic encoders and data exploration to detect abnormal and aggressive behavior. A regression analysis was performed to derive the relationship between aggressive driving behavior and road facilities. The results indicated that lane ratios, no lane markings, and straight lane markings are important features that affect aggressive driving behaviors. Several traffic improvements were proposed for specific intersections and roads to make drivers and pedestrians safer. Full article
(This article belongs to the Special Issue Convolutional Neural Networks and Vision Applications - Volume III)

22 pages, 4399 KiB  
Article
Attention Mechanisms in Convolutional Neural Networks for Nitrogen Treatment Detection in Tomato Leaves Using Hyperspectral Images
by Brahim Benmouna, Raziyeh Pourdarbani, Sajad Sabzi, Ruben Fernandez-Beltran, Ginés García-Mateos and José Miguel Molina-Martínez
Electronics 2023, 12(12), 2706; https://doi.org/10.3390/electronics12122706 - 16 Jun 2023
Cited by 2 | Viewed by 1127
Abstract
Nitrogen is an essential macronutrient for the growth and development of tomatoes. However, excess nitrogen fertilization can affect the quality of tomato fruit, making it unattractive to consumers. Consequently, the aim of this study is to develop a method for the early detection of excessive nitrogen fertilizer use in Royal tomato by visible and near-infrared spectroscopy. Spectral reflectance values of tomato leaves were captured at wavelengths between 400 and 1100 nm, collected from several treatments after application of normal nitrogen and on the first, second, and third days after application of excess nitrogen. A new method based on convolutional neural networks (CNN) with an attention mechanism was proposed to perform the estimation of nitrogen overdose in tomato leaves. To verify the effectiveness of this method, the proposed attention mechanism-based CNN classifier was compared with an alternative CNN having the same architecture without integrating the attention mechanism, and with other CNN models, AlexNet and VGGNet. Experimental results showed that the CNN with an attention mechanism outperformed the alternative CNN, achieving a correct classification rate (CCR) of 97.33% for the treatment, compared with a CCR of 94.94% for the CNN alone. These findings will help in the development of a new tool for rapid and accurate detection of nitrogen fertilizer overuse in large areas. Full article
(This article belongs to the Special Issue Convolutional Neural Networks and Vision Applications - Volume III)

Review

28 pages, 453 KiB  
Review
Data-Driven Advancements in Lip Motion Analysis: A Review
by Shad Torrie, Andrew Sumsion, Dah-Jye Lee and Zheng Sun
Electronics 2023, 12(22), 4698; https://doi.org/10.3390/electronics12224698 - 18 Nov 2023
Viewed by 1123
Abstract
This work reviews the dataset-driven advancements that have occurred in the area of lip motion analysis, particularly visual lip-reading and visual lip motion authentication, in the deep learning era. We provide an analysis of datasets and their usage, creation, and associated challenges. Future research can utilize this work as a guide for selecting appropriate datasets and as a source of insights for creating new and innovative datasets. Large and varied datasets are vital to a successful deep learning system. There have been many incredible advancements made in these fields due to larger datasets. There are indications that even larger, more varied datasets would result in further improvement upon existing systems. We highlight the datasets that brought about the progression in lip-reading systems from digit- to word-level lip-reading, and then from word- to sentence-level lip-reading. Through an in-depth analysis of lip-reading system results, we show that datasets with large amounts of diversity increase results immensely. We then discuss the next step for lip-reading systems to move from sentence- to dialogue-level lip-reading and emphasize that new datasets are required to make this transition possible. We then explore lip motion authentication datasets. While lip motion authentication has been well researched, it is not very unified on a particular implementation, and there is no benchmark dataset to compare the various methods. As was seen in the lip-reading analysis, large, diverse datasets are required to evaluate the robustness and accuracy of new methods attempted by researchers. These large datasets have pushed the work in the visual lip-reading realm. Due to the lack of large, diverse, and publicly accessible datasets, visual lip motion authentication research has struggled to validate results and real-world applications. 
A new benchmark dataset is required to unify the studies in this area such that they can be compared to previous methods as well as validate new methods more effectively. Full article
(This article belongs to the Special Issue Convolutional Neural Networks and Vision Applications - Volume III)

33 pages, 1227 KiB  
Review
A Systematic Literature Review on Artificial Intelligence and Explainable Artificial Intelligence for Visual Quality Assurance in Manufacturing
by Rudolf Hoffmann and Christoph Reich
Electronics 2023, 12(22), 4572; https://doi.org/10.3390/electronics12224572 - 08 Nov 2023
Viewed by 3101
Abstract
Quality assurance (QA) plays a crucial role in manufacturing to ensure that products meet their specifications. However, manual QA processes are costly and time-consuming, thereby making artificial intelligence (AI) an attractive solution for automation and expert support. In particular, convolutional neural networks (CNNs) have gained a lot of interest in visual inspection. Next to AI methods, explainable artificial intelligence (XAI) systems, which achieve transparency and interpretability by providing insights into the decision-making process of the AI, are interesting methods for achieving quality inspections in manufacturing processes. In this study, we conducted a systematic literature review (SLR) to explore AI and XAI approaches for visual QA (VQA) in manufacturing. Our objective was to assess the current state of the art and identify research gaps in this context. Our findings revealed that AI-based systems predominantly focused on visual quality control (VQC) for defect detection. Research addressing VQA practices, like process optimization, predictive maintenance, or root cause analysis, is rarer. Least often cited are papers that utilize XAI methods. In conclusion, this survey emphasizes the importance and potential of AI and XAI in VQA across various industries. By integrating XAI, organizations can enhance model transparency, interpretability, and trust in AI systems. Overall, leveraging AI and XAI improves VQA practices and decision-making in industries. Full article
(This article belongs to the Special Issue Convolutional Neural Networks and Vision Applications - Volume III)