2.1. Tampon Inspection Unit
An automated inspection system for half-finished tampons with an absorbent core and cotton cord, but without applicators, was proposed in 2004 [10]. The system consisted of three vision cameras mounted at 120° intervals, as shown in
Figure 3.
The main reason for using three cameras was the transport system, which handled the tampons with wire hooks made of spring steel, as shown in
Figure 4.
The hook-based transport system guaranteed stability for image processing by positioning the tampons at a constant distance from the cameras. However, since each camera can observe only a third of the body, three cameras are required for full 360° inspection, which increases inspection costs. Furthermore, the hook-based transport was designed around the material characteristics of the cotton absorbent core and therefore cannot be applied to slippery plastic applicators.
For the inspection itself, traditional rule-based vision techniques were used. Tatzer et al. [10] relied on a heuristic assumption for object detection: because the tampon is bright and the background dark, an object is taken to be a tampon if the length and width of its white-pixel region lie within the measurement tolerances. For inspecting the tampon body, they developed an algorithm based on automatic white-balance correction, which assumes that a significant part of the detected object is white and only a small part is colored; the white-balance correction values are computed under this assumption. This rule-based heuristic works robustly for absorbent-core inspection. However, automatic white-balance correction is hard to apply to plastic products that reflect light. Moreover, because engineers extract the features manually, the approach does not guarantee accuracy for atypical defect types or for reference figures that change with the working environment.
2.2. AI Technologies Applied for Industrial Applications
The inspection process using machine vision is divided into (1) image acquisition, (2) digitalization, (3) processing, (4) analysis, and (5) interpretation. In traditional visual inspection systems, experts specify the features and inspection rules manually. Surface-defect detection based on traditional machine vision has been studied actively, and textural defect-detection methods can be categorized as (1) statistical, (2) structural, (3) filter-based, and (4) model-based [
11]. The statistical and filter-based methods are commonly used in industrial applications [
5]. However, they have limitations in that they cannot generate discriminative features [
12], and often must be applied as an ensemble in order to work well [
13,
14,
15]. For this reason, the model-based defect detection method has been actively researched, and has become more popular thanks to the development of high-performance computers and Artificial Intelligence (AI).
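As a toy illustration of the statistical category (an assumed example for exposition, not a method from the cited works), defects can be flagged where local grey-level statistics deviate from the global statistics of the texture:

```python
import numpy as np

def local_mean_deviation(gray, win=5, k=3.0):
    """Flag pixels whose local window mean deviates from the global mean
    by more than k global standard deviations (a simple statistical test)."""
    pad = win // 2
    padded = np.pad(gray.astype(float), pad, mode="edge")
    # Sliding-window mean via a summed-area table (integral image).
    ii = np.pad(padded.cumsum(0).cumsum(1), ((1, 0), (1, 0)))
    means = (ii[win:, win:] - ii[:-win, win:]
             - ii[win:, :-win] + ii[:-win, :-win]) / win**2
    mu, sigma = gray.mean(), gray.std() + 1e-9
    return np.abs(means - mu) > k * sigma

# Uniform bright texture with a small dark blemish.
tex = np.full((60, 60), 128.0)
tex[30:34, 30:34] = 0.0
defect_map = local_mean_deviation(tex, win=5, k=3.0)
print(defect_map.any())  # True: the blemish is flagged
```

The limitation cited above also shows here: a single hand-set statistic cannot produce discriminative features for varied defect types, which is why such methods are often combined in ensembles.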
The Convolutional Neural Network (CNN) traces back to LeNet, developed by LeCun [16] in 1989. Since AlexNet, which dominated competing algorithms in image classification and popularized data augmentation and dropout as remedies for overfitting, was presented in 2012 [
17], CNN has gained tremendous popularity in academia. Using locality, shared weights, and multiple layers with pooling operations, a CNN can learn features from data automatically and classify images without a separate manual feature-extraction task. Given this advantage, many researchers have applied CNNs to quality inspection to overcome the limitations of traditional visual inspection systems. Wen et al. [
12] used CNN for inspecting wafer semiconductors, and Yang et al. [
18] suggested a USB defect inspection system based on CNN. Zhong et al. [
19] proposed a CNN-based algorithm for detecting catenary split pins in high-speed railways.
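The locality and weight sharing mentioned above can be sketched in a few lines (a didactic NumPy toy, not any of the cited networks): one small kernel is reused at every image position, followed by ReLU and max pooling.

```python
import numpy as np

def conv2d(image, kernel):
    """Valid 2-D convolution: one shared kernel slides over the image,
    so every output position reuses the same weights (weight sharing)
    and sees only a small neighbourhood (locality)."""
    kh, kw = kernel.shape
    h, w = image.shape
    out = np.empty((h - kh + 1, w - kw + 1))
    for i in range(out.shape[0]):
        for j in range(out.shape[1]):
            out[i, j] = np.sum(image[i:i+kh, j:j+kw] * kernel)
    return out

def max_pool(x, size=2):
    """Non-overlapping max pooling halves the spatial resolution."""
    h, w = x.shape
    return x[:h - h % size, :w - w % size].reshape(
        h // size, size, w // size, size).max(axis=(1, 3))

img = np.random.rand(8, 8)
edge_kernel = np.array([[1., -1.], [1., -1.]])  # a trained network learns this
feat = max_pool(np.maximum(conv2d(img, edge_kernel), 0.0))  # conv -> ReLU -> pool
print(feat.shape)  # (3, 3)
```

In a real CNN the kernel weights are learned by backpropagation rather than fixed by hand, which is precisely what removes the manual feature-engineering step.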
CNNs classify objects well but struggle to localize them. R-CNN adds region proposals and a box-offset regressor that improve detection accuracy and localization [20]. Fast R-CNN, derived from R-CNN, applies Region-of-Interest (RoI) pooling and a softmax classifier to compensate for R-CNN's shortcomings, enabling single-stage training with backpropagation through the whole network [
Faster R-CNN replaces the external proposal step with a Region Proposal Network (RPN) that predicts object proposals directly from the feature map using anchor boxes of multiple sizes. It is often combined with a Feature Pyramid Network (FPN), which rescales feature maps to several resolutions to help detect small objects. The algorithm proposed by Ren et al. [
22] was evaluated on the COCO and PASCAL VOC datasets and needs adaptation before it can be used for manufacturing inspection. The Mask R-CNN proposed by He et al. [23] added a Fully Convolutional Network (FCN) branch to Faster R-CNN, compensating for the loss of location information in the fully connected (FC) layers. Mask R-CNN also replaces RoI pooling with RoIAlign, which removes the quantization misalignment introduced when RoI pooling snaps regions of different sizes onto a fixed grid. Mask prediction and class prediction are carried out by separate branches.
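The quantization misalignment that RoIAlign removes can be seen in a one-dimensional toy example (a simplified sketch with made-up coordinates, not the actual Mask R-CNN implementation):

```python
import numpy as np

row = np.arange(8, dtype=float)  # toy 1-D feature row: value i at cell i

def bilinear(row, x):
    """Sample the row at a fractional coordinate x."""
    lo = int(np.floor(x))
    hi = min(lo + 1, len(row) - 1)
    return row[lo] * (1 - (x - lo)) + row[hi] * (x - lo)

x0, x1 = 1.6, 4.6  # an RoI boundary that falls between feature cells

# RoI pooling: the boundary is first snapped to whole cells (quantization),
# so slightly different RoIs collapse onto exactly the same bins.
quantized = row[int(round(x0)):int(round(x1))].mean()

# RoIAlign: sample at the exact fractional positions instead of snapping.
samples = np.linspace(x0, x1, num=4)
aligned = np.mean([bilinear(row, x) for x in samples])

print(quantized, aligned)  # quantized = 3.0; aligned ~= 3.1, the exact RoI centre
```

The sub-cell shift lost by quantization is what RoIAlign preserves; for pixel-accurate masks this small offset matters.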
Object-detection algorithms are being studied actively for defect inspection. Faster R-CNN offers high accuracy among object-detection algorithms and is widely studied in the industrial field. For example, Liyun et al. [24] used Faster R-CNN to detect defects in automobile engine blocks and heads, detecting surface defects with a probability of 92.15%. Wang et al. [
25] likewise used Faster R-CNN to test 175 defects in turbo blades for automobile engines. Oh et al. [
26] used Faster R-CNN to detect welding defects automatically. Mask R-CNN, in turn, is actively researched in fields that require object segmentation. Attard et al. [
27] studied a method for automatically inspecting cracks in buildings using Mask R-CNN. Zhao et al. [
28] conducted a study to examine the cable bracket of an aircraft using Mask R-CNN.
The above-mentioned R-CNN-based models extract sub-images by generating thousands of candidate regions with selective-search-style algorithms; features are then extracted as these sub-images pass through the CNN. This complex pipeline may guarantee accuracy, but it is slow. To overcome this shortcoming, Redmon et al. [29] presented a new object-detection algorithm called YOLO, a one-stage detector that performs region proposal and classification at once: a single neural network predicts bounding boxes and class probabilities directly from the image. YOLO uses non-max suppression to select the final boxes. The architecture of YOLO is shown in
Figure 5.
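The non-max suppression step can be sketched as a greedy procedure over Intersection-over-Union (IoU) overlaps (a minimal NumPy version, not the original YOLO code):

```python
import numpy as np

def iou(box, boxes):
    """Intersection over Union between one (x1, y1, x2, y2) box and an
    array of boxes."""
    x1 = np.maximum(box[0], boxes[:, 0])
    y1 = np.maximum(box[1], boxes[:, 1])
    x2 = np.minimum(box[2], boxes[:, 2])
    y2 = np.minimum(box[3], boxes[:, 3])
    inter = np.clip(x2 - x1, 0, None) * np.clip(y2 - y1, 0, None)
    area_a = (box[2] - box[0]) * (box[3] - box[1])
    area_b = (boxes[:, 2] - boxes[:, 0]) * (boxes[:, 3] - boxes[:, 1])
    return inter / (area_a + area_b - inter)

def nms(boxes, scores, iou_threshold=0.5):
    """Greedy non-max suppression: repeatedly keep the best-scoring box
    and drop all remaining boxes that overlap it too much."""
    order = np.argsort(scores)[::-1]
    keep = []
    while order.size > 0:
        best = order[0]
        keep.append(int(best))
        rest = order[1:]
        order = rest[iou(boxes[best], boxes[rest]) <= iou_threshold]
    return keep

boxes = np.array([[0, 0, 10, 10], [1, 1, 11, 11], [20, 20, 30, 30]], dtype=float)
scores = np.array([0.9, 0.8, 0.7])
print(nms(boxes, scores))  # [0, 2]: box 1 is suppressed by overlapping box 0
```

Lowering `iou_threshold` suppresses more aggressively, trading duplicate detections against the risk of merging genuinely distinct nearby objects.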
Since YOLO was presented in 2016, it has been improved continuously. In 2017, YOLOv2 was introduced by Redmon et al. [
30], which significantly improved the model by applying batch normalization, direct location prediction, and multiscale training. As a result, YOLOv2 can detect more than 9000 object categories, far exceeding the 20 categories of the original YOLO. In 2018, Redmon et al. [
31] presented YOLOv3, deepening the network to 106 layers. Bochkovskiy et al. [32] presented YOLOv4, whose accuracy was improved, at some extra training and inference cost, with the help of the Bag-of-Freebies and Bag-of-Specials methods applied during detector training. As a result, YOLOv4 outperforms YOLOv3 by approximately 10% and can be trained, tested, and deployed on a single GPU.
Because YOLO detects objects in real time, it is being studied actively for quality inspection. Qiu et al. [33] developed object-detection models optimized for small defects on wind turbine blades by modifying YOLO. For this research they built a dataset of 23,807 images labeled with three different defect types. According to their performance analysis, the YOLO-based model achieved an average accuracy of 91.3%, better than traditional CNN-based and Machine-Learning (ML)-based methods. Adibhatla et al. [
34] applied YOLO to quality inspection of Printed Circuit Boards (PCBs). They likewise generated a dataset of 11,000 images, including defect images labeled by skilled inspectors; the suggested YOLO-based inspection model achieved an accuracy of 98.79%. Wu and Li [35] improved YOLOv3 with anchor-box cluster analysis to detect defects in electrical connectors. According to their performance analysis, the presented algorithm outperforms Faster R-CNN, achieving an accuracy of 93.5%. Beyond quality inspection in manufacturing, YOLO is also being studied in fields such as infrastructure management [
36,
37].
YOLACT is a real-time instance-segmentation model presented by Bolya et al. [38], inspired by YOLO's real-time object-detection capabilities. Instance segmentation is the task of detecting and delineating each object of interest in an image. Earlier instance-segmentation models are two-stage detectors that first localize features and then predict masks. YOLACT skips the explicit localization stage: it generates a dictionary of prototype masks over the whole image and predicts per-instance linear-combination coefficients. The instance masks are then created by linearly combining the prototypes with the mask coefficients. The architecture of YOLACT is based on RetinaNet with a ResNet-101 backbone and FPN, as shown in
Figure 6. YOLACT was the first real-time instance-segmentation algorithm, achieving a mask mAP (mean Average Precision) of 29.8 on the MS COCO dataset at over 30 fps.
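The final assembly step can be sketched as follows (a toy NumPy illustration with made-up shapes, not the actual YOLACT code): the k prototype masks shared across the image are linearly combined with each instance's k coefficients and passed through a sigmoid.

```python
import numpy as np

h, w, k = 34, 34, 32    # prototype resolution and dictionary size (toy values)
n_instances = 3

rng = np.random.default_rng(0)
prototypes = rng.standard_normal((h, w, k))           # one dictionary per image
coefficients = rng.standard_normal((n_instances, k))  # per-instance weights

# Each instance mask is a linear combination of the shared prototypes,
# pushed through a sigmoid to give per-pixel probabilities.
logits = np.einsum("hwk,nk->nhw", prototypes, coefficients)
masks = 1.0 / (1.0 + np.exp(-logits))

print(masks.shape)  # (3, 34, 34): one soft mask per detected instance
```

Because the expensive per-pixel work is done once for the whole image and each instance only contributes a cheap linear combination, the design stays real-time.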
Research on applying YOLACT to industrial uses is still scarce. Guo et al. [39] suggested instance-segmentation models with new backbone architectures, YOLACT-Res2Net-50 and YOLACT-Res2Net-101, to inspect railroad track components. Their performance analysis showed that the algorithm exceeds both the original YOLACT and Mask R-CNN, achieving a bounding-box mAP of 59.9 and a mask mAP of 63.6. Pan et al. [40] combined Mask R-CNN and YOLACT to detect surface scratches on architectural glass panels.
The abovementioned state-of-the-art models such as YOLO and YOLACT have proved their performance and robustness in many experiments. However, more and more AI practitioners emphasize that the key success factor for real-world AI applications is data quality. A survey conducted by Google Research reveals that data is an under-estimated and de-glamorized aspect of AI [
41]. Andrew Ng, the cofounder of DeepLearning.AI, Coursera, and LandingAI, launched a campaign for data-centric AI [
42]. The data-centric approach stresses that data consistency is needed for successful AI development. Reflecting this trend, more and more AI studies, especially in the industrial field, focus on data acquisition. Tang et al. [43] used X-ray images to detect casting defects; their casting-defect detection system enables the generation of high-quality data. Zhou et al. [44] proposed a machine-vision apparatus to inspect glass-bottle bottoms and likewise created high-quality data. Both teams deserve credit for developing new AI methodology that outperforms the prevalent methods. At the same time, the fact that the prevalent methods already delivered adequate performance in these experiments may itself support the effectiveness of the data-centric approach to AI development.