Apparatus and Method of Defect Detection for Resin Films

Defect inspection of resin films involves detecting defects, measuring their size, classifying their type, and planning reflective actions. The process not only demands a heavy investment in workforce but is also caught between a quality-assurance tolerance of 50 micrometers and the limits of naked-eye visibility. To reduce the workforce and time that defect inspection consumes, an apparatus is designed to collect high-quality images in one shot by leveraging a large field-of-view microscope at 2K resolution. Based on the image dataset, a two-step method first locates possible defects and then predicts their types with a defect-shape-based deep learning model built on a LeNet-5-adjusted network. The experimental results show that the proposed method can precisely locate and accurately inspect fine-grained defects of resin films.


Introduction
Defects considerably affect the quality of plastic products, and defect inspections are conducted for quality control. If quality control is not performed thoroughly, defects can stain the surfaces of plastic products, which leaves an unfavorable impression on customers, can decrease sales, and consequently causes losses to the company. The most common technique of plastic product fabrication is injection molding, which involves melting granular plastic, injecting it into a mold, and subsequently cooling the product. However, inspecting granular plastic for defects is difficult because the defects are at microscale and can also lie inside the granules. Therefore, granular plastics can first be compressed into plastic resin films, which increases the visibility of microscale defects and makes the inspection easier.
Industrial inspection systems for defect detection can mostly be divided into two types, namely the traditional procedure and the automatic procedure. The traditional procedure, which is currently prominent, involves observing whether plastic resin film quality is satisfactory through human vision by using a magnifier as the inspection tool. This inspection procedure has the following problems:

1. Because numerous items must be inspected, high-intensity repetitive inspection can cause fatigue and lethargy in inspectors. The quality of inspected products is not guaranteed in such scenarios. The procedure is also prone to workplace injuries because the inspecting staff can injure their eyes.

2. Human vision cannot precisely discern how large a defect is; humans have at best an approximate awareness of whether a defect is present. This may cause the inspection process to miss some defects. Furthermore, it is nearly impossible to standardize human vision; each inspector's eyesight has its own peculiarities.

Primitive attributes are used to detect defects in traditional image processing methods. These methods can be divided into three approaches, namely threshold, structural, and spectral. The threshold approach involves transforming grayscale images to binary images, separating the background and defects by applying various types of thresholds such as the adaptive thresholding method [6] and the Otsu method [7,8]. The structural approach includes edge [9], skeleton [10], and morphological operations [11]. Fourier [12], wavelet [13], and Gabor transforms [14] are used in the spectral approach. However, it is difficult to classify various nondefect objects using traditional image processing because nondefect items are not easily filtered.
In feature extraction using a machine learning method, defect detection has two stages, namely feature extraction and machine learning: a feature extraction algorithm extracts various features from images, and machine learning techniques then determine the pattern or relationship among different classes. Shumin [15] proposed a fabric defect detection technique using a histogram of oriented gradients (HOG), which counts occurrences of gradient orientation in localized portions of an image; AdaBoost was used to select a small set of HOG features for a support vector machine (SVM) to classify fabric defects. Kuang et al. [16] proposed a method for bamboo strip defect detection using a set of features based on the local binary pattern (LBP) and the gray level co-occurrence matrix (GLCM). LBP is used to describe the local texture features of the image, and GLCM is used to characterize the texture of an image by calculating how often pairs of pixels with specific values in a specified spatial relationship occur in an image. After extracting the features, an SVM was used to classify defects. Chang et al. [17] proposed a method for defect detection on compact lenses. They segmented objects by applying weighted Sobel filters and the watershed transform and then used an SVM for classification. Watershed is a transformation that treats an image as a topographic map, with the brightness of each pixel representing its height, and finds the lines that run along the tops of ridges. It can therefore segment two objects that are close to each other. Zhou et al. [18] proposed a surface defect detection method for vehicle bodies that uses a multiscale Hessian matrix fusion method to determine defect regions and an SVM to classify defects.
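To make the GLCM description above concrete, the following minimal sketch counts how often pairs of gray levels occur at a fixed offset and derives one texture feature (contrast) from the counts. It is an illustration only, not the implementation used in [16]; the function names, the tiny 4 × 4 patch, and the choice of the contrast feature are assumptions made for this example.

```python
import numpy as np

def glcm(image, levels, dx=1, dy=0):
    """Count co-occurrences of gray levels (i, j) for pixel pairs offset by (dy, dx)."""
    image = np.asarray(image)
    rows, cols = image.shape
    counts = np.zeros((levels, levels), dtype=np.int64)
    for r in range(rows):
        for c in range(cols):
            r2, c2 = r + dy, c + dx
            # Only count pairs whose second pixel is still inside the image.
            if 0 <= r2 < rows and 0 <= c2 < cols:
                counts[image[r, c], image[r2, c2]] += 1
    return counts

def glcm_contrast(counts):
    """Contrast feature: squared gray-level difference weighted by pair probability."""
    p = counts / counts.sum()
    i, j = np.indices(p.shape)
    return float(np.sum(p * (i - j) ** 2))

# Hypothetical 4-level patch; each pixel is paired with its right-hand neighbor.
patch = np.array([[0, 0, 1, 1],
                  [0, 0, 1, 1],
                  [0, 2, 2, 2],
                  [2, 2, 3, 3]])
matrix = glcm(patch, levels=4)
contrast = glcm_contrast(matrix)
print(matrix)
print(contrast)
```

Features such as this contrast value would then be stacked into a vector and passed to a classifier such as an SVM.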
Deep learning methods have achieved excellent results in many fields [19][20][21] and surface defect detection is one of them. Several defect detection methods based on convolutional neural networks (CNNs) have been proposed. Arikan et al. [22] proposed a CNN model for classification, setting two classes: defect or nondefect. This model was designed to handle capacity and real-time speed requirements. A generative adversarial network (GAN) was used to generate more data. Wang et al. [23] proposed a fast and robust CNN-based defect detection model by using CNNs with a sliding window to localize the product damage. The sliding window is time consuming and is not sufficiently efficient for our data. Mei et al. [24] designed a multiscale convolutional denoising autoencoder network (MSCDAE) model for fabric defect detection by using reconstructed image patches with the model at multiple Gaussian pyramid levels and synthesized results from these pyramid levels. Defective regions in the reconstruction residual maps were generated using the CDAE networks. This model can be trained with only a small set of defect-free samples and can deliver excellent performance. These two methods can only localize and detect defects but cannot classify the type of defects.
An object detection method is required for resin films because an image can contain various objects at the same time. Object detection is used to detect objects inside an image, and many studies have been conducted on it. Li et al. [25] proposed an improved you-only-look-once (YOLO) network to detect six types of defects on steel strip surfaces. Cha et al. [26] proposed a structural visual inspection method based on faster R-CNN to detect five types of defects, including steel delamination, steel corrosion, bolt corrosion, and concrete cracks. Yuan et al. [27] proposed a modified segmentation method and deep neural network to detect defects on the cover glass of mobile phones. A GAN was used to generate more data for the deep learning network to overcome the problem of the small amount of data. Ferguson et al. [28] used a defect detection system based on the mask region-based CNN (mask R-CNN) architecture to detect casting defects on the GDXray dataset. Excellent performance was achieved through transfer learning: the system used weights pre-trained on the ImageNet dataset and was then trained on the COCO (Common Objects in Context) dataset. Wen et al. [29] proposed an object detection method to detect defects on bearing rollers, using a CNN to extract the features of the defects and then classifying the defects and calculating their positions simultaneously. Chen et al. [30] proposed a vision-based method in which deep CNNs (DCNNs) are applied to the defect detection of the fasteners on the catenary support device. The system cascades three DCNN-based detection stages, including the single shot multibox detector (SSD) and YOLO, to localize the cantilever joints and their fasteners. Then, a classifier was used to classify defects. Li et al. [31] proposed a surface defect detection model based on the SSD network combined with MobileNet to inspect the sealing surface of an oil chili, achieving real-time and accurate detection. The Hough circle transform was applied to detect the oil chili.
However, although the abovementioned methods propose object detection networks that calculate the position and classify defects in one whole network, traditional image processing methods are sufficient for finding the objects because of the simple background in the proposed system. Therefore, object detection networks such as YOLO and faster R-CNN are not required for our inspection system. The proposed method in this study is mostly inspired by the following methods. Song et al. [32] proposed a deep CNN-based technique for detecting micro defects on metal screw surfaces by using traditional image processing methods to detect metal screws and then using a CNN to classify whether a metal screw was defective. Tao et al. [33] designed a cascaded autoencoder architecture to obtain accurate and consistent defect detection results under complex lighting conditions and ambiguous defects. The autoencoder could distinguish the nonbackground objects and only required a basic thresholding method to separate nonbackground objects from the background. A deep convolution network was used to classify various types of defects.
In the proposed method, we first used a small microscope to observe the data. Two types of plastic resin films were present in our data, namely those with transparent and white backgrounds. Figure 2 illustrates how the resin films appear under the microscope. Various types of nonbackground objects are present on the surface; scratches (Figure 2b) and bubbles (Figure 2c) are not defined as defects. Because of pollutants in the industrial environment, nondefective items such as dust (Figure 2d) may also appear on the inspected surface. Different nonbackground objects have distinct features. Therefore, an image processing method was proposed to segment the nonbackground objects, and a classification module was then used to classify the defects. The results show that the proposed method exhibited excellent performance.
However, although the method exhibited excellent results, some disadvantages were observed under the microscope. First, the microscope FOV was small; we need to capture hundreds of images to compose one plastic resin film. Therefore, this method does not immediately raise efficiency. It is also difficult to scan the plastic resin film completely without missing any parts, and we have no method to check whether any parts have gone unscanned. Second, identifying defects and determining the corresponding positions is difficult under the microscope. The aforementioned method is not suitable for our study because our final goal is to increase the number of inspected samples. Therefore, we proposed a machine-assisted method with a 2K-resolution camera. One plastic resin film can be captured completely in one picture using the machine; in other words, the machine can speed up the process and increase the number of samples that can be inspected in a day. Figure 3a,b depicts the transparent-background and white-background plastic resin film surfaces under a 2K-resolution camera.
The transparent and white plastic resin film results vary considerably under the 2K-resolution camera compared with the data from the microscope. The defects depicted in Figure 3(a1,b1) are mostly small and dot-like, whereas the scratches and dust shown in Figure 3(a2,b2) and Figure 3(a4,b4), respectively, are similar to lines. Bubbles depicted in Figure 3(a3,b3) are circular. The difference between the types of nonbackground objects is not as obvious as in the microscope images. However, only a few images are required for composing one plastic resin film.
A microscale defect inspection architecture that can automatically identify defects in plastic resin films is presented in this paper. The proposed method on the 2K-resolution camera was modified from the microscope method. The microscale defect inspection architecture consists of two steps. In the first step, an image preprocessing method was proposed to detect, segment, and locate nonbackground objects. In the second step, a classification module is used to classify the objects. Compared with the traditional procedure, the proposed method has the following advantages:

1. With the same standards for defect detection, the proposed method can obtain the precise area and location of defects.

2. The proposed method has high precision and high speed, which speeds up the inspection process and reduces labor cost. The average number of plastic resin films that can be checked in a day increases.

The rest of this paper is organized as follows: In Section 2, the overall system and proposed approach are described in detail. Experimental results are presented in Section 3, and the discussion is presented in Section 4. Finally, the conclusion is provided in Section 5.

Materials and Methods
In this section, the proposed method is discussed in detail. The microscope method is described in Section 2.1, and Section 2.2 describes the method using a 2K-resolution camera. The 2K-resolution camera method is based on the microscope method with some adjustments.

Microscope Method
Appl. Sci. 2020, 10, 1206
Figure 4 illustrates the overall architecture of the method under the microscope. The method includes two stages, namely image processing and classification. During the image processing stage, traditional image processing was used to detect nonbackground objects, which were then extracted and cropped to serve as the input of the classification model. In the classification stage, a CNN was used to classify the various types of nonbackground objects. The system overview is presented in Section 2.1.1, the image processing stage in Section 2.1.2, and the classification stage in Section 2.1.3.

Image Processing
The images captured by the microscope can contain more than one type of object. Therefore, rather than using the whole image to train the classifier, objects should be detected and cropped out first. To segment and crop the nonbackground objects, the following image processing methods are required: thresholding, blur transform, morphological transformations, and contour extraction.
In the threshold part, the adaptive thresholding method was used, in which each pixel is compared against a local threshold value. The formula for the threshold value is as follows:

$$T(x,y) = \frac{1}{\text{blocksize}^{2}} \sum_{(i,j) \in N(x,y)} \mathrm{src}(i,j) - c$$

where the threshold value T(x,y) is the mean of the blocksize × blocksize neighborhood N(x,y) of (x,y) in the input image src minus some constant c. Before applying the adaptive threshold, Gaussian blur was used on the grayscale image to improve performance. We ran several experiments with different combinations of the parameters and observed the data to find the most suitable parameters for our plastic resin film, and then obtained the threshold image. The experiment details are provided in Section 4.1.
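The local-mean threshold described above can be sketched in plain NumPy as follows. This is a minimal illustration under stated assumptions, not the authors' code: the paper does not give its parameter values or polarity, so the blocksize, the constant c, the edge-replicating border, and the inverted output (pixels darker than T(x,y) become foreground) are all choices made for this example, and the Gaussian blur step is omitted.

```python
import numpy as np

def local_mean(gray, blocksize):
    """Mean of the blocksize x blocksize neighborhood around every pixel."""
    pad = blocksize // 2
    padded = np.pad(gray.astype(np.float64), pad, mode="edge")
    # Integral image with a leading zero row and column for simple window sums.
    integral = np.pad(padded, ((1, 0), (1, 0))).cumsum(axis=0).cumsum(axis=1)
    h, w = gray.shape
    b = blocksize
    window_sum = (integral[b:b + h, b:b + w] - integral[:h, b:b + w]
                  - integral[b:b + h, :w] + integral[:h, :w])
    return window_sum / (b * b)

def adaptive_threshold(gray, blocksize=7, c=5.0, max_value=255):
    """Binarize against T(x, y) = local mean - c (inverted polarity is an assumption)."""
    if blocksize % 2 == 0:
        raise ValueError("blocksize must be odd")
    threshold = local_mean(gray, blocksize) - c
    return np.where(gray <= threshold, max_value, 0).astype(np.uint8)

# Hypothetical test image: bright film with one dark 3 x 3 speck.
film = np.full((20, 20), 200, dtype=np.uint8)
film[8:11, 8:11] = 50
mask = adaptive_threshold(film, blocksize=7, c=5.0)
print(int((mask > 0).sum()))
```

Because the threshold follows the local mean, the same constant c works across regions of the film that differ in overall brightness, which a single global threshold would not handle.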
The threshold image was then passed through a morphological transform, using the closing technique to close small holes inside the foreground objects. Closing is dilation followed by erosion. Dilation grows the white regions in the binary image, whereas erosion does the opposite and erodes them away. Dilation and erosion are defined as follows:

$$A \oplus B = \bigcup_{b \in B} A_{b}$$

$$A \ominus B = \bigcap_{b \in B} A_{-b}$$

where A is our threshold image, B is our structuring element, $A_{b}$ is the translation of A by b, and $A_{-b}$ denotes the translation of A by −b. Closing is typically useful in closing small holes inside an object and connecting broken areas. Closing can successfully connect broken areas in our data and assist in determining nonbackground objects.
During contour detection, the findContours function in OpenCV was used to locate objects, and only objects whose area exceeded a certain number of pixels were extracted. In this study, 25 µm was set as the defect threshold. Objects were extracted by their bounding boxes and cropped out of the image for further steps. Figure 6 depicts the image processing stage.
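The closing and object-extraction steps can be illustrated with the following NumPy sketch. It is a simplified stand-in, not the pipeline used in this study: the connected-component search replaces OpenCV's findContours, the area is counted in pixels rather than derived from a contour, and the 3 × 3 structuring element and the `min_area` value are hypothetical.

```python
from collections import deque
import numpy as np

def dilate(mask, k=3):
    """Union of the mask shifted over a k x k structuring element."""
    pad = k // 2
    padded = np.pad(mask, pad, mode="constant", constant_values=False)
    h, w = mask.shape
    out = np.zeros_like(mask)
    for dy in range(k):
        for dx in range(k):
            out |= padded[dy:dy + h, dx:dx + w]
    return out

def erode(mask, k=3):
    """Intersection of the mask shifted over a k x k structuring element."""
    pad = k // 2
    padded = np.pad(mask, pad, mode="constant", constant_values=True)
    h, w = mask.shape
    out = np.ones_like(mask)
    for dy in range(k):
        for dx in range(k):
            out &= padded[dy:dy + h, dx:dx + w]
    return out

def closing(mask, k=3):
    """Closing is dilation followed by erosion."""
    return erode(dilate(mask, k), k)

def bounding_boxes(mask, min_area=1):
    """Return (x, y, width, height) for each 8-connected region with enough pixels."""
    h, w = mask.shape
    seen = np.zeros((h, w), dtype=bool)
    boxes = []
    for y0 in range(h):
        for x0 in range(w):
            if not mask[y0, x0] or seen[y0, x0]:
                continue
            queue, ys, xs = deque([(y0, x0)]), [], []
            seen[y0, x0] = True
            while queue:
                y, x = queue.popleft()
                ys.append(y)
                xs.append(x)
                for ny in range(max(y - 1, 0), min(y + 2, h)):
                    for nx in range(max(x - 1, 0), min(x + 2, w)):
                        if mask[ny, nx] and not seen[ny, nx]:
                            seen[ny, nx] = True
                            queue.append((ny, nx))
            if len(ys) >= min_area:
                boxes.append((min(xs), min(ys),
                              max(xs) - min(xs) + 1, max(ys) - min(ys) + 1))
    return boxes

# Hypothetical threshold image: a 5 x 5 object with a one-pixel hole, plus a stray pixel.
binary = np.zeros((16, 16), dtype=bool)
binary[2:7, 2:7] = True
binary[4, 4] = False
binary[12, 12] = True
closed = closing(binary)
boxes = bounding_boxes(closed, min_area=4)
crops = [closed[y:y + bh, x:x + bw] for x, y, bw, bh in boxes]
print(boxes)
```

In this toy input, closing fills the hole inside the larger object, and the area filter then discards the stray pixel so that only the larger object is cropped for classification.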

Classification
A CNN learns a hierarchy of features from the input image and is commonly used in image classification, object detection, and natural language processing, among others. A CNN architecture typically consists of several convolution and pooling layers followed by fully connected layers at the end. The convolution layer is the first layer used to extract features from the image; filters are defined to perform matrix convolution with the image and obtain the feature map. The pooling layer is used to reduce the number of parameters; the amount of reduction is determined by the filter size. Pooling is of three types, namely max, average, and sum pooling, depending on the type of values that should be maintained. The fully connected layer flattens the results and feeds them into a basic neural network. Usually, a softmax activation function is used at the end for classification to output the probability of each class.
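The three building blocks named above (convolution, pooling, and softmax) can be written out in a few lines of NumPy. This is a didactic sketch with made-up inputs, not the network used in this study; the 3 × 3 averaging-style kernel, the 2 × 2 max pooling, and the example logits are assumptions chosen only to show the shapes and operations.

```python
import numpy as np

def conv2d_valid(image, kernel):
    """Slide the kernel over the image without padding (cross-correlation, as in CNN layers)."""
    kh, kw = kernel.shape
    h, w = image.shape
    out = np.empty((h - kh + 1, w - kw + 1))
    for r in range(out.shape[0]):
        for c in range(out.shape[1]):
            out[r, c] = np.sum(image[r:r + kh, c:c + kw] * kernel)
    return out

def max_pool(feature_map, size=2):
    """Keep the maximum of each non-overlapping size x size block."""
    h, w = feature_map.shape
    h, w = h - h % size, w - w % size
    blocks = feature_map[:h, :w].reshape(h // size, size, w // size, size)
    return blocks.max(axis=(1, 3))

def softmax(logits):
    """Turn raw class scores into probabilities that sum to one."""
    shifted = logits - np.max(logits)  # subtract the maximum for numerical stability
    exp = np.exp(shifted)
    return exp / exp.sum()

image = np.arange(36, dtype=np.float64).reshape(6, 6)
feature_map = conv2d_valid(image, np.ones((3, 3)))   # 6 x 6 input -> 4 x 4 feature map
pooled = max_pool(feature_map, size=2)                # 4 x 4 -> 2 x 2
probabilities = softmax(np.array([1.0, 2.0, 3.0]))
print(feature_map.shape, pooled.shape, probabilities)
```

The shapes show why pooling reduces the parameter count: each 2 × 2 pooling step quarters the number of values passed on to the following layers.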
In the classification stage, a LeNet-5-adjusted network was proposed to classify objects into their categories. Although the observed defects have different colors, color does not provide useful information to the classification model; therefore, the data are first converted to grayscale images before training the CNN. The cropped-out objects are resized to 50 × 50 pixels. The 50-pixel size was determined by the average size of the data: because the input layer of the CNN fixes the data size, we determined the size of the input layer by averaging. With the average size, the rescaling applied to each contour is not too large, which helps preserve its features. Figure 7 illustrates the architecture of the classification module.
The size and style of the target images are similar to those of the MNIST dataset (Modified National Institute of Standards and Technology database) [34]; therefore, LeNet-5 [34], which performs excellently on the MNIST dataset, is used as the baseline model for microscale defect detection. Average pooling can result in several features being disregarded. However, all features are crucial for our data. Therefore, average pooling was discarded, and superior results were obtained in the microscale defect detection.
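The input preparation described above (grayscale conversion followed by resizing each crop to 50 × 50 pixels) might look like the following sketch. The details are assumptions made for illustration: the paper does not state the grayscale weights, the interpolation method, or any intensity scaling, so the luminance weights, nearest-neighbor resizing, and division by 255 used here are hypothetical choices.

```python
import numpy as np

def to_grayscale(rgb):
    """Weighted sum of the R, G, B channels (common luminance weights, assumed here)."""
    weights = np.array([0.299, 0.587, 0.114])
    return rgb[..., :3].astype(np.float64) @ weights

def resize_nearest(gray, size=50):
    """Nearest-neighbor resize to size x size; the interpolation choice is an assumption."""
    h, w = gray.shape
    rows = (np.arange(size) * h / size).astype(int)
    cols = (np.arange(size) * w / size).astype(int)
    return gray[np.ix_(rows, cols)]

def prepare_crop(rgb_crop, size=50):
    """Convert one cropped object into a size x size x 1 array scaled to [0, 1]."""
    gray = to_grayscale(rgb_crop)
    resized = resize_nearest(gray, size)
    return (resized / 255.0).reshape(size, size, 1)

# Hypothetical crop of arbitrary size, as produced by bounding-box extraction.
crop = np.full((37, 81, 3), 255, dtype=np.uint8)
crop[10:20, 30:50] = 0
network_input = prepare_crop(crop)
print(network_input.shape)
```

Every crop, whatever its original bounding-box size, ends up with the same fixed shape that the input layer of the classifier expects.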

2K Resolution Camera
Because of the small FOV of the microscope, not only are we required to slide the plastic resin film many times to capture the complete image, but we must also be careful not to miss any spot of the plastic resin film. It is difficult to determine accurate positions of the defects on the plastic resin film. Even if we know approximately where a defect is, it may be too small for human vision to perceive. The microscope method is not sufficiently efficient for the plastic resin film because it is time consuming. Therefore, we proposed a machine-assisted method with a 2K-resolution camera. Although the accuracy of the microscope method is higher, the proposed method is more efficient, and it is considerably easier to locate the defects on the plastic resin films. Only one image needs to be captured to obtain the complete plastic resin film. Figure 8 depicts the architecture of the proposed method using a 2K-resolution camera. This method consists of three stages, namely image adjustment, image processing, and classification. The image adjustment stage improves the performance of the comparison in the image processing stage. The image processing stage and classification stage are mostly the same as in the microscope method, with the exception of a comparison part that filters noise and reduces the number of contours.

Image Adjustment and Image Processing
Normally, the inspection process of the plastic resin film is not conducted in a dust-free environment. Therefore, the plastic resin film can contain dust. Thus, identifying defects is difficult because dust and defects (scratches and edges) appear similar under the 2K-resolution camera. This can decrease the classification performance. Therefore, a comparing component was introduced to filter noise and avoid dust before contour extraction. In this approach, three images on the same  Figure 9 depicts the sketch map of the 2K-resolution camera. The key component of the system is the VS-LDA series lens from vs. technology corporation [35], which has a 2K resolution (5472 × 3648 pixels) and has been fixed at a shooting distance of 12.5 cm on our machine. This series lens is best known for its low distortion, even when using extension tubes and is designed to support a wide range of magnification, wide angle (WD), and depth of field. The resolution of the camera can reach 762 dpi (30 pixels/mm), which satisfies more than 25-µm defect detection requirement. A light regulator was used to control light sources to ensure the same light condition was used throughout the study. This minimized the influence of light on the data. A dust removal fan was used to filter dust on plastic resin films.

System Overview
Appl. Sci. 2020, 10, 1206 9 of 23 Figure 8. Architecture of the method with the 2K-resolution camera. Figure 9 depicts the sketch map of the 2K-resolution camera. The key component of the system is the VS-LDA series lens from vs. technology corporation [35], which has a 2K resolution (5472 × 3648 pixels) and has been fixed at a shooting distance of 12.5 cm on our machine. This series lens is best known for its low distortion, even when using extension tubes and is designed to support a wide range of magnification, wide angle (WD), and depth of field. The resolution of the camera can reach 762 dpi (30 pixels/mm), which satisfies more than 25-μm defect detection requirement. A light regulator was used to control light sources to ensure the same light condition was used throughout the study. This minimized the influence of light on the data. A dust removal fan was used to filter dust on plastic resin films. Figure 9. Sketch of the inspection process using the 2K-resolution camera.

Image Adjustment and Image Processing
Normally, the inspection process of the plastic resin film is not conducted in a dust-free environment. Therefore, the plastic resin film can contain dust. Thus, identifying defects is difficult because dust and defects (scratches and edges) appear similar under the 2K-resolution camera. This can decrease the classification performance. Therefore, a comparing component was introduced to filter noise and avoid dust before contour extraction. In this approach, three images on the same Figure 9. Sketch of the inspection process using the 2K-resolution camera.
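The resolution figures quoted for the camera can be checked with a short calculation. The sketch below is only an arithmetic illustration: the function names are hypothetical, and the only inputs taken from the text are 762 dpi and the 25 µm defect size.

```python
MM_PER_INCH = 25.4


def pixels_per_mm(dpi):
    """Convert a sampling density in dots per inch to pixels per millimeter."""
    return dpi / MM_PER_INCH


def microns_per_pixel(dpi):
    """Side length, in micrometers, that one pixel covers on the film."""
    return 1000.0 / pixels_per_mm(dpi)


def size_in_pixels(size_um, dpi):
    """Number of pixels spanned by a feature of the given size."""
    return size_um / microns_per_pixel(dpi)


dpi = 762  # value quoted in the text
print(f"{pixels_per_mm(dpi):.1f} pixels/mm")
print(f"{microns_per_pixel(dpi):.1f} um per pixel")
print(f"a 25 um feature spans {size_in_pixels(25, dpi):.2f} pixels")
```

At 762 dpi one pixel covers roughly 33 µm, so a 25 µm defect spans about three quarters of a pixel. This agrees with the later statement that a 25 µm defect is approximately 1 pixel under the 2K-resolution camera.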

Image Adjustment and Image Processing
Normally, the inspection process of the plastic resin film is not conducted in a dust-free environment. Therefore, the plastic resin film can contain dust. Thus, identifying defects is difficult because dust and defects (scratches and edges) appear similar under the 2K-resolution camera. This can decrease the classification performance. Therefore, a comparing component was introduced to filter noise and avoid dust before contour extraction. In this approach, three images on the same plastic resin film were used, the contours of each image were determined and compared, and a fan was used to remove dust each time before capturing an image.
If a contour's location differed from that of the contours in the other images, the contour was most likely dust or fiber. Because the contours being compared must lie at the same location, or at least close to each other, we used Intersection over Union (IoU) to examine whether they are the same object, which is defined as follows:

$$\mathrm{IoU} = \frac{\text{Area of overlap}}{\text{Area of union}}$$

The IoU is evaluated on the bounding boxes of the detected objects, which are defined during contour detection. We treated boxes with an IoU of more than 0.3 as the same object. However, it is impossible to capture three images at exactly the same position, and manually adjusting each picture to the exact position is time consuming and reduces efficiency. Therefore, an affine transformation is used. An affine transformation is any transformation that can be expressed as a matrix multiplication followed by a vector addition; rotation, translation, and scaling operations can all be expressed in this way. The affine transform describes the relation between two images and is usually written as a 2 × 3 matrix. The formula used for image adjustment is expressed as follows:

$$T = M \cdot \begin{bmatrix} x & y & 1 \end{bmatrix}^{\mathsf{T}}$$

where M is the affine transform matrix, (x, y) is the original image position, and T is the new position after applying the affine transform. To apply the transform to an image, M must be determined first. Our goal was to use the affine transform to map one image onto another, which requires three corresponding positions in the two images. Once the same three points are located in both images, the affine transform matrix can be computed and applied to one image to bring it into the coordinate frame of the other. The affine transformation is depicted in Figure 10.
Therefore, three images were captured, and the affine transform was applied to two of the images so that all three were aligned; the locations of the contours were then compared to determine whether the objects were dust. To obtain the affine transform matrix, three exact points are required.
Three stickers were placed on the plastic resin film. Simple thresholding and contour detection could easily determine the center position of each sticker.
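The step of recovering the 2 × 3 matrix from three point pairs can be sketched in plain Python. This is a minimal illustration, not the authors' implementation: the function names are hypothetical, the sticker centers are assumed to be given already as (x, y) tuples, and the linear system is solved with Cramer's rule. In practice a library routine such as OpenCV's `cv2.getAffineTransform` would typically be used; the text does not state which routine was used.

```python
def det3(m):
    """Determinant of a 3x3 matrix given as nested lists."""
    return (m[0][0] * (m[1][1] * m[2][2] - m[1][2] * m[2][1])
            - m[0][1] * (m[1][0] * m[2][2] - m[1][2] * m[2][0])
            + m[0][2] * (m[1][0] * m[2][1] - m[1][1] * m[2][0]))


def affine_from_points(src, dst):
    """Return the 2x3 matrix M that maps three src points onto three dst points."""
    system = [[float(x), float(y), 1.0] for x, y in src]
    d = det3(system)
    if abs(d) < 1e-12:
        raise ValueError("the three reference points must not be collinear")
    rows = []
    for k in (0, 1):  # k = 0 solves the x-row of M, k = 1 the y-row
        rhs = [point[k] for point in dst]
        row = []
        for col in range(3):
            # Cramer's rule: replace one column of the system matrix by rhs.
            replaced = [r[:] for r in system]
            for i in range(3):
                replaced[i][col] = rhs[i]
            row.append(det3(replaced) / d)
        rows.append(row)
    return rows


def apply_affine(m, point):
    """Compute T = M * [x, y, 1]^T for a single point."""
    x, y = point
    return (m[0][0] * x + m[0][1] * y + m[0][2],
            m[1][0] * x + m[1][1] * y + m[1][2])


# Hypothetical sticker centers in a reference image and in a shifted image.
reference = [(0, 0), (1, 0), (0, 1)]
shifted = [(2, 3), (3, 3), (2, 4)]
M = affine_from_points(shifted, reference)  # maps the shifted image back
print(apply_affine(M, (2, 3)))
```

Three non-collinear points give six equations for the six unknown entries of M, which is why exactly three stickers are sufficient.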
After the images had been transformed to the same position, image processing was performed, and the bounding boxes of the contours in all three images were obtained through contour detection. As mentioned above, contours with an IoU of more than 0.3 were counted as the same contour. A contour was extracted only if it appeared in all three images. The results indicated that the method can efficiently filter out most of the dust and reduce the number of contours. Figure 11 illustrates the details of both the image adjustment stage and the image processing stage.
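The comparison across the three aligned images can be sketched as follows. This is a minimal illustration under stated assumptions: boxes are represented as (x, y, width, height) tuples, the function names are hypothetical, and the first image is arbitrarily used as the reference. Only the 0.3 threshold and the "present in all three images" rule come from the text.

```python
def iou(box_a, box_b):
    """Intersection over Union of two boxes given as (x, y, width, height)."""
    ax, ay, aw, ah = box_a
    bx, by, bw, bh = box_b
    inter_w = min(ax + aw, bx + bw) - max(ax, bx)
    inter_h = min(ay + ah, by + bh) - max(ay, by)
    if inter_w <= 0 or inter_h <= 0:
        return 0.0  # the boxes do not overlap
    inter = inter_w * inter_h
    union = aw * ah + bw * bh - inter
    return inter / union


def persistent_boxes(first, second, third, threshold=0.3):
    """Keep boxes of the first image that are matched in both other images."""
    kept = []
    for box in first:
        in_second = any(iou(box, other) > threshold for other in second)
        in_third = any(iou(box, other) > threshold for other in third)
        if in_second and in_third:
            kept.append(box)
    return kept


# Hypothetical boxes: the second object is missing from one image, so it is
# treated as dust that the fan moved between captures.
first = [(10, 10, 20, 20), (100, 100, 5, 5)]
second = [(12, 11, 20, 20)]
third = [(9, 10, 20, 20), (100, 100, 5, 5)]
print(persistent_boxes(first, second, third))
```

A fairly low threshold such as 0.3 tolerates small residual misalignment after the affine correction, at the cost of occasionally matching two different nearby objects.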
Figure 11. Overview of the image adjusting and image processing stage in the 2K-resolution camera method.

Classification
As in the microscope method, the LeNet-5-adjusted CNN was used as the classification model. The input size was changed to 52 × 52 pixels; this size was determined by the average size of the data

Experiments and Results
In this section, we discuss the experimental setup used before training the LeNet-5-adjusted network and the parameters used during training. The microscope method is discussed in Section 3.1 and the 2K-resolution camera method is discussed in Section 3.2. Recall was used to evaluate the performance of the classification task, and is calculated as follows:

$$\mathrm{Recall} = \frac{TP}{TP + FN}$$

While recall is our main criterion, we also want to achieve high precision and a high F1-score. Their formulas are as follows:

$$\mathrm{Precision} = \frac{TP}{TP + FP}$$

$$\mathrm{F1\text{-}Score} = \frac{2 \times TP}{2 \times TP + FN + FP} \tag{8}$$

where TP (true positive) indicates the number of defects classified correctly, FN (false negative) indicates the number of defects classified into wrong categories, and FP (false positive) indicates the number of nondefects classified as defects. The research goal is to reduce the number of suspicious objects that require human judgment without discarding true defects; therefore, the recall needs to be high. At the same time, we want to avoid excessively low precision, so precision and F1-score are also considered.
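The three metrics can be computed directly from the confusion counts. The sketch below is a minimal illustration with hypothetical function names and made-up counts; returning 0.0 for an empty denominator is a convention chosen here, not something stated in the text.

```python
def recall(tp, fn):
    """Fraction of true defects that were classified as defects."""
    return tp / (tp + fn) if (tp + fn) else 0.0


def precision(tp, fp):
    """Fraction of predicted defects that are true defects."""
    return tp / (tp + fp) if (tp + fp) else 0.0


def f1_score(tp, fn, fp):
    """F1-score written in the count form of Equation (8)."""
    denominator = 2 * tp + fn + fp
    return 2 * tp / denominator if denominator else 0.0


# Made-up counts: 9 defects found, 1 missed, 9 nondefects flagged as defects.
tp, fn, fp = 9, 1, 9
print(recall(tp, fn), precision(tp, fp), round(f1_score(tp, fn, fp), 3))
```

The made-up counts show why recall alone is not enough: a classifier can reach a recall of 0.9 while half of its reported defects are false alarms.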

Experiment Setup
For the adaptive threshold parameters, the block size was set to 205 and the constant c was set as 30 during the image processing stage by observing the defects and confirming if all the defects were captured. Contour extraction only extracts objects with areas larger than 400 pixels because under the microscope, a 25-µm defect is approximately that size.
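The roles of the block size and the constant c can be seen in a small pure-Python sketch of mean adaptive thresholding. Several points are assumptions of this sketch rather than statements from the text: the threshold is taken as the local mean minus c (the convention used by OpenCV's `cv2.adaptiveThreshold`), pixels darker than the threshold are marked as foreground, and the neighborhood is clipped at the image border. The function name is hypothetical, and the nested loops are far too slow for a block size of 205 on full images.

```python
def mean_adaptive_threshold(image, block_size, c):
    """Mark pixels darker than (local mean - c) as foreground (1)."""
    if block_size < 3 or block_size % 2 == 0:
        raise ValueError("block_size must be an odd number >= 3")
    height, width = len(image), len(image[0])
    half = block_size // 2
    mask = [[0] * width for _ in range(height)]
    for y in range(height):
        for x in range(width):
            # Neighborhood clipped at the image border (sketch assumption).
            rows = range(max(0, y - half), min(height, y + half + 1))
            cols = range(max(0, x - half), min(width, x + half + 1))
            total = sum(image[r][k] for r in rows for k in cols)
            local_mean = total / (len(rows) * len(cols))
            if image[y][x] < local_mean - c:
                mask[y][x] = 1
    return mask


# A bright 5 x 5 patch with one dark pixel in the middle.
patch = [[200] * 5 for _ in range(5)]
patch[2][2] = 100
for row in mean_adaptive_threshold(patch, block_size=3, c=30):
    print(row)
```

A larger c demands a stronger contrast with the surroundings before a pixel is kept, and a larger block averages the background over a wider area.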
The dataset of plastic resin film images was collected from a production line of a plastic component using a microscope. All the defective components had previously been inspected by an expert examiner. During the image processing stage, we cropped 237 defects and 2457 bubbles. Because such an imbalance would undermine the training of the neural network, augmentations including rotation and flipping were applied, which increased the number of defects to 2796.
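Rotation and flipping can be illustrated on a small patch stored as a list of rows. The text does not state which rotation angles or flip directions were used, so the sketch below assumes 90-degree rotations combined with a horizontal flip; the function names are hypothetical.

```python
def rotate90(patch):
    """Rotate a 2D patch (list of rows) by 90 degrees clockwise."""
    return [list(row) for row in zip(*patch[::-1])]


def flip_horizontal(patch):
    """Mirror a 2D patch left to right."""
    return [list(row[::-1]) for row in patch]


def augment(patch):
    """Return the distinct variants from four rotations, each with and without a flip."""
    variants = []
    current = [list(row) for row in patch]
    for _ in range(4):
        for candidate in (current, flip_horizontal(current)):
            if candidate not in variants:
                variants.append(candidate)
        current = rotate90(current)
    return variants


patch = [[1, 2],
         [3, 4]]
print(len(augment(patch)))
```

Under these assumptions an asymmetric patch yields up to eight variants, while a fully symmetric patch yields only one, so the gain depends on the shape of each defect.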
For the classification task, 100 images were used for testing and the remaining images were used for training. The LeNet-5-adjusted network was trained for 100 epochs and the batch size was set to 100.

Evaluation of the CNN Model
To verify the performance of the proposed CNN, in this section we compare its inspection results with those of other CNN networks. Arikan et al. [22] proposed SURFnet, which was inspired by the VGG network [36] configurations and residual learning [37]. It has nine convolution layers and one fully connected layer. Each convolution layer is followed by batch normalization and uses the parametric rectified linear unit as the activation function. The LeNet-5 architecture consists of two pairs of convolution and average pooling layers, followed by a flatten layer and two fully connected layers. Both networks were trained for 200 epochs with a batch size of 32. Table 1 shows the performance on the defect class for the various networks. We mainly focused on whether all defects were detected, that is, on the recall of the defect class. The results show that LeNet-5 has high precision but the lowest recall. This suggests that superior performance can be obtained if the average pooling layer is removed, because extracted objects may be only approximately 1 pixel in size and average pooling can cause the loss of vital features. For SURFnet, although the precision is high (up to 1), the recall is not as high as that of the LeNet-5-adjusted network.
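The argument about average pooling can be made concrete with a toy example. The sketch below is not the authors' network: it applies a single non-overlapping 2 × 2 average pooling step to a hypothetical 4 × 4 patch containing one bright pixel, with function names chosen for illustration.

```python
def average(values):
    """Arithmetic mean of a list of numbers."""
    return sum(values) / len(values)


def pool2x2(patch, reduce):
    """Apply non-overlapping 2x2 pooling with the given reduction function."""
    pooled = []
    for y in range(0, len(patch) - 1, 2):
        row = []
        for x in range(0, len(patch[0]) - 1, 2):
            window = [patch[y][x], patch[y][x + 1],
                      patch[y + 1][x], patch[y + 1][x + 1]]
            row.append(reduce(window))
        pooled.append(row)
    return pooled


# Hypothetical patch: a single-pixel object on a dark background.
patch = [[0] * 4 for _ in range(4)]
patch[1][2] = 255
print(pool2x2(patch, average))
```

The single-pixel response of 255 is diluted to a quarter of its value after one pooling step, and a second step would dilute it again, which is consistent with the explanation given for the lower recall of the original LeNet-5.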

Implementation Detail
The inspection experiment system was developed using Python 3.6.7, with Keras as the deep learning platform. The following results were obtained on a server with an Intel(R) Core(TM) i7-8700K CPU and an NVIDIA TITAN-V graphics processing unit with 11 GB of memory.

Experimental Setup
During the image adjustment stage, the center locations of the three stickers were obtained by applying a simple threshold transform and object extraction. The adaptive threshold parameters were selected by observing the defects and confirming that all the defects were detected; we set the blocksize to 55 and the constant c to 19. For contour extraction, we extracted objects whose areas were above 1 pixel and below 500 pixels, because a 25-µm defect is approximately 1 pixel under the 2K-resolution camera and all the observed defects were smaller than 500 pixels. Table 2 lists the average number of objects in one plastic resin film before and after comparison. The results indicate that the comparison successfully removes most of the dust and a considerable number of objects. More objects were observed on the transparent-background plastic than on the white-background plastic because more bubbles were present in it. Because the transparent- and white-background plastic data differed considerably under the 2K-resolution camera, we trained separate networks for them. Both networks were trained for 200 epochs, with a batch size of 32 for the transparent-background plastic and 16 for the white-background plastic. The number of output classes also differed: the transparent background has three classes, namely bubbles, defects, and scratches, whereas the white background has only two classes, with bubbles and scratches combined into one. For training, we used 177 bubbles, 31 defects, and 37 scratches and edges on the transparent background, and 72 bubbles and scratches and 32 defects on the white background. Some augmentations were made to balance the data.
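The area limits can be illustrated with a small object-extraction routine. This sketch swaps in 8-connected component labeling on a binary mask in place of contour detection, and measures area as a pixel count, so its areas will not match a contour-based area exactly. The function name is hypothetical, and treating "above 1 pixel" as "at least 1 pixel" is an assumption.

```python
from collections import deque


def extract_objects(mask, min_area=1, max_area=500):
    """Return (x, y, width, height, area) for 8-connected regions in the area range."""
    height, width = len(mask), len(mask[0])
    seen = [[False] * width for _ in range(height)]
    objects = []
    for start_y in range(height):
        for start_x in range(width):
            if not mask[start_y][start_x] or seen[start_y][start_x]:
                continue
            # Breadth-first flood fill collects all pixels of one object.
            queue = deque([(start_x, start_y)])
            seen[start_y][start_x] = True
            xs, ys = [], []
            while queue:
                x, y = queue.popleft()
                xs.append(x)
                ys.append(y)
                for dy in (-1, 0, 1):
                    for dx in (-1, 0, 1):
                        nx, ny = x + dx, y + dy
                        if (0 <= nx < width and 0 <= ny < height
                                and mask[ny][nx] and not seen[ny][nx]):
                            seen[ny][nx] = True
                            queue.append((nx, ny))
            area = len(xs)
            if min_area <= area < max_area:
                objects.append((min(xs), min(ys),
                                max(xs) - min(xs) + 1,
                                max(ys) - min(ys) + 1, area))
    return objects


mask = [
    [1, 1, 0, 0, 0, 0],
    [1, 0, 0, 0, 0, 0],
    [0, 0, 0, 0, 1, 0],
    [0, 0, 0, 0, 0, 0],
]
print(extract_objects(mask))
```

The upper limit discards large structures such as the film border, while the lower limit of a single pixel reflects how small a 25 µm defect appears at this resolution.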

Evaluation of the CNN Model with Feature Selection Algorithm
To verify the performance, we compared our CNN model with some traditional methods. In our experiment, we used two feature selection algorithms combined with an SVM or an MLP. The histogram of oriented gradients (HOG) counts occurrences of gradient orientations in localized portions of an image, while the local binary pattern (LBP) is a powerful feature for texture classification. The quantization of the gray values in HOG is twelve, and the number of circularly symmetric neighbor set points for LBP is eight. The MLP consists of two hidden layers; the first contains 12 units and the second contains 8. For the transparent-background data, the output layer contains three output variables, while for the white-background data it contains two. The MLP models were trained for 50 epochs with a batch size of 32. Table 3 displays the performance on the transparent-background plastic and Table 4 displays the performance on the white-background plastic.
As shown in Tables 3 and 4, LBP + SVM and HOG + SVM mostly have low recall, and their precision and F1-score are not good either. In Table 3, while LBP + MLP achieves a rather good result on the transparent-background data compared with the other methods, our CNN model still achieves better results. HOG + MLP has a better recall than our model; however, its precision is too low. In Table 4, three of the methods have the same recall as our CNN model; however, their precision is lower than ours. This shows that it is difficult to fully classify the features of defects using only gradient features and a classifier. Our method performs better than the feature selection algorithms.
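To make the LBP baseline concrete, the following sketch computes the basic eight-neighbor LBP code and a 256-bin histogram that could serve as the feature vector for an SVM or MLP. It uses the square 3 × 3 neighborhood without interpolation, which is a simplification of the circularly symmetric neighbor set mentioned above; the bit order and function names are choices made for this illustration.

```python
def lbp_code(patch):
    """Basic eight-neighbor LBP code of the center pixel of a 3x3 patch."""
    center = patch[1][1]
    # Neighbors visited clockwise, starting at the top-left corner.
    neighbors = [patch[0][0], patch[0][1], patch[0][2], patch[1][2],
                 patch[2][2], patch[2][1], patch[2][0], patch[1][0]]
    code = 0
    for bit, value in enumerate(neighbors):
        if value >= center:
            code |= 1 << bit  # set the bit when the neighbor is not darker
    return code


def lbp_histogram(image):
    """256-bin histogram of LBP codes over all interior pixels."""
    histogram = [0] * 256
    for y in range(1, len(image) - 1):
        for x in range(1, len(image[0]) - 1):
            patch = [row[x - 1:x + 2] for row in image[y - 1:y + 2]]
            histogram[lbp_code(patch)] += 1
    return histogram


image = [
    [9, 9, 9, 9],
    [9, 5, 5, 9],
    [9, 9, 9, 9],
]
print(sum(lbp_histogram(image)))
```

Because each code depends only on sign comparisons with the center pixel, the histogram is insensitive to uniform brightness changes, but it carries no explicit shape information about the object.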

Evaluation of the CNN Model with other CNN Networks
To verify the performance of the proposed CNN, in this section we compare its inspection results with those of other CNN networks. Table 5 lists the performance on the transparent-background plastic and Table 6 displays the performance on the white-background plastic. As shown in Table 5, the original LeNet-5 does not exhibit high recall (0.74), whereas both SURFnet and the proposed CNN exhibit a recall of 0.84. Table 6 shows the same trend. LeNet-5 does not achieve better recall on either the transparent or the white plastic, and its precision on the white background is notably low. Although SURFnet has the same recall as the proposed CNN on both plastics, the CNN structure of SURFnet is considerably more complex.

Discussion
During the development of the proposed method, a number of experiments were performed to deliver high performance. In this section, details of the experimentation are discussed. The image processing stage is discussed in Sections 4.1 and 4.2, and the hyperparameters used during the training of the CNN model are discussed in Section 4.3. Finally, implementation issues are discussed in Section 4.4.

Threshold
When selecting a suitable thresholding method, we found that neither the Otsu method nor the basic thresholding method provided satisfactory performance on either kind of data. On the microscope data, the low image quality introduced too much noise, so Otsu did not perform well. On the 2K-resolution camera data, the objects were too small and not sufficiently distinct for Otsu to detect. The basic threshold, in which a single pixel value is selected as the boundary, did not provide satisfactory results because both the microscope data and the 2K-resolution camera data have varying light at the center. Therefore, the adaptive threshold was the most suitable approach. There are different kinds of adaptive thresholding methods, such as mean, Niblack [38], and Sauvola [39]. The adaptive threshold sets the boundary as shown in Equation (3); the methods differ in how they determine the value of T(x, y). The Niblack and Sauvola formulas are shown below.
$$T(x, y) = m(x, y) + K \cdot s(x, y)$$

$$T(x, y) = m(x, y) \cdot \left[ 1 + K \cdot \left( \frac{s(x, y)}{R} - 1 \right) \right]$$

where m(x, y) is the mean of the blocksize × blocksize neighborhood and s(x, y) is its standard deviation. The first formula is Niblack's and the second is Sauvola's. The parameter K takes positive values, and R is the dynamic range of the standard deviation. The results of the different thresholding methods are shown in Figure 12. As shown in Figure 12a,b, the nonuniform light source causes poor performance with the basic thresholding method. In Figure 12c, Otsu cannot find any objects on the plastic resin film. Although Niblack and Sauvola perform better than the two methods mentioned above, they pick up a large amount of unwanted noise, which makes the subsequent contour detection difficult. In addition, many objects are fragmented when Sauvola is used. Their computation time is also rather slow compared with that of the mean adaptive threshold. Therefore, we chose the mean adaptive threshold.
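The difference between the three local thresholds can be seen by evaluating them on a single neighborhood. The sketch below is a minimal illustration: the function names are hypothetical, the mean adaptive threshold is assumed to be the local mean minus c (Equation (3) is not reproduced here), and R = 128 is the value commonly used for 8-bit images rather than one stated in the text.

```python
from statistics import pstdev


def local_stats(window):
    """Mean and population standard deviation of the pixel values in a window."""
    return sum(window) / len(window), pstdev(window)


def mean_threshold(window, c):
    """Mean adaptive threshold, assumed here to be the local mean minus c."""
    m, _ = local_stats(window)
    return m - c


def niblack_threshold(window, k):
    """Niblack: T = m + K * s."""
    m, s = local_stats(window)
    return m + k * s


def sauvola_threshold(window, k, r=128.0):
    """Sauvola: T = m * (1 + K * (s / R - 1))."""
    m, s = local_stats(window)
    return m * (1 + k * (s / r - 1))


# A flat background window and a window containing an edge.
flat = [100] * 9
edge = [0, 0, 0, 0, 200, 200, 200, 200]
for name, window in (("flat", flat), ("edge", edge)):
    print(name,
          mean_threshold(window, 19),
          niblack_threshold(window, 0.5),
          sauvola_threshold(window, 0.5))
```

In the flat window the standard deviation is zero, so the Niblack threshold sits exactly at the local mean and small fluctuations of the background cross it, which is one way to see why this method tends to pick up noise in uniform regions.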
However, the parameters of the mean adaptive threshold must be adjusted to the data and the lighting, which requires time. Finding the best parameters for the adaptive threshold consumed the largest amount of time in the experiment. We mainly tried different combinations and compared them. Figure 13 shows the comparison between different parameter values.
Here, x and y are the parameters of the adaptive threshold, where x is the constant c and y is the blocksize. The total number of missed ground-truth defects is depicted in Figure 13a, and the total number of contours found is illustrated in Figure 13b. The figure shows that only the setting with blocksize 55 and c = 5 captured all the defects. However, while our goal is to find as many defects as possible, we also want to reduce the number of objects being detected. Therefore, we observed the data under four different combinations, each with a different number of missed ground-truth defects; the objects detected on the thresholded images are shown in Figure 14.
It is clear that blocksize 55 with c = 5 is not a suitable setting. As shown in Figure 14d, it is not suitable for finding the objects on the plastic resin film, and Figure 14b shows that a large amount of noise is detected. In addition, we observed that the parameters used in Figure 14c cause single scratches to be detected as several separate objects; these sub-scratches have features similar to those of defects, which makes it hard to distinguish defects from scratches. Therefore, on the basis of these results, a blocksize of 55 and a constant c of 19 were set.

Contour Extraction
During contour extraction, the minimum bounding box of the contours was used for the microscope data, but we did not use the minimum bounding box for the camera data. This is mainly because the camera data can contain objects with an area of less than 10 pixels; objects extracted with the minimum bounding box would then be difficult to train on during the classification stage.
As depicted in Figure 15, if we extracted the object using the minimum bounding box of the contour, only limited background detail would be available for comparison. Because the inputs to the LeNet-5-adjusted network must all be of the same size, attribute features were lost during resizing, which decreased the classification accuracy. We therefore expanded the bounding box, which made the attribute features of the extracted objects more obvious (Figure 14b). We expanded the box by approximately 5, 10, and 15 pixels and trained the LeNet-5-adjusted network on each version. The best classification results were obtained with 15 pixels.
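Expanding a bounding box by a fixed margin can be sketched as below. Boxes are assumed to be (x, y, width, height) tuples, the function names are hypothetical, and clamping at the image border is a choice of this sketch; the text does not say how objects near the edge were handled.

```python
def expand_box(box, margin, image_width, image_height):
    """Grow a box by margin pixels on every side, clamped to the image."""
    x, y, w, h = box
    left = max(0, x - margin)
    top = max(0, y - margin)
    right = min(image_width, x + w + margin)
    bottom = min(image_height, y + h + margin)
    return (left, top, right - left, bottom - top)


def crop(image, box):
    """Cut the box out of an image stored as a list of rows."""
    x, y, w, h = box
    return [row[x:x + w] for row in image[y:y + h]]


# A hypothetical 3 x 3 pixel object in a 5472 x 3648 image, expanded by the
# three margins that were compared.
for margin in (5, 10, 15):
    print(margin, expand_box((100, 100, 3, 3), margin, 5472, 3648))
```

With a 15-pixel margin a 3 × 3 object becomes a 33 × 33 crop, so the subsequent resize to the network input has to enlarge it far less than it would a 3 × 3 crop.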

Hyperparameter
Fine adjustments were performed on the camera data. When creating the model, dropout and pooling layers were tried; however, these did not have a considerable effect. The batch size was varied among 16, 32, and 64. Excellent results were obtained with a batch size of 16 for the white-background plastic and with a batch size of 32 for the transparent-background plastic. During tuning, we added different pooling layers but did not achieve better results than with the model without pooling. The dropout layer likewise did not improve the results on the data.

Implementation Issues
Camera data issue: The 2K resolution does not provide sufficient quality for the plastic resin film; nonbackground objects are still considerably blurry under the camera, and it is sometimes difficult to distinguish between a bubble and a defect. Many defects were less than 5 pixels across, and the CNN could not be trained to detect them. When we raised the platform and reduced the FOV, the defects grew to 10 to 15 pixels and the classification results improved. However, three images were then required to cover one plastic resin film. This leads to other problems, such as localizing the defects in one combined image and ensuring that every part of the plastic resin film is captured in the three images. Therefore, we did not raise the height of the platform; although the classification was not better and the nonbackground objects were not clearer, the inspection process was considerably more efficient.
Data labeling: Considerable time was required to label the training data, because every object had to be cropped and assigned to the right class, and most of the time a microscope was required to determine the correct class. Furthermore, the expanded areas of 5, 10, and 15 pixels each produced different data. This stage is time consuming and requires careful scrutiny because it strongly affects the effectiveness of our classification model.
During the classification of the camera data, unstable results were obtained for the white background when the objects were divided into three categories, and bubbles and scratches were often categorized as defects, as shown in Table 7a and Figure 16a. Therefore, the bubble and scratch classes were combined, which gave superior results, as depicted in Table 7b and Figure 16b. Both models were trained on the same data with the same hyperparameters. However, while the white-background plastic gives better results with two categories, the transparent-background plastic does not. As shown in Table 8 and Figure 17, the recall of the defect class is higher with three categories. Therefore, when training on the transparent-background plastic data, we chose to divide the data into three categories: defect, bubble, and scratch.
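The comparison between the three-category and two-category setups rests on per-class precision and recall. The following Python sketch shows how these can be computed from a confusion matrix and how two classes can be collapsed into one. The helper names and the merged class name are hypothetical, and any counts passed in are the caller's own; none of the values in Tables 7 and 8 are reproduced here.

```python
def precision_recall(confusion, labels):
    """Per-class (precision, recall) from a square confusion matrix.

    confusion[i][j] is the number of objects of true class i that were
    predicted as class j.
    """
    result = {}
    for k, name in enumerate(labels):
        true_positive = confusion[k][k]
        actual = sum(confusion[k])                     # row sum
        predicted = sum(row[k] for row in confusion)   # column sum
        precision = true_positive / predicted if predicted else 0.0
        recall = true_positive / actual if actual else 0.0
        result[name] = (precision, recall)
    return result


def merge_classes(confusion, labels, group, merged_name):
    """Collapse the classes in `group` into one class named `merged_name`."""
    new_labels = [l for l in labels if l not in group] + [merged_name]
    target = {
        l: new_labels.index(merged_name if l in group else l) for l in labels
    }
    n = len(new_labels)
    merged = [[0] * n for _ in range(n)]
    for i, true_label in enumerate(labels):
        for j, predicted_label in enumerate(labels):
            merged[target[true_label]][target[predicted_label]] += confusion[i][j]
    return merged, new_labels
```

Note that collapsing the predictions of an already trained three-class model leaves the defect-class precision and recall unchanged, because the defect row and column keep the same totals. The improvement reported for the white background therefore comes from training a separate two-category model, not from relabeling the outputs.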

Increasing Accuracy
The 2K-resolution camera was selected for time efficiency. Although recall is our priority, precision should also be increased. Therefore, the data were verified twice: first with the 2K-resolution camera and then with the microscope method.
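One way to read the double verification is as a two-stage cascade, sketched below in Python. This is an interpretation rather than the authors' implementation: it assumes that the camera stage flags candidates and that only the flagged objects are re-examined with the microscope. Both predicate functions are hypothetical and supplied by the caller.

```python
def two_stage(objects, camera_says_defect, microscope_says_defect):
    """Keep only the objects that both stages classify as defects.

    The microscope check runs only on objects flagged by the camera
    stage, so the slower second stage handles a small subset.
    """
    confirmed = []
    for obj in objects:
        if not camera_says_defect(obj):
            continue  # rejected by the fast first stage
        if microscope_says_defect(obj):
            confirmed.append(obj)
    return confirmed
```

Under this reading, the second stage can only remove objects from the flagged set. It can therefore raise precision by discarding false alarms, but it cannot recover defects that the camera stage missed, which is why a high-recall first stage matters.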

Conclusions
In this paper, we proposed an automatic method that identifies and classifies nonbackground objects by using a traditional image processing method to detect the objects and a CNN to classify them. With the proposed method, the inspection is converted into a segmentation and classification problem. Traditional image processing detects the objects, and a comparison step filters out dust on the plastic resin film. The results indicated that this step filtered out a sufficient amount of dust. A LeNet-5-adjusted network was proposed for the classification, and the results exhibited excellent accuracy, indicating that average pooling of the data can lead to the loss of vital features. The transparent-background data has an 84% recall and a 77% precision, while the white-background data has a 97% recall and a 78% precision. The proposed machine-vision-based automatic defect detection method can overcome the disadvantages of the traditional procedure. It applies uniform measurement criteria, unlike the traditional method, in which the criteria can vary with each inspector. Whereas inspection by human vision is time consuming and inefficient, the proposed method is fast. Using the traditional method, a person can check approximately 12 to 15 plastic resin films in a day. With the proposed method, the time needed to check one plastic resin film can be reduced to 3 min. Consequently, the labor costs of the inspection process can be reduced, the workload of the inspectors can be lightened, and productivity can be increased.

Conflicts of Interest:
The authors declare no conflicts of interest.