Surface Defect Detection for Mobile Phone Back Glass Based on Symmetric Convolutional Neural Network Deep Learning

Defect detection based on machine vision and machine learning techniques has drawn much attention in recent years. Deep learning is well suited to such segmentation and detection tasks and has become a promising research area. Surface quality inspection is essential in the manufacturing of mobile phone back glass (MPBG), where different types of defects are produced because of imperfections in the manufacturing technique. Unlike general transparent glass, screen printing glass has totally different reflection and scattering characteristics, which means the traditional dark-field imaging system is not suitable for this task. Meanwhile, the imaging system requires high resolution since the minimum defect size can be 0.005 mm². According to the imaging characteristics of screen printing glass, this paper proposes a coaxial bright-field (CBF) imaging system and a low-angle bright-field (LABF) imaging system, in which 8K line-scan complementary metal oxide semiconductor (CMOS) cameras are utilized to capture images with a resolution of 16,000 × 8092. The CBF system is applied for the weak-scratch and discoloration defects while the LABF system is applied for the dent defect. A symmetric convolutional neural network composed of encoder and decoder structures is proposed based on U-Net, which produces a semantic segmentation with the same size as the original input image. More than 10,000 original images were captured, and more than 30,000 defective and non-defective images were manually annotated in the glass surface defect dataset (GSDD). Experiments show that the average precision reaches more than 91% and the average recall rate reaches more than 95%. The method is well suited to the surface defect inspection of screen printing mobile phone back glass.


Introduction
Quality control of products is very important in manufacturing industries. Traditional manual visual inspection requires many well-trained workers, is labor-intensive and inefficient, and its standards can vary considerably because of personal subjectivity. In the past decades, with the development of optical and computer techniques, many automatic optical inspection (AOI) solutions [1][2][3][4] have been proposed for the surface defect inspection task. Such a contactless inspection method can substantially improve inspection accuracy and efficiency, providing guidance in production, and can also greatly reduce labor expenses. A typical defect inspection system is mainly composed of an imaging system and image processing algorithms. The imaging system should be carefully designed based on the imaging characteristics of the object's surface, and a charge-coupled device (CCD) or CMOS camera is applied to capture the images of objects illuminated by a [...] dent, and discoloration. The second part presents the symmetric convolutional neural network for image segmentation. Here, more than 30,000 images were manually annotated and used as the training samples for the Symmetry-Net. A comparison with the classical machine vision technique is also presented.

Imaging Capture System
The surface quality is determined by the waviness, roughness, and microdefects of the detection object. Since the minimum defect size is 0.005 mm² and the size of the detection object is bigger than 150 × 60 mm, it is difficult for an area-array camera to achieve such a high resolution in a single shot. To improve the imaging efficiency, line-scan cameras were applied in the image capture system. In the classical detection system for ordinary transparent glass, the dark-field imaging system was always the first choice [4,6,24,28]: low-angle light irradiates the smooth glass plane, and the CMOS camera receives only the light scattered by defects, since the scattered light of the background is always very weak. However, the scattering properties are very different when the surface of the glass is covered with ink. The scattered light of the background becomes stronger, which brings more noise into the image and results in poor imaging performance for shallow scratch and dent defects.
The imaging characteristics of different defects can be very different. To obtain a high-quality image of the screen printing mobile phone back cover glass, a coaxial bright-field (CBF) imaging system and a low-angle bright-field (LABF) imaging system are applied in this paper; the bright-field type of imaging system captures the reflected light, as shown in Figure 1. In the CBF system, the direction of the reflected light is opposite to that of the incident light with the help of the crucial optical element, the beam-splitter. A higher contrast image of the shallow scratch defect and discoloration defect can be obtained in the CBF system, while the imaging quality of the dent defect is higher in the LABF system. Here, low angle means that the angle between the direction of the incident light and the direction of the X axis is small.
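The resolution requirement can be made concrete with a rough back-of-the-envelope estimate. The sketch below assumes that the 13,567 × 6548 pixel glass region quoted later in the paper maps onto the 150 × 60 mm lower bound of the glass size; this mapping and the helper names are assumptions for illustration. Because the glass is stated to be bigger than 150 × 60 mm, the real sampling pitch is somewhat coarser and the real pixel count per defect somewhat smaller than estimated here.

```python
# Rough sampling estimate under the stated assumptions (not a measured calibration).
GLASS_MM = (150.0, 60.0)     # lower bound on the glass size quoted in the text
GLASS_PX = (13567, 6548)     # glass region of the raw image quoted in the paper
MIN_DEFECT_MM2 = 0.005       # minimum defect area quoted in the text


def sampling_um_per_px(length_mm, n_px):
    """Object-space sampling pitch in micrometres per pixel."""
    return 1000.0 * length_mm / n_px


def defect_area_px(area_mm2, pitch_x_um, pitch_y_um):
    """Number of pixels covered by a defect of the given area."""
    pixel_area_mm2 = (pitch_x_um / 1000.0) * (pitch_y_um / 1000.0)
    return area_mm2 / pixel_area_mm2


pitch_x = sampling_um_per_px(GLASS_MM[0], GLASS_PX[0])
pitch_y = sampling_um_per_px(GLASS_MM[1], GLASS_PX[1])
n_defect_px = defect_area_px(MIN_DEFECT_MM2, pitch_x, pitch_y)
megapixels = GLASS_PX[0] * GLASS_PX[1] / 1e6   # pixels a single-shot camera would need
```

Under these assumptions the smallest defect covers only a few tens of pixels, while the glass region alone spans nearly 89 megapixels, which illustrates why a single-shot area camera is impractical here.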
Two 8K line-scan CMOS cameras are utilized to capture the images with the unit cell size of the CMOS sensor being 7.04 µm × 7.04 µm. Imaging systems are fixed above the transmission system. To avoid the mutual interference of incident light sources, the focus position of CBF system is at a distance of about 30 mm away from that of the LABF system.
Normal scratch defects are easy to detect because they produce a high variance value. However, some scratch defects have a shallow depth below 50 nm, and such shallow defects are too weak to be detected in the dark-field imaging system. As shown in Figure 2, the average grayscale value of the CBF images is higher than that of the LABF images, and the region of the shallow scratch defect has a higher variance value in the CBF system, as shown in Figure 2c.
The dent defect is produced by irregular surface waviness. The region of the dent defect has a much higher variance value in the LABF system, as shown in Figure 3c.
Appl. Sci. 2020, 10, 3621
Figure 1. Sketch of the coaxial bright-field (CBF) imaging system and low-angle bright-field (LABF) imaging system, composed of a light-emitting diode (LED) light source, collimating lenses, beam-splitter, imaging lens, and CMOS.
The discoloration defect is produced by non-uniform screen printing. The region of the discoloration defect has a much higher variance value in the CBF system, as shown in Figure 4c.
Typical defect images and the corresponding ground truth annotations are presented in Figure 5. It can be seen that the surface of MPBG contains various defects on an unevenly distributed background caused by the various structures. The contrast of the shallow scratch can be very low, and the size of the dent can be very small, while the size of the discoloration can be very large. Some non-defective random speckles can also appear due to fluctuations in production, as dust and fibers sometimes settle on the surface of MPBG when the production environment does not have enough dust-free protection.


Glass Surface Defect Dataset (GSDD)
Image processing is the most important and challenging part of surface defect inspection. The defects are usually darker or brighter than the surrounding background. There can be false defects such as dust, and the image quality can also be affected by non-uniform illumination and a complex texture. The goal of defect detection is to find an accurate, efficient, and flexible detection method that meets the production requirements.
Traditional defect detection steps include background correction, contrast enhancement, image filtering, morphological operations, segmentation, feature extraction, and classification. All the features and thresholds must be hand-crafted by an experienced engineer. Learning-based classifiers, such as decision trees, SVMs, or random forests, are commonly used for defect classification. Such a pipeline is not flexible and versatile enough when the inspection system must be adapted to different products, and the development cycles can be very long. The classical machine vision methods are sufficient for some less complicated tasks, but they are challenged by images with an unevenly distributed texture and non-defective random speckles in transition zones, which can substantially increase the misrecognition rate. Deep-learning methods are more powerful than classical defect detection techniques, and the image dataset is essential for the deep-learning method.
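To illustrate why hand-tuned thresholds are fragile, the following is a deliberately simplified stand-in for the first stages of a classical pipeline: background correction by subtracting a local mean, followed by a global threshold on the residual. It is not the traditional method evaluated in this paper; the function names, the kernel size `bg_kernel`, and the threshold factor `k_sigma` are hypothetical choices made only for this sketch.

```python
import numpy as np
from numpy.lib.stride_tricks import sliding_window_view


def box_mean(img, k):
    """Local mean over a k x k window (k odd), with mirrored borders."""
    pad = k // 2
    padded = np.pad(img.astype(np.float64), pad, mode="reflect")
    return sliding_window_view(padded, (k, k)).mean(axis=(-2, -1))


def classical_segment(gray, bg_kernel=15, k_sigma=4.0):
    """Background-correct a grayscale image, then threshold the residual.

    Pixels whose residual deviates from zero by more than k_sigma standard
    deviations are marked as defect candidates (True in the returned mask).
    """
    residual = gray.astype(np.float64) - box_mean(gray, bg_kernel)
    return np.abs(residual) > k_sigma * residual.std()


# Synthetic example: an illumination ramp plus one small bright spot.
ramp = np.tile(np.linspace(100.0, 150.0, 64), (64, 1))
image = ramp.copy()
image[30:33, 30:33] += 60.0
mask = classical_segment(image)
```

On this clean synthetic image the bright spot is isolated, but both `bg_kernel` and `k_sigma` had to be picked by hand; a low-contrast scratch or a textured transition zone would need different values, which is the kind of per-product tuning described above.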
The target of the segmentation task is to compute pixel-wise labels for the target images. The original raw image size is 16,000 × 8092, and the region of the image containing glass information is 13,567 × 6548. It is not practical to directly annotate and train on images with such a high resolution. Therefore, 276 images with a size of 600 × 600 were extracted from the glass image as the training samples, as shown in Figure 6. The neighboring sub-images are cut with a certain overlapping area, and the border region of the original image is extended by mirroring. Sometimes, the size of a defect can be even bigger than that of the overlapping region. Once the defects of every sub-image are obtained, their coordinates on the original image can be easily computed. Neighboring defects are then merged according to these coordinates, which reduces the error caused by image cutting. In practical production, the number of defects in a single raw image is always very small. To obtain enough defect images, more than 10,000 glass samples were captured by the CBF system and LABF system. The glass surface defect dataset (GSDD) consists of 34,550 images with 6742 positive samples and 27,808 negative samples, where every positive sample contains at least one defect, and the types of defects are mainly scratch, dent, and discoloration. For every image, a pixel-wise annotation mask is provided by using the LabelMe annotation tool.
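The cutting and merging steps described above can be sketched as follows. This is a minimal NumPy illustration, not the authors' implementation: the overlap width is not given in the text, so `overlap` is a hypothetical parameter, boxes are assumed to be `(y0, x0, y1, x1)` tuples, and all function names are made up for this example.

```python
import numpy as np


def grid_shape(h, w, tile=600, overlap=10):
    """Number of tile rows and columns needed to cover an h x w image."""
    stride = tile - overlap
    n_rows = max(1, -(-(h - overlap) // stride))   # ceiling division
    n_cols = max(1, -(-(w - overlap) // stride))
    return n_rows, n_cols


def tile_with_overlap(image, tile=600, overlap=10):
    """Cut a 2-D image into overlapping tiles, mirroring the bottom/right border."""
    stride = tile - overlap
    h, w = image.shape
    n_rows, n_cols = grid_shape(h, w, tile, overlap)
    pad_h = n_rows * stride + overlap - h
    pad_w = n_cols * stride + overlap - w
    padded = np.pad(image, ((0, pad_h), (0, pad_w)), mode="reflect")
    tiles, offsets = [], []
    for r in range(n_rows):
        for c in range(n_cols):
            y, x = r * stride, c * stride
            tiles.append(padded[y:y + tile, x:x + tile])
            offsets.append((y, x))
    return tiles, offsets


def to_global(box, offset):
    """Shift a tile-local box (y0, x0, y1, x1) into original-image coordinates."""
    dy, dx = offset
    return (box[0] + dy, box[1] + dx, box[2] + dy, box[3] + dx)


def merge_boxes(boxes):
    """Repeatedly merge boxes that overlap or touch until none can be merged."""
    boxes = [tuple(b) for b in boxes]
    merged = True
    while merged:
        merged = False
        out = []
        while boxes:
            a = boxes.pop()
            for i, b in enumerate(out):
                if a[0] <= b[2] and b[0] <= a[2] and a[1] <= b[3] and b[1] <= a[3]:
                    out[i] = (min(a[0], b[0]), min(a[1], b[1]),
                              max(a[2], b[2]), max(a[3], b[3]))
                    merged = True
                    break
            else:
                out.append(a)
        boxes = out
    return sorted(boxes)
```

As a consistency check only, `grid_shape(6548, 13567, tile=600, overlap=10)` gives a 12 × 23 grid, that is 276 tiles, which matches the count quoted in the text under this particular tiling scheme; the overlap actually used by the authors is not stated.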

Segmentation Model Architecture
The proposed architecture of the segmentation network, composed of an encoder and a decoder, is presented in Figure 7. It is a modified symmetric network based on U-Net [18]. The size of the feature map is more or less symmetrical in the pipeline of the encoder and decoder. The network consists of both 3 × 3 and 5 × 5 convolutional layers, each followed by a rectified linear unit (ReLU) and batch normalization. A 2 × 2 max-pooling layer is utilized for the downsampling instead of convolutions with a large stride, which ensures that detailed information survives the downsampling process.
The goal of the network is to balance the detection of all types of defects. Max-pooling downsampling layers and large kernel sizes significantly increase the receptive field size. However, the downsampling results in a loss of accurate spatial information. To recover high-resolution features, the feature map is upsampled by a 2 × 2 up-convolution, which halves the number of feature channels.
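The effect of pooling and up-convolution on the feature map can be traced with a small shape walk. The sketch below only does the bookkeeping and is not the network itself: the depth, the base channel count, the doubling of channels after each pooling step, and size-preserving ("same" padded) convolutions are all assumptions, since the text does not specify them.

```python
def unet_shape_walk(size=600, base_channels=32, depth=3):
    """Return (encoder, decoder) lists of (spatial_size, channels) per level."""
    encoder = [(size, base_channels)]
    for _ in range(depth):
        s, c = encoder[-1]
        # 2 x 2 max-pooling halves the spatial size (floor division);
        # doubling the channels afterwards is an assumed convention.
        encoder.append((s // 2, c * 2))
    decoder = []
    s, c = encoder[-1]
    for _ in range(depth):
        # 2 x 2 up-convolution doubles the size and halves the channels.
        s, c = s * 2, c // 2
        decoder.append((s, c))
    return encoder, decoder


encoder3, decoder3 = unet_shape_walk(600, 32, depth=3)
encoder4, decoder4 = unet_shape_walk(600, 32, depth=4)
```

With three pooling steps a 600 × 600 input round-trips exactly (600, 300, 150, 75 and back), whereas a fourth step would floor 75 to 37 and return 592 rather than 600, so deeper variants would need padding or cropping to restore the input size.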
The activation function is the sigmoid:

$$p_{ij} = \frac{1}{1 + e^{-s_{ij}}}$$

where $s_{ij}$ is the output logit at pixel $(i,j)$. The widely used cross-entropy loss function is:

$$L = -\sum_{i=1}^{N}\sum_{j=1}^{M}\left[\,y_{ij}\log p_{ij} + (1 - y_{ij})\log(1 - p_{ij})\,\right]$$

where $N$ and $M$ represent the width and height of the input source images, and $p_{ij}$ and $y_{ij}$ denote the sigmoid prediction and the ground truth annotation, respectively. The network can quickly obtain good performance in the relatively uniform regions; however, the segmentation result in the non-uniform edge regions is not as good. Therefore, a weight map based on local variance information is introduced, so that a region with a more complex texture has a higher weight value. The local variance $V_{ij}$ of pixel $(i,j)$ on patch $P$ is given by:

$$V_{ij} = \frac{1}{N_p}\sum_{x \in P}\left(x - \bar{x}_P\right)^2$$

where region $P$ is centered on $(i,j)$, $x$ denotes the pixel grayscale on $P$, $\bar{x}_P$ is the mean grayscale of patch $P$, and $N_p$ is the number of pixels in $P$. The weight map $w_{ij}$ is then obtained from $V_{ij}$ together with a bias $b$ that balances the variance value. The loss function therefore becomes:

$$L_w = -\sum_{i=1}^{N}\sum_{j=1}^{M} w_{ij}\left[\,y_{ij}\log p_{ij} + (1 - y_{ij})\log(1 - p_{ij})\,\right]$$
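A minimal NumPy sketch of a variance-weighted cross-entropy of the kind described above. The exact form of the weight map is an assumption here: the weight is taken as `V + b`, with the local variance `V` computed on a grayscale image scaled to [0, 1]. The patch size `k`, the value of `b`, the summation (rather than averaging) over pixels, and all function names are likewise illustrative choices.

```python
import numpy as np
from numpy.lib.stride_tricks import sliding_window_view


def sigmoid(s):
    """Element-wise logistic function."""
    return 1.0 / (1.0 + np.exp(-s))


def local_variance(gray, k=5):
    """Population variance of each k x k patch (k odd), with mirrored borders."""
    pad = k // 2
    padded = np.pad(gray.astype(np.float64), pad, mode="reflect")
    windows = sliding_window_view(padded, (k, k))
    return windows.var(axis=(-2, -1))


def weight_map(gray, k=5, b=1.0):
    """ASSUMED form of the weight map: local variance plus a bias b."""
    return local_variance(gray, k) + b


def weighted_cross_entropy(logits, target, weight, eps=1e-7):
    """Pixel-wise binary cross-entropy, weighted and summed over the image."""
    p = np.clip(sigmoid(logits), eps, 1.0 - eps)
    ce = -(target * np.log(p) + (1.0 - target) * np.log(1.0 - p))
    return float(np.sum(weight * ce))


# A step edge gets a larger weight than the flat regions on either side.
gray = np.zeros((8, 8))
gray[:, 4:] = 1.0
w = weight_map(gray, k=3, b=1.0)
```

With this choice flat regions keep the baseline weight `b`, while pixels next to the edge are weighted more heavily, which is the qualitative behavior the text asks of the weight map.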


Training Setups
The input images and the corresponding ground truth segmentation maps were used to train the network with the stochastic gradient descent implementation of TensorFlow. The initial learning rate was 0.03, the learning rate decay was 0.95 per epoch, the momentum was 0.9, and the size of the input image was 600 × 600. Two RTX 2080 Ti GPUs were used for asynchronous training, and each GPU had a batch size of 4.
The computer was equipped with 64 GB of RAM and an Intel i9-9900X, and ran the TensorFlow 1.4 framework on an Ubuntu 18.04 operating system.
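The stated hyperparameters can be read as the following schedule and update rule. This is a sketch under assumptions: the decay is taken to be applied multiplicatively once per epoch, and the momentum update uses the conventional accumulate-then-step form; the function names are illustrative and this is not the authors' training code.

```python
import numpy as np


def learning_rate(epoch, initial=0.03, decay=0.95):
    """Learning rate after `epoch` completed epochs (assumed per-epoch decay)."""
    return initial * decay ** epoch


def momentum_sgd_step(param, grad, velocity, lr, momentum=0.9):
    """One SGD-with-momentum update; returns the new parameter and velocity."""
    velocity = momentum * velocity + grad
    param = param - lr * velocity
    return param, velocity


# One illustrative step on a single scalar parameter.
theta, v = momentum_sgd_step(np.array([1.0]), np.array([2.0]),
                             np.zeros(1), lr=learning_rate(0))
```

Under this reading the learning rate falls to roughly 60% of its initial value after ten epochs, since 0.95 to the tenth power is about 0.60.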

The Defect Detection Results
An illustration of the segmentation results based on the classical machine vision method and the proposed deep-learning method is presented in Figure 8. The acquired images are shown in (a), with the dent, discoloration, and scratch defects marked by green dashed boxes; (b) and (c) show the segmentation results of the classical machine vision method and the proposed method, respectively, and non-defective speckles misjudged as defects are marked by red boxes. The deep-learning segmentation results are heat maps that denote the probabilities of pixels belonging to defects, and the classical segmentation results are binary images. For the dent defect image of Figure 8(a1), with an evenly distributed background and high variance, the classical method achieved a good performance. There are two scratches in Figure 8(a2), where one scratch is very obvious while the other, shallow scratch looks very weak. It is very difficult to detect such a low-contrast defect with the classical method because lowering the threshold may increase misjudgment. Non-defective random speckles can be misjudged as flaws, as shown in Figure 8(a3). The MPBG contains 2.5D arc edges, which cause an unevenly distributed texture and result in misjudgment in the classical method. For the discoloration defect in Figure 8(a4), only a small part of the defect is detected because the remaining region of the defect has a low variance value. The detection results show that the deep learning method can extract defects in the unevenly distributed texture. The proposed approach outperformed the classical method.
The detection results of the three types of defects and of the background based on the proposed method are shown in Figures 9 and 10. The segmentation results of the scratch defects are shown in Figure 9, where scratches can appear on any part of the glass with different lengths and grayscale values, and the weak shallow defects can be easily detected. Dent defect detection is the most challenging task for the classical machine vision method because of the tiny size of the defects, as shown in Figure 9, and because of the presence of tiny dust particles, which are point-like in shape just like some dent defects. The proposed method can substantially reduce the misjudgment rate of the dust particles. Discoloration is characterized by various shapes and patterns, including a point shape, curve shape, or irregular shape, as shown in Figure 10. The experimental results show that the proposed method can automatically extract higher-level features to detect such defects. The results of the background in Figure 10 demonstrate that even though different parts of the glass have different backgrounds and structures, the number of false positives can be small.
The images containing defects are defined as positive samples and the non-defective images are the negative samples. The precision rate and recall rate are the most important evaluation criteria. They can represent the performance more accurately than the area under the curve (AUC) because of the large number of non-defective samples in GSDD. The precision P is defined as:

$$P = \frac{TP}{TP + FP}$$

where the true positive (TP) means the number of correctly detected defect regions, and the false positive (FP) means the number of background regions that are wrongly detected as defects. Recall R is defined as:

$$R = \frac{TP}{TP + FN}$$

where the false negative (FN) denotes the number of undetected defect regions. Industrial production always wants the qualification rate to be as high as possible after production inspection, which calls for a strict inspection criterion that keeps the number of false negatives small. However, a strict inspection criterion may cause a high number of false positives, which results in low productivity and high costs. A good defect detection system should achieve both high precision and high recall.
The deep-learning method proposed in this paper is superior to the traditional method. The traditional segmentation method is mainly composed of background correction, image filtering, and morphological operations [29]. Verified by the test experiment with more than 10,000 image samples, the inspection results of the traditional machine vision method and the proposed deep learning method are shown in Table 1. The performance of the proposed method is significantly better, with higher precision and recall on the defects of dent, scratch, and discoloration. The average precision of the traditional method is 85.2%, while the proposed deep learning method achieves a precision of 91.8%. The average recall of the proposed method is 4.6% higher than that of the traditional method. The inspection performance on discoloration is clearly better than on the other defects because the size of discoloration is relatively large and the variance value is higher. The dent defect has the lowest recall because of the inevitable dust particles. The detection result is also better than the manual inspection result.
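The two evaluation criteria are straightforward to compute from region counts. The sketch below uses the standard definitions of precision and recall; the function names and the counts in the example are hypothetical and are not taken from Table 1.

```python
def precision(tp, fp):
    """Fraction of detected regions that are real defects: TP / (TP + FP)."""
    return tp / (tp + fp) if (tp + fp) else 0.0


def recall(tp, fn):
    """Fraction of real defect regions that were detected: TP / (TP + FN)."""
    return tp / (tp + fn) if (tp + fn) else 0.0


# Hypothetical counts for illustration only.
tp, fp, fn = 95, 8, 5
p = precision(tp, fp)
r = recall(tp, fn)
```

Tightening the inspection criterion typically lowers `fn` and raises `fp`, so recall rises while precision falls, which is the trade-off between missed defects and unnecessary rejects described above.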

Conclusions
A novel inspection system for screen printing mobile phone back glass (MPBG) was proposed in this paper. High-quality images of MPBG were captured with CBF and LABF line-scanning imaging systems. A modified deep convolutional segmentation network was constructed to detect the surface defects of MPBG. The network structure is partly symmetric. The network was trained on a glass surface defect dataset (GSDD) composed of 34,550 image samples. Verified by the test experiment, the average precision and recall over all kinds of defects are more than 91% and 95%, respectively. The performance of the proposed method is significantly better than that of the traditional method. This paper demonstrated that the performance of the inspection system can satisfy the requirements of defect detection for a specific task (MPBG), and the system also shows great potential for other surface inspection tasks without much modification. In future work, we will focus on achieving good detection performance with fewer defect samples and on improving the computational efficiency. Annotating defect images with higher precision and efficiency is also one of our goals.