1. Introduction
Intelligent sensing and pattern recognition technologies have made progress in robotics [1,2,3], computer engineering [4,5], health-related issues [6], natural sciences [7] and industrial and academic areas [8,9]. Among them, computer vision technology has developed particularly quickly. It mainly uses binocular cameras, digital cameras, depth cameras and charge-coupled device (CCD) cameras to collect target images, extract features and establish corresponding mathematical models, and thereby to complete target recognition, tracking and measurement. For example, Kamal et al. comprehensively consider the continuity and constraints of human motion. After contour extraction from the acquired depth image data, a Hidden Markov Model (HMM) is used to identify human activity. This system recognizes activities with high accuracy and can effectively deal with rotation and deficiency of the body [10]. Jalal et al. use texture and shape vectors to reduce feature vectors and extract important features in facial recognition through density matching score and boundary fixation, so as to manage the key processing steps of face activity (recognition accuracy, recognition speed and security) [11]. In [12], vehicle damage is classified by a deep learning method, and the recognition accuracy on a small data set reached 89.5% through the introduction of transfer learning and an ensemble learning method. This provides a new way to process vehicle insurance automatically. Zhang et al. combine the four features of color, time motion, gradient norm and residual motion to identify the position of each frame in video. The method uses a weighted linear combination to evaluate the different combinations of these features and establishes a precise hand detector [13]. With the continuous improvement of computer hardware and the deepening of research on complex image classification, the application prospects of computer vision technology will continue to broaden.
Surface defect detection is an important issue in modern industry. Traditionally, surface defects are detected in the following steps. First, the target image is pre-processed by image processing algorithms. Image pre-processing technology can process pixels accurately. By setting and adjusting various parameters according to actual requirements, the image quality can be improved by de-noising, changing brightness and improving contrast, laying a foundation for subsequent processing. Second, histogram analysis, wavelet transform or Fourier transform is carried out. These transformation methods yield the representation of an image in a specific space, which is convenient for the manual design and extraction of features. Finally, the image is classified according to its features using a classifier. Common methods include thresholding, decision trees and support vector machines (SVM). Most of the existing surface defect detection algorithms are based on machine vision [14,15,16,17,18,19]. Considering the mirror feature of ceramic balls, [17] obtains the stripe distortion image of defective parts according to the principle of fringe reflection and locates the defect positions by reverse ray tracing. The research method is suitable for surface defect detection of ceramic balls and other phases, but fails to achieve high accuracy due to the selection and design of radiographic models in reverse ray tracing. Jian et al. realize the automatic detection of glass surface defects on cell phone screens through fuzzy C-means clustering. Specifically, the image was aligned by contour registration during pre-processing, and then the defective area was segmented by projection segmentation. Despite the high accuracy, the detection approach is too time-consuming (1.6601 s) [18]. Win et al. integrate a median-based Otsu image thresholding algorithm with contrast adjustment to achieve automatic detection of surface defects on titanium coating. The proposed method is simple and, to some extent, immune to variation in light and contrast. However, when the sample size is large, the optimal threshold calculation is too inefficient and the grey information is easily contaminated by noise points [19]. To sum up, the above surface detection methods can only extract a single feature and must derive a comprehensive description of surface defects from it. These approaches work well on small sample datasets, but not on the large samples and complex objects and backgrounds found in actual production. To solve this problem, one viable option is to improve the approach with deep learning.
In recent years, deep learning has been successfully applied to image classification, speech recognition and natural language processing [20,21,22]. Compared with traditional machine learning methods, it has the following characteristics: deep learning can simplify or even omit the pre-processing of data and directly use the original data for model training; and deep learning is composed of multi-layer neural networks, which overcome the drawbacks of manual feature extraction and optimization in traditional machine learning methods. So far, deep learning has been extensively adopted for surface defect detection. For example, in [23], a Deep Belief Network (DBN) was adopted to obtain the mapping relationship between training images of solar cells and non-defect templates, and the comparison between reconstructed images and defect images was used to complete the defect detection of the test images. Cha et al. employ a deep convolutional neural network (CNN) to identify concrete cracks in complex situations, e.g., strong spots, shadows and ultra-thin cracks, and prove that the deep CNN outperforms traditional tools such as the Canny edge detector and the Sobel edge detector [24]. Han et al. detect various types of defects on hub surfaces with residual network (ResNet)-101 as the base net and the faster region-based CNN (Faster R-CNN) as the detector, and achieve a high mean average precision of 86.3% [25]. The above studies verify the excellent performance of deep learning in detecting surface defects. Nevertheless, few recent studies on product surface defect detection use the main target detection networks, such as YOLO (You Only Look Once) and SSD (Single Shot MultiBox Detector) [26]. The performance of these networks in surface defect detection needs to be further verified and optimized.
This paper presents a surface defect detection method based on MobileNet-SSD. By optimizing the network structure and parameters, this method can meet the real-time and accuracy requirements of actual production. It was verified on a filling line, and the results show that our method can automatically locate and classify the defects on the surface of the products.
3. Experimental Results and Analysis
Our experiment targets an oil chili filling production line in China’s Guizhou Province. During detection, the image of the sealing surface was transmitted via the image acquisition unit to the host for image signal processing. Then, the corresponding features were extracted and the defects were detected and marked by the MobileNet-SSD network. Specifically, MobileNet-SSD served as the base net for training on the pre-processed database; then, the trained model was migrated to the detection network for bounding box regression and classification. A total of five width-to-height ratios were selected according to the defect size, namely 1, 2, 3, 0.5 and 0.33. The maximum default-box scale was set to 0.95 and the minimum to 0.2. The six layers of the pyramid contain 4, 6, 6, 6, 6 and 6 default boxes, respectively. During training, the IoU of positive samples fell in (0.5, 1), that of negative samples in (0.2, 0.5), and that of hard samples in (0, 0.2). In addition, an exponentially decaying learning rate initialized to 0.1 was used, and the weights and bias terms were initialized randomly.
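The default-box settings above can be illustrated with a short Python sketch. It assumes that the values 0.95 and 0.2 are the maximum and minimum default-box scales and that the scales are spaced linearly across the six layers, as in the original SSD formulation; the paper does not confirm either point. The function names and the treatment of interval boundaries are illustrative choices, not the authors' implementation.

```python
import math

ASPECT_RATIOS = [1.0, 2.0, 3.0, 0.5, 0.33]  # width-to-height ratios from the text
S_MIN, S_MAX = 0.2, 0.95                    # assumed to be the min/max box scales
NUM_LAYERS = 6                              # six pyramid layers

def layer_scales(s_min=S_MIN, s_max=S_MAX, m=NUM_LAYERS):
    """Linearly spaced scales, one per pyramid layer (standard SSD rule)."""
    return [s_min + (s_max - s_min) * k / (m - 1) for k in range(m)]

def box_shapes(scale, ratios=ASPECT_RATIOS):
    """(width, height) of each default box, relative to the image size."""
    return [(scale * math.sqrt(r), scale / math.sqrt(r)) for r in ratios]

def iou(a, b):
    """Intersection over union of two boxes given as (xmin, ymin, xmax, ymax)."""
    inter_w = max(0.0, min(a[2], b[2]) - max(a[0], b[0]))
    inter_h = max(0.0, min(a[3], b[3]) - max(a[1], b[1]))
    inter = inter_w * inter_h
    union = (a[2] - a[0]) * (a[3] - a[1]) + (b[2] - b[0]) * (b[3] - b[1]) - inter
    return inter / union if union > 0 else 0.0

def sample_kind(overlap):
    """Map an IoU value to the sample categories quoted in the text.

    Boundary values are assigned to the lower category here; that choice
    is an assumption, since the text gives open intervals only.
    """
    if overlap > 0.5:
        return "positive"
    if overlap > 0.2:
        return "negative"
    if overlap > 0.0:
        return "hard"
    return "ignored"
```

For example, `layer_scales()` gives one scale per layer from 0.2 up to 0.95, and `sample_kind(iou(box, ground_truth))` labels a candidate box against a ground-truth defect box.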
3.1. Image Processing Unit
As shown in Figure 6, the image acquisition unit consists of a transmission mechanism, a proximity switch sensor, an industrial CCD camera, a light-emitting diode (LED) arc light source and a lens. Before the acquisition, the LED arc light source was adjusted to calibrate the brightness, and the lens was mounted on the CCD camera. Then, the aperture and focal length were adjusted to ensure the imaging quality of the acquisition unit. Driven by the transmission mechanism, the vessels moved at a constant speed and spacing; the sensor detected each workpiece and produced pulse signals. These signals triggered the CCD camera to take photos. To ensure that the container is at the center of the field of view when the photo is taken, the sensor needs to be accurately adjusted.
The detection network was trained on the following hardware: an Intel Core i7 7700K processor (Vietnam, 2017) with a main frequency of 4.2 GHz, 32 GB of memory and a GeForce TITAN X graphics processing unit (GPU). The software consisted of the Ubuntu 14.04.2 operating system and the TensorFlow deep learning framework. Twenty percent of the samples in the pre-processed library were allocated to the test set and the other 80% to the training set.
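The 80/20 allocation described above can be sketched as follows. This is a minimal illustration, assuming a random shuffle with a fixed seed; the paper does not state how the samples were assigned, and `split_samples` is a hypothetical helper name.

```python
import random

def split_samples(samples, test_fraction=0.2, seed=0):
    """Shuffle the samples and return (training set, test set).

    The shuffle and the fixed seed are assumptions made for
    reproducibility; the source only gives the 80/20 proportions.
    """
    shuffled = list(samples)
    random.Random(seed).shuffle(shuffled)
    n_test = round(len(shuffled) * test_fraction)
    return shuffled[n_test:], shuffled[:n_test]
```

Calling `split_samples(image_paths)` on a list of image paths returns the two subsets; passing `test_fraction=0.3` reproduces a 70/30 split instead.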
3.2. Comparison of Three Deep Learning Networks
The loss function and the accuracy of the proposed MobileNet-SSD surface defect detection algorithm on the test set (Figure 7) were compared to those of VGGNet [28], an excellent network in the 2014 ImageNet competition, and MobileNet. The three algorithms were trained via transfer learning and data augmentation. The training parameters and results are shown in Table 3.
In the training process, the detection networks were tested once every two hundred iterations on the training set. The loss function and accuracy in Table 3 were mean values obtained from 40 to 50 iterations on the test set. It is clear that the MobileNet-SSD detection algorithm achieved better accuracy than the other two networks with fewer network parameters.
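The smaller parameter count of MobileNet-based networks comes largely from depthwise separable convolutions, which replace one standard convolution with a depthwise convolution followed by a 1 × 1 pointwise convolution. The sketch below compares the weight counts of the two layer types; the layer size used in the usage note is hypothetical and is not taken from the networks in this paper, and bias terms are ignored.

```python
def standard_conv_params(kernel, c_in, c_out):
    """Weights in a standard kernel x kernel convolution (no bias)."""
    return kernel * kernel * c_in * c_out

def separable_conv_params(kernel, c_in, c_out):
    """Weights in a depthwise separable convolution (no bias).

    Depthwise part: one kernel x kernel filter per input channel.
    Pointwise part: a 1 x 1 convolution mixing c_in channels into c_out.
    """
    return kernel * kernel * c_in + c_in * c_out

def reduction_ratio(kernel, c_in, c_out):
    """Separable / standard weight ratio, equal to 1/c_out + 1/kernel**2."""
    return separable_conv_params(kernel, c_in, c_out) / standard_conv_params(kernel, c_in, c_out)
```

For a hypothetical 3 × 3 layer with 256 input and 256 output channels, the standard convolution has 589,824 weights and the separable version has 67,840, roughly 11.5% of the original.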
3.3. Results of Defect Detection Network
The trained network parameters were adopted for the MobileNet-SSD defect detection network. The test set contained four different types of defect samples, each of which had 30 images obtained through resampling. Each sample involved one or more defects. The detection results of the trained MobileNet-SSD defect detection network on the four kinds of defect samples are shown in Table 4.
It can be seen from Table 4 that the surface defect detection network completes the defect marking of the 120 defect samples with a 95.00% accuracy rate. There were missed and false detections for dent and burr defects and missed detections for abrasion defects. This is because the notches are more obvious than the other defects; the result is also related to the image quality and to subjective human judgment.
When the filling line was in operation, the container passed by the image acquisition device within a certain distance, triggering the CCD camera to take photos. The defect detection network then performed a forward pass. If there were defects in the image, the alarm would buzz and the defect type and location were identified by the host (Figure 8). A single forward pass of the network took 0.12 s per image.
3.4. Degree of Defect Detection
Defects of the same type may differ in severity. Here, the pre-processed datasets were divided into three categories based on defect severity: easy, medium and hard. The recognition result can serve as a yardstick of the network classification quality. Seventy percent of all samples were allocated to the training set and the remaining 30% to the test set. The detection results of the breaches are shown in Figure 9.
Figure 10 shows the precision–recall (PR) curves on the experimental dataset. The advanced multi-task CNN (MTCNN) [29] and Faceness-Net [30] were compared with the proposed MobileNet-SSD algorithm (“MS” in the figure) on the experimental dataset. The experimental results show that the recall rates of the proposed algorithm were 93.11%, 92.18% and 82.97% on the easy, medium and hard subsets, respectively, and that it outperformed the comparison algorithms.
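A precision–recall curve of the kind described above can be computed from ranked detections. The following is a minimal sketch, assuming each detection has already been matched against the ground truth (label 1 for a true defect, 0 for a false alarm) and that every true defect appears among the detections; the matching rule and the function name are assumptions, not details given in the paper.

```python
def precision_recall_curve(scores, labels):
    """Return a list of (precision, recall) points.

    scores: detection confidences; labels: 1 if the detection is a true
    defect, 0 otherwise. Detections are swept from the most to the least
    confident, and one point is emitted per detection.
    """
    total_pos = sum(labels)
    if total_pos == 0:
        raise ValueError("at least one positive label is required")
    order = sorted(range(len(scores)), key=lambda i: scores[i], reverse=True)
    tp = fp = 0
    curve = []
    for i in order:
        if labels[i]:
            tp += 1
        else:
            fp += 1
        curve.append((tp / (tp + fp), tp / total_pos))
    return curve
```

The recall reported for a subset corresponds to the recall value at the chosen confidence threshold on such a curve.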
3.5. Contrast Experiment
Three comparative experiments were designed to further validate the proposed algorithm. In the first experiment, the proposed algorithm was compared to five lightweight feature extraction networks: SqueezeNet [31], MobileNet, performance vs accuracy net (PVANet) [32], MTCNN and Faceness-Net. The feature extraction accuracy of each algorithm on the ImageNet classification task is displayed in Table 5. In the second experiment, the above five networks were compared with the proposed algorithm in defect detection on the filling line in terms of correct detection rate, training time and detection time per image (Table 6).
As shown in the two tables, the MobileNet-SSD surface defect model is fast and stable, thanks to the improved SSD meta-structure of the feature pyramid. In general, the proposed algorithm outperformed the contrastive algorithms in detection rate, training time and detection time. The final detection time of our algorithm was merely 120 milliseconds per piece, which meets the real-time requirements of the industrial filling line.
In Contrast Experiment 3, four traditional defect recognition methods, namely k-nearest neighbor (KNN) [33], HMM [34,35,36], combined SVM and HMM [37] and back propagation neural network (BPNN) [38], were implemented and compared with the method in this paper. The KNN method uses Euclidean distance as the distance function; the HMM model adopts a sampling window of size 5 × 4 and uses the discrete cosine transform (DCT) coefficients as the observation vector of the HMM. The SVM and HMM method is the same as in [37]. The hidden layer number of the BP neural network is set to 30. The above models were also applied to detect the defects on the sealing surface of a container in the filling line. The statistical results are shown in Table 7.
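The KNN baseline with Euclidean distance can be sketched in a few lines of Python. This is an illustrative version only: the value of k, the feature vectors and the class labels are assumptions, since the paper specifies only the distance function.

```python
import math
from collections import Counter

def euclidean(a, b):
    """Euclidean distance between two equal-length feature vectors."""
    return math.sqrt(sum((x - y) ** 2 for x, y in zip(a, b)))

def knn_predict(train_x, train_y, query, k=3):
    """Classify `query` by majority vote among its k nearest neighbours.

    k=3 is an assumed default; the source does not state the value used.
    """
    neighbours = sorted(zip(train_x, train_y),
                        key=lambda pair: euclidean(pair[0], query))[:k]
    votes = Counter(label for _, label in neighbours)
    return votes.most_common(1)[0][0]
```

Here `train_x` would hold one feature vector per training image and `train_y` the corresponding defect class; because the classifier works directly on those vectors, its accuracy depends heavily on how well they separate the defect from the background.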
As can be seen from Table 7, compared with the traditional defect detection methods, the MobileNet-SSD method has a higher positive detection rate. Under the same hardware conditions, MobileNet-SSD still maintains the best speed, although the differences between the five methods are small. In addition, the results of HMM and KNN are not ideal. The reason may be that the defects occupy only a small proportion of the image and the sealing surface of a container contains a lot of background information, while KNN and HMM do not extract specific features of the image before classifying. By contrast, both the BP neural network and MobileNet-SSD are based on neural networks, which can learn features automatically, so the accuracy rates of these two methods are relatively high. MobileNet-SSD, due to its unique deep convolutional structure, can learn the deep and detachable features of defects with a bigger receptive field, so it achieves a higher positive detection rate.
4. Conclusions
This paper proposes a surface defect detection method based on the MobileNet-SSD network and applies it to identify the types and locations of surface defects. In the pre-processing phase, a regional planning method was presented to cut out the main body of the defect, reduce redundant parameters and improve detection speed and accuracy. Meanwhile, the robustness of the algorithm was improved by data augmentation. The philosophy of MobileNet, a lightweight network, was introduced to enhance the detection accuracy, reduce the computing load and shorten the training time of the algorithm. MobileNet and SSD were adjusted to detect the surface defects, such that the proposed method could differentiate small defects from the background. The feasibility of the proposed method was verified by defect detection on the sealing surface of containers in an oil chili filling production line in Guizhou, China. Specifically, an image acquisition device was established for the sealing surface and the deep learning framework was adopted to mark the defect positions. The results show that the proposed method can identify most defects in the production environment at high speed and with high accuracy. However, the system also has its limitations. Deep learning models depend to some extent on the hardware platform because of their computationally intensive processes, and they are not suitable for embedded systems with ordinary performance. Future research will further improve the proposed method through integration with embedded chips and the Internet of Things, balance the classification accuracy and the number of parameters of the detection method, and expand the application scope of our method to complex defects in industrial processes.