Defect Detection in Printed Circuit Boards Using You-Only-Look-Once Convolutional Neural Networks

Abstract: In this study, a deep learning algorithm based on the you-only-look-once (YOLO) approach is proposed for the quality inspection of printed circuit boards (PCBs). The high accuracy and efficiency of deep learning algorithms have resulted in their increased adoption in many fields. Similarly, accurate detection of defects in PCBs using deep learning algorithms, such as convolutional neural networks (CNNs), has garnered considerable attention. In the proposed method, highly skilled quality inspection engineers first use an interface to record and label defective PCBs. The data are then used to train a YOLO/CNN model to detect defects in PCBs. In this study, 11,000 images and a network of 24 convolutional layers and 2 fully connected layers were used. The proposed model achieved a defect detection accuracy of 98.79% in PCBs with a batch size of 32.


Introduction
Printed circuit boards (PCBs) are a crucial component of most electronic devices. For decades, PCBs have been adapted for industrial and domestic use. PCBs are the primary component of any electronic design and have been used in fields such as logistics, defence, and aeronautics as well as for applications in the automobile and medical industries, among others. PCBs are solid thin plates prepared from laminated materials, fibreglass, or composite epoxy and form a physical base that supports chips and electronic components [1][2][3][4]. These boards are designed with conductive pathways, which form circuits and power the electronic devices attached to them. Therefore, PCB inspection processes have continually improved to meet the ever-increasing demands of modern manufacturing. Production and manufacturing industries have attempted to achieve 100% quality assurance for all PCBs, and automated visual inspection of PCBs has advanced considerably [5,6]. Studies [7] have revealed that deep learning outperforms traditional machine-based classification and feature extraction algorithms. Defect detection in PCBs during quality inspection is critical. In the conventional method, defects are initially detected by an automated optical inspection (AOI) machine, and a skilled quality inspection engineer then verifies each PCB. Many boards classified as defective by the AOI machine may not actually be defective: the machine can erroneously classify a PCB as defective because of a scratch, a small hole, or the presence of small particles such as dust, paper fragments, or air bubbles. Even slight variation from the reference sample may lead the AOI machine to classify a PCB as defective. Most object detection or classification frameworks use VGG-16 as the basic feature extractor [19] because of its high accuracy and robustness.
However, VGG-16 is highly complex and requires 30.69 billion floating-point operations (FLOPs) for a single forward pass over a 224 × 224 image. A custom network based on the GoogLeNet architecture is typically preferable for YOLO frameworks [20]. This model is faster than VGG-16, requiring only 8.52 billion FLOPs for a forward pass, although its accuracy is slightly lower than that of VGG-16. On the basis of these comparisons, we used Tiny-YOLO-v2 (a modified and compact version of YOLO).
Using Microsoft Visual Studio, we developed an interface to collect images from the AOI machine. The interface enables the quality inspection engineer to label the defective region on individual PCBs. Deep learning is then applied to the gathered image data. This study implemented the Tiny-YOLO-v2 model with a CNN to improve accuracy and reduce the error rate.

PCB Data Set
The PCB data set was obtained using the AOI machine: an image of the reference PCB sample was generated in RGB format, converted into JPEG format, and saved. In addition to images of the reference sample, images of defective samples were also collected using the AOI machine. The images were extracted and cropped, and almost 11,000 images were collected for training. An interface was developed (as shown in Figure 1) to enable quality inspection engineers to label the defective regions of PCBs, which were then compared with the same location on the reference sample.
Figure 2b is an enlarged image of the defective region, and Figure 2c displays the defective region and the same region on the reference sample. The images were 420 × 420 pixels and were placed adjacent to each other to enable a quality inspection engineer to compare the defective and reference samples. The inspection engineer then labelled the defective area in the image. Images of 11 defect types were collected. However, because of the small amount of data, all the defect types were labelled as a single type of defect.

Architecture of Tiny-YOLO-v2
The unique object detection algorithm [21] developed using Tiny-YOLO-v2 [22,23] is explained in this section. For a single image, Tiny-YOLO-v2 predicts several bounding boxes with class probabilities by using a single CNN. Figure 3 displays the structure of Tiny-YOLO-v2, which includes convolutional layers, activation function blocks such as the rectified linear unit (ReLU), batch normalisation, a region layer, and six max pooling layers. An image of 416 × 416 pixels is used as the input for the classifier. The output is a (125 × 13 × 13) tensor with 13 × 13 grid cells. Each grid cell corresponds to 125 channels, consisting of five bounding boxes predicted by the grid cell and the 25 data elements that describe each bounding box (5 × 25 = 125).
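The tensor arithmetic above can be checked with a short sketch. The 13 × 13 grid, five boxes per cell, and 125 output channels come from the text; the split of the 25 per-box values into 4 coordinates, 1 objectness score, and 20 Pascal VOC class scores follows the standard YOLO-v2 convention and is an assumption here:

```python
import numpy as np

GRID = 13                     # 13 x 13 grid cells
BOXES = 5                     # bounding boxes predicted per cell
CLASSES = 20                  # Pascal VOC classes (assumed)
DATA_PER_BOX = 5 + CLASSES    # x, y, w, h, objectness + class scores = 25

channels = BOXES * DATA_PER_BOX   # 5 * 25 = 125

# A dummy network output shaped like the Tiny-YOLO-v2 head.
output = np.zeros((channels, GRID, GRID))

# Reshape so each grid cell exposes its 5 boxes of 25 values each.
per_box = output.reshape(BOXES, DATA_PER_BOX, GRID, GRID)

print(output.shape)    # (125, 13, 13)
print(per_box.shape)   # (5, 25, 13, 13)
```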

Convolutional Layer
The convolutional layers in Tiny-YOLO-v2 occupy 90% of the feed-forward computation time [24]. Therefore, the performance of the classifier is improved through optimisation of the convolutional layer. Hundreds of millions of addition and multiplication operations are performed between the local regions and filters for a single image. The operation is expressed as follows:

X(i) = Σ W(k) · X(j) + b, (1)

where the sum runs over the local input region, X(i) is the output pixel feature, X(j) is the input pixel feature, W(k) denotes the convolution weights, and b is the convolution bias. The number of operations involved in a convolution layer is calculated according to Equation (2); the operations for batch normalisation and the leaky activation function in each layer are ignored in this equation:

Operations = N_in × K × K × N_out × H_out × W_out, (2)

where N_in is the number of channels of the input feature, K is the filter size, N_out is the number of filters, H_out is the output feature height, and W_out is the output feature width. The required memory is a challenge because of the paucity of space. The weights of the convolutional layers are the primary parameters of Tiny-YOLO-v2. Equation (3) expresses the number of weights in a convolutional layer:

Weights = N_in × K × K × N_out, (3)

where N_in is the number of channels of the input feature, K is the filter size, and N_out is the number of filters. For a single input image, Tiny-YOLO-v2 requires approximately 7 billion operations and 15 million weights on Pascal VOC, and 5.7 billion operations with 12 million weights on the COCO data set.
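Equations (2) and (3) can be applied directly. The layer configuration below (a 3 × 3 filter mapping 3 input channels to 16 output channels at 416 × 416 output resolution) is a hypothetical first layer used only for illustration:

```python
def conv_ops(n_in, k, n_out, h_out, w_out):
    """Multiply-add operations of one convolutional layer, per Equation (2)."""
    return n_in * k * k * n_out * h_out * w_out

def conv_weights(n_in, k, n_out):
    """Number of weights of one convolutional layer, per Equation (3)."""
    return n_in * k * k * n_out

# Hypothetical first layer: 3 -> 16 channels, 3 x 3 filters, 416 x 416 output.
ops = conv_ops(n_in=3, k=3, n_out=16, h_out=416, w_out=416)
w = conv_weights(n_in=3, k=3, n_out=16)
print(ops, w)  # 74760192 432
```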

Activation Function
In a CNN architecture, the activation function introduces non-linearity into the output of each layer. Sigmoidal activation functions are among the most widely used but saturate at their maximum and minimum values; thus, they pass saturated neurons on to higher layers of the neural network. The leaky ReLU function instead produces a small, non-zero output for negative inputs, so weights that would otherwise never be reactivated on any data point can still be updated, as shown in Figure 4.
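A minimal NumPy sketch of the leaky ReLU; the slope of 0.1 for negative inputs is the value commonly used in YOLO implementations and is an assumption here:

```python
import numpy as np

def leaky_relu(x, alpha=0.1):
    """Leaky ReLU: passes positive inputs unchanged and scales negative
    inputs by a small slope, so negative-side neurons keep a non-zero
    signal instead of dying as with the plain ReLU."""
    return np.where(x > 0, x, alpha * x)

x = np.array([-2.0, -0.5, 0.0, 1.5])
print(leaky_relu(x))   # [-0.2  -0.05  0.    1.5 ]
```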

Pooling Layer
The pooling layer is used to reduce the dimensions of images. The main objective of the pooling layer is to eliminate unnecessary information and preserve only vital parameters. The most frequently used variants are max pooling, which takes the maximum value from each input region, and average pooling, which takes the average value. For a pooling region R(i) over input features x(j), these are expressed as follows:

y(i) = max over j in R(i) of x(j), (4)

y(i) = (1/|R(i)|) × Σ over j in R(i) of x(j). (5)
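A minimal NumPy sketch of the two pooling variants described above; the 2 × 2 window with stride 2 is an illustrative choice, not taken from the text:

```python
import numpy as np

def pool2x2(x, mode="max"):
    """2 x 2, stride-2 pooling over a single-channel feature map."""
    h, w = x.shape
    x = x[:h - h % 2, :w - w % 2]            # drop odd remainder rows/cols
    blocks = x.reshape(x.shape[0] // 2, 2, x.shape[1] // 2, 2)
    if mode == "max":
        return blocks.max(axis=(1, 3))       # max pooling
    return blocks.mean(axis=(1, 3))          # average pooling

x = np.array([[1., 2., 5., 6.],
              [3., 4., 7., 8.],
              [0., 0., 1., 1.],
              [0., 4., 1., 1.]])
print(pool2x2(x, "max"))   # [[4. 8.] [4. 1.]]
print(pool2x2(x, "mean"))  # [[2.5 6.5] [1.  1. ]]
```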

Batch Normalisation
In Tiny-YOLO-v2, a batch normalisation layer is applied after each convolutional layer so that its inputs have zero mean and unit variance. Batch normalisation is expressed in Equation (6): the output of the preceding convolutional layer is normalised by subtracting the batch mean and dividing by the square root of the batch variance, and the normalised output is then scaled and shifted according to learned parameters:

y = γ × (x − μ_B) / sqrt(σ_B² + ε) + β, (6)

where μ_B and σ_B² are the batch mean and variance, ε is a small constant added for numerical stability, and γ and β are the learned scale and bias. These parameters (mean, variance, scale, and bias) are determined during the CNN training stage; they permit individual layers to learn more independently and help prevent overfitting through their regularisation effect.
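The normalisation step can be sketched in NumPy as follows; the scalar defaults for gamma and beta are illustrative (in the trained network they are learned per channel):

```python
import numpy as np

def batch_norm(x, gamma=1.0, beta=0.0, eps=1e-5):
    """Batch normalisation: subtract the batch mean, divide by the batch
    standard deviation, then scale by gamma and shift by beta."""
    mu = x.mean(axis=0)
    var = x.var(axis=0)
    x_hat = (x - mu) / np.sqrt(var + eps)
    return gamma * x_hat + beta

# Two samples with two features each; after normalisation each feature
# column has (approximately) zero mean and unit variance.
x = np.array([[1.0, 2.0], [3.0, 6.0]])
y = batch_norm(x)
print(y.mean(axis=0))  # ~[0. 0.]
print(y.std(axis=0))   # ~[1. 1.]
```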

Results
Efficient computing is necessary to achieve optimal performance of the Tiny-YOLO-v2 model. Therefore, an NVIDIA TITAN V graphics processing unit (GPU) (NVIDIA, Santa Clara, CA, USA) was used for this experiment, which reduced the training time to approximately 10% of the original (i.e., from 34 h to 4 h). The Tiny-YOLO-v2 model was trained with the Keras framework running on the NVIDIA TITAN V GPU under the Linux operating system. Early stopping criteria were implemented to retain the model with the highest validation accuracy. A batch size of 32, the largest of the tested batch sizes, was used, and 8 GB of memory was used. Initially, during the training stage, a small data set was selected for testing the basic performance of these networks. The network settings and parameters were adjusted and tuned gradually by trial and error, and the batch size was varied. Initially, 5000 images of PCBs labelled as defective were used; subsequently, 11,000 images of defective PCBs were used. This procedure was implemented to improve the accuracy of the Tiny-YOLO-v2 model and regulate its parameters to attain the most advantageous training configuration. The epoch count was based on the training data set. After the parameters were selected, training was started from this initial configuration. Moreover, a model checkpoint callback was used to monitor the training process; its primary purpose is to save the Tiny-YOLO-v2 model with all its weights after each epoch so that the model architecture and weights corresponding to the best performance are retained. Fivefold cross validation [25] was used to evaluate the trained models. The data were randomly divided into five equal segments, four of which were used for training, and the remaining segment was used for testing. After every training phase, the model was evaluated on the remaining segment. This procedure was performed five times with different testing and training data sets.
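The fivefold split described above can be sketched with a simple index shuffle; only the 11,000-image count comes from the text, and the seed is arbitrary:

```python
import numpy as np

def five_fold_splits(n_samples, seed=0):
    """Randomly divide sample indices into five equal segments; each
    round uses four segments for training and one for testing."""
    rng = np.random.default_rng(seed)
    idx = rng.permutation(n_samples)
    folds = np.array_split(idx, 5)
    for i in range(5):
        test = folds[i]
        train = np.concatenate([folds[j] for j in range(5) if j != i])
        yield train, test

n = 11000  # number of labelled PCB images reported in the text
for k, (train, test) in enumerate(five_fold_splits(n)):
    print(k, len(train), len(test))   # 8800 training / 2200 test per fold
```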
After completing the training process, each model was tested on a different data segment (Figure 5). The results gradually improved as the batch size increased, indicating that the Tiny-YOLO-v2 model is more efficient than the other CNN models considered. After every epoch, the accuracy of the training process increased, which eventually improved the performance of the model. The final model was saved after the accuracy stabilised. The results of fivefold cross validation with batch sizes of 8, 16, and 32 are displayed in Tables 1-3, respectively.
Figure 6 displays a sample detected as a false negative: the PCB was defective but was incorrectly classified. Data for 11 types of defects were collected and labelled, but the number of images for each defect type was not equal; the displayed sample images therefore show the defect types that were observed least often, compared against the remaining types. Figure 7 displays images of a false positive case, in which the model misclassified the sample and exhibited low confidence. To avoid such misclassification, the size of the bounding boxes in the training data should be examined. Figure 8 displays sample images of true positive detections, in which defects were detected by the model with confidence. Figure 9 displays sample images of true negative detections: these samples did not have any defect and were classified as OK.
The model achieved accurate defect detection. The average accuracy in detecting defective PCBs for a batch size of 32 was 98.79%, and the evaluation precision was consistently 0.99 (Table 3). In addition, other parameters, such as the misclassification rate, true positive rate, false positive rate, true negative rate, and prevalence, for a batch size of 32 compared favourably with those for batch sizes of 8 and 16. In most machine learning algorithms, a large and balanced data set is crucial for achieving optimal performance.
A vanilla version of the CNN [26] was used for comparison with the proposed model. The vanilla CNN was trained using 15,823 images, and fivefold cross validation was implemented to evaluate the trained models. Figure 10 displays the test results in the form of a confusion matrix (as in Figure 5); green cells represent true positives and true negatives, and red cells represent false positives and false negatives. The cross validation results of the vanilla CNN are presented in Table 4. The mean and standard deviation of the accuracy were approximately 81.53 ± 2.326%, and the precision was less than 0.8, both worse than those of Tiny-YOLO-v2. The vanilla CNN is an image classifier that cannot identify the location of defects. Furthermore, it treats the background as a class, which increased misclassification, unlike Tiny-YOLO-v2, which detects multiple defects in a single image and locates them with bounding boxes.
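The rates reported above all derive from confusion-matrix counts. A minimal sketch of the standard definitions (the counts below are placeholders for illustration, not the paper's data):

```python
def confusion_metrics(tp, fp, tn, fn):
    """Standard rates derived from confusion-matrix counts."""
    total = tp + fp + tn + fn
    return {
        "accuracy": (tp + tn) / total,
        "misclassification_rate": (fp + fn) / total,
        "precision": tp / (tp + fp),
        "true_positive_rate": tp / (tp + fn),
        "false_positive_rate": fp / (fp + tn),
        "true_negative_rate": tn / (tn + fp),
        "prevalence": (tp + fn) / total,
    }

# Placeholder counts for illustration only.
m = confusion_metrics(tp=95, fp=1, tn=99, fn=5)
print(m["accuracy"])   # 0.97
```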

Discussion
A novel user interface was developed to collect data. Skilled engineers labelled the defects using the interface. The simple method achieved an excellent PCB defect detection accuracy of 98.79%, which is considerably better than that of other algorithms involving complex feature extraction [27][28][29]. The effectiveness of the proposed method was investigated using various CNN layers. To avoid the delay resulting from the nonspecificity of a single CNN model and insufficient storage capacity, a GPU environment was established. The model was trained using different batch sizes to improve accuracy.
The YOLO strategy is a powerful and fast approach that achieves a higher frame rate than computationally expensive two-stage detectors (e.g., Faster R-CNN) and some single-stage detectors (e.g., RetinaNet and SSD). Tiny-YOLO-v2 was used in this study to increase execution speed because it is approximately 442% faster than the standard YOLO model, achieving 244 FPS on a single GPU. A small model size (<50 MB) and fast inference render Tiny-YOLO-v2 naturally suitable for embedded computer vision.
Compared with other simple classifiers, YOLO is widely used in practice. It is a simple, unified object detection model that can be trained directly on full images. Unlike classifier-based approaches, YOLO is trained using a loss function that directly reflects detection performance, and the entire model is trained jointly. Fast YOLO is the fastest general-purpose object detector, and YOLO-v2 provides the best tradeoff between real-time speed and accuracy for object detection compared with other detection systems across various detection data sets.
Multilayer neural networks or tree-based algorithms are deemed insufficient for modern advanced computer vision tasks. The disadvantage of fully connected layers, in which each perceptron is connected to every other perceptron, is that the number of parameters can increase considerably, which results in redundancy in such high dimensions, rendering the system inefficient.
Another disadvantage is that spatial information is disregarded because flattened vectors are used as inputs. However, the key difference distinguishing YOLO from other methods is that the complete image is viewed at one time rather than only a generated region, and this contextual information helps to avoid false positives.
In this study, a modified YOLO model was developed and combined with an advanced CNN architecture. However, some challenges remain to be overcome. Although the proposed system incorporates a smart structure and an advanced CNN method, the model accuracy is not perfect, particularly when operating on unbalanced data sets. The data sets used here were collected by experienced PCB quality inspection teams. Moreover, traditional deep learning methods [30][31][32] are based on classifying or detecting particular objects in an image. Therefore, the model structure was tested with three batch sizes. As elaborated elsewhere [33], such structures exhibit high precision for a classical CNN model but unpredictable performance on the PCB data set. According to research on deep learning, the network layer parameters, linear unit activation parameters, regularisation strategies, optimisation algorithms, and the approach to batch normalisation during CNN training should be the focus for improving the PCB defect detection performance of CNNs. As depicted in Figure 5, the proposed model can accurately detect defects on PCBs and can therefore be used for PCB quality inspection on a commercial scale.
The three models achieved excellent results with an accuracy reaching 99.21% (Table 3). This proves that the modified YOLO model with deep CNNs is suitable for detecting PCB defects and can achieve accurate results. CNNs can automatically perform the learning process of the target after appropriate tuning of the model parameters. During training, the CNN weights are automatically fine-tuned to extract features from the images. However, further research, such as experimental evaluation and performance analysis, should be conducted to enhance CNN performance, describe PCB defects in more detail, and classify defects into predefined categories.

Conclusions
This study proves that the proposed CNN object detection algorithm combined with the Tiny-YOLO-v2 algorithm can accurately detect defects in PCBs with an accuracy of 98.82%.
In the future, the system should be improved for detecting other types of defects. Additionally, more types of defects and more data should be included to achieve group balancing. Other CNN algorithms, such as RetinaNet, ResNet, and GoogLeNet [17,20,34], should be implemented using GPU hardware for increased learning speed. Finally, the transfer learning approach [35] should be considered for a pretrained YOLO model.

Conflicts of Interest:
The authors declare no conflict of interest.