Electronics
  • Article
  • Open Access

20 November 2024

Accelerating Die Bond Quality Detection Using Lightweight Architecture DSGβSI-Yolov7-Tiny

1 Department of Computer Science and Information Engineering, National University of Kaohsiung, Kaohsiung 81148, Taiwan
2 Department of Fragrance and Cosmetic Science, Kaohsiung Medical University, Kaohsiung 80708, Taiwan
* Author to whom correspondence should be addressed.
This article belongs to the Special Issue Novel Methods for Object Detection and Segmentation

Abstract

The die bonding process is one of the most critical steps in front-end semiconductor packaging, as it significantly affects the yield of the entire IC packaging process. This research aims to develop an efficient, intelligent vision detection model that identifies whether each chip adheres correctly to the IC substrate. By using the detection model to classify the defect types that appear in die bond images, engineers can analyze the leading causes and adjust key machine parameters in real time, improving the yield of the die bond process and significantly reducing manufacturing cost losses. This study proposes a lightweight Yolov7-tiny model that uses Depthwise-Separable and Ghost Convolutions together with a Sigmoid Linear Unit with a β parameter (DSGβSI-Yolov7-tiny), which can be applied to real-time, efficient detection and prediction of die bond quality. The model achieves a maximum FPS of 192.3, a precision of 99.1%, and an F1-score of 0.97. Therefore, the proposed DSGβSI-Yolov7-tiny model outperforms the other methods evaluated.

1. Introduction

Figure 1 illustrates the complete IC packaging and testing process. AFE stands for Assembly Front End, ABE represents Assembly Back End, FT refers to Final Test, WT is Wafer Test, and PA indicates Pre-assembly. Die bonding is one of the steps in the IC packaging and testing process. In this process, the Fab/BL (foundry) initially provides the wafer, and the IC packaging and testing factory then tests the wafer to identify any defects. Next, in the pre-assembly phase, the wafer is cut to prepare for the subsequent assembly steps. Moving into AFE, the die bond step involves securely attaching the die (bare die) to the IC substrate. The main focus of this paper is to acquire images of die bonds from the machine and use an intelligent vision detection model to detect whether each chip correctly adheres to the IC substrate. In the wire bond phase, the die is connected to the IC substrate via bonding wires, allowing the die's electrical signals to be transmitted to external circuits. Afterward, the process moves into ABE, where molding encapsulates the die with epoxy resin to protect it. Marking imprints identification marks on the package exterior, plating applies surface treatments such as gold or tin to enhance electrical connectivity, and trim/form cuts and shapes the packaged components into their final form. Finally, the product undergoes testing and packaging. The entire process, including wafer fabrication, testing, packaging, and shipping, constitutes the typical flow of semiconductor production, as shown in Figure 1.
Figure 1. IC packaging and testing process.
Die bonding is the process of cutting a wafer into individual dies and attaching each die to the IC substrate using conductive media such as glue or gold balls, with glue being the most commonly used material. This process includes several steps: preparing the substrate, applying adhesive to the package base, positioning the die, applying heat and pressure, cooling and curing, and testing and packaging. Die bonding technology is widely used in electronic packaging, module manufacturing, and optical component manufacturing. It is one of the most critical processes in front-end semiconductor packaging, as it directly affects the quality of the entire IC packaging process. Real-time detection and prediction of die bond yield are essential for adjusting machine production settings, significantly improving the die bond yield [1] and reducing manufacturing costs. This study used images from a renowned semiconductor company in southern Taiwan for data preprocessing to create the training and testing datasets. The collected images were inspected manually, with each of the four sides and four corners of every image evaluated for adhesion quality. The fab then categorized these images into bond_good (including side_good and corner_good), representing properly adhered dies, and bond_bad (including side_bad and corner_bad), representing incorrectly adhered dies. To further indicate which sides or corners were not fully adhered, a two-digit hexadecimal coding scheme was used, with one 4-bit group for the sides and one for the corners, to specify the die bond type.
Therefore, this paper aims to accelerate the computation of visual die bond detection and prediction by building lightweight models and to explore new activation functions that improve detection and prediction accuracy, thereby ensuring the effectiveness of the die bond process. This study refers to the algorithm improvements of Yolov7-tiny [2]. We also explore the improved convolution computation of Yolov4-tiny [3] and the lightweight Yolov5 architecture for fast wafer contour detection [4]. This study further examines the application of Yolov7-VD to intelligent visual vehicle detection [5] and the practical process of die bond automatic optical inspection (AOI) and identification methods [6]. Therefore, to detect and predict whether die adherence is complete, this study compares the Yolov4-tiny, Yolov5n [7], Yolov7, Yolov7-tiny, DSG-Yolov7, DSG-Yolov7-tiny, DSGSI-Yolov7-tiny, and DSGβSI-Yolov7-tiny models in terms of execution performance. Finally, the model with the best execution performance, DSGβSI-Yolov7-tiny, was selected, and the best-trained model was exported to TensorFlow Lite and integrated into the control system of the die bond machine. The most significant contribution of this study is the use of the optimal detection system to determine which type of bond_bad has occurred. After summarizing the causes, the engineers can promptly adjust the critical parameters of the machine online to reduce the occurrence of electrical faults or increased resistance. This approach can improve the die bond process yield and significantly reduce manufacturing cost losses.

3. Methods

This chapter's methodology involves preparing the data, conducting preprocessing, selecting the models, setting parameters, and then training. Next, we examine the results of the trained models, such as the confusion matrix, PR curve, and loss graph, to determine whether they met the training objectives. After that, we test die bond image recognition by classifying die bond adhesion quality into the bond_good and bond_bad categories. Finally, we evaluate and compare the models based on their performance metrics.

3.1. Data Collection and Preprocessing

The image data from a well-known semiconductor manufacturer in southern Taiwan were collected from a top-down view of the machines, as shown in Figure 10. After annotation, we generate corresponding XML label files and categorize them into side_good, side_bad, corner_good, and corner_bad. The procedure then converts the XML (VOC-format) label files into the text label format used as input by the YOLO-related models. For cross-dataset evaluation, a total of 3145 images were collected, divided into training data (2204 images) and validation data (467 images) from machine #1 and testing data (474 images) from machine #2, with a distribution ratio of approximately 70%, 15%, and 15%, respectively.
Figure 10. Sample images of die bond categories. (a,c,e) bond_good; (b,d,f) bond_bad.
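As an illustration of this preprocessing step, the sketch below converts one LabelImg (Pascal VOC) XML annotation into a YOLO-style text label and reproduces the train/validation split described above. The class order, directory paths, and helper names are assumptions for illustration only, not the authors' actual scripts.

```python
import glob
import random
import xml.etree.ElementTree as ET

# Assumed class order; the paper's four output classes.
CLASSES = ["side_good", "side_bad", "corner_good", "corner_bad"]

def voc_xml_to_yolo_txt(xml_path, txt_path):
    """Convert one LabelImg (Pascal VOC) XML file into a YOLO text label file."""
    root = ET.parse(xml_path).getroot()
    w = float(root.find("size/width").text)
    h = float(root.find("size/height").text)
    lines = []
    for obj in root.iter("object"):
        cls = CLASSES.index(obj.find("name").text)
        box = obj.find("bndbox")
        xmin, ymin = float(box.find("xmin").text), float(box.find("ymin").text)
        xmax, ymax = float(box.find("xmax").text), float(box.find("ymax").text)
        # YOLO format: class x_center y_center width height, normalized to [0, 1].
        cx, cy = (xmin + xmax) / 2 / w, (ymin + ymax) / 2 / h
        bw, bh = (xmax - xmin) / w, (ymax - ymin) / h
        lines.append(f"{cls} {cx:.6f} {cy:.6f} {bw:.6f} {bh:.6f}")
    with open(txt_path, "w") as f:
        f.write("\n".join(lines))

# Split of the machine #1 images into training and validation sets; the counts
# follow the paper (2204 train, 467 val), and the 474 machine #2 images form
# the separate test set used for cross-dataset evaluation.
xml_files = sorted(glob.glob("machine1_labels/*.xml"))  # path is a placeholder
random.seed(0)
random.shuffle(xml_files)
train_files, val_files = xml_files[:2204], xml_files[2204:]
```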
This study first trained the Yolov4-tiny, Yolov5n, Yolov7, and Yolov7-tiny models. Before training, the hyperparameters must be set. This study set the epoch parameter to 120, the batch size to 16, and the input image size to 1024 × 1024. The epoch indicates the number of complete passes over the dataset during training. The batch size refers to the number of samples used to update the model weights in each training iteration. Specifying the input image size accelerates object detection and image recognition for the die bond. Similarly, this study applies the same training process to the DSG-Yolov7, DSG-Yolov7-tiny, and DSGSI-Yolov7-tiny lightweight models.
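For reference, a minimal sketch of how such a training run might be launched with the official YOLOv7 repository's train.py is shown below. The cfg/data/weights paths are placeholders, and the exact flags should be checked against the repository version used; this is not the authors' actual training pipeline.

```python
import subprocess

# Hyperparameters stated in the paper.
EPOCHS = 120
BATCH_SIZE = 16
IMG_SIZE = 1024

# Hypothetical invocation of the YOLOv7 training script; all file paths
# are placeholders, not the authors' actual configuration.
subprocess.run([
    "python", "train.py",
    "--cfg", "cfg/training/yolov7-tiny.yaml",
    "--data", "data/die_bond.yaml",
    "--weights", "yolov7-tiny.pt",
    "--epochs", str(EPOCHS),
    "--batch-size", str(BATCH_SIZE),
    "--img-size", str(IMG_SIZE), str(IMG_SIZE),
    "--device", "0",
], check=True)
```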

3.2. Recognition Data

During the testing phase, this study fed the test set of 474 images into the trained models for recognition, yielding identification results for each image in the test set. Each image's recognition results include a category and a confidence level. The categories include side_good, side_bad, corner_good, and corner_bad. The confidence level indicates the model's certainty in its prediction. Experimental results show that the confidence levels for each part of all images are at least 95%. If the die bond meets the conditions for complete adhesion on all four die sides, the recognition result indicates side_good; otherwise, it indicates side_bad. Similarly, if the die bond meets the conditions for complete adhesion at all four die corners, it displays corner_good; otherwise, it shows corner_bad, as illustrated in Figure 11.
Figure 11. Die bond recognition. (a,c,e) bond_good; (b,d,f) bond_bad.

3.3. Judgment of Detection Results and Types of Die Bonding

Figure 12 illustrates how the die bond detection results are classified into four output classes: side_good, side_bad, corner_good, and corner_bad. The fab defines the judgment criteria. A die bond is considered bond_good if the bonding adheres completely on all four die sides, all four die corners, any three die sides, or any three die corners; otherwise, it is classified as bond_bad, as shown in Figure 12.
Figure 12. Judgment of categories with bond_good or bond_bad.
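As a concrete illustration, the following minimal sketch encodes the judgment rule as stated above, read literally as "at least three fully adhered sides or at least three fully adhered corners." The function name and boolean interface are assumptions for illustration only.

```python
def judge_die_bond(side_ok, corner_ok):
    """Classify one die as bond_good or bond_bad.

    side_ok, corner_ok: lists of four booleans, True where the model reports
    side_good / corner_good for that side or corner.

    Encodes the criterion as stated in the text: the die is bond_good when at
    least three of the four sides or at least three of the four corners are
    completely adhered.
    """
    if sum(side_ok) >= 3 or sum(corner_ok) >= 3:
        return "bond_good"
    return "bond_bad"

# Example: three good sides and two good corners -> bond_good.
print(judge_die_bond([True, True, True, False], [True, False, True, False]))
```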
The die bond image is the input signal to the die bond detection model, while the output signal consists of binary outputs indicating the bonding results for the chip's sides and corners, either good or bad [19], as shown in Figure 13. In Figure 13, if the model's output indicates side_good or corner_good, a "1" is assigned to the corresponding bit; conversely, if the condition is side_bad or corner_bad, a "0" is assigned. The model output for the four sides of the chip is four bits long and designates the side bond code, and the output for the four corners is likewise four bits long and designates the corner bond code. In other words, Figure 13 shows that any combination of the side and corner bond codes represents a die bond type, expressed as two hexadecimal values ranging from 00x to FFx.
Figure 13. Any combination of the bond detection of the chip’s sides and corners.
Table 1 displays the various die bond types, where the index "s" represents the bond code for the chip's sides, the index "c" stands for the bond code for the chip's corners, the symbol "b" indicates the condition of bond_bad, and the symbol "g" denotes the condition of bond_good. Each index takes a hexadecimal value ranging from 0x to Fx for the code of the chip's sides or corners, so any combination of the side and corner codes can be encoded in hexadecimal from 00x to FFx, representing the die bond type. Statistics on the die bond types can identify the locations where bond defects predominantly occur on the chip's sides or corners. Consequently, the fab can promptly adjust critical parameters of the machine to reduce the occurrence of such defects, thereby increasing the yield of the die bond process and significantly reducing manufacturing cost losses.
Table 1. The type of die bond.
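To make the encoding concrete, the sketch below packs the four side results and the four corner results into two 4-bit codes and combines them into the two-digit hexadecimal die bond type. The bit ordering and the side-before-corner digit order are assumptions, since the paper does not fix which side or corner maps to which bit.

```python
def bond_code(flags):
    """Pack four good/bad flags (True = good -> bit set to 1) into one 4-bit value."""
    code = 0
    for bit, ok in enumerate(flags):
        if ok:
            code |= 1 << bit
    return code  # 0x0 .. 0xF

def die_bond_type(side_ok, corner_ok):
    """Combine the side and corner bond codes into the two-digit hex type (00x-FFx)."""
    s = bond_code(side_ok)
    c = bond_code(corner_ok)
    return f"{s:X}{c:X}x"  # digit order (sides first) is an assumption

print(die_bond_type([True] * 4, [True] * 4))                  # all good  -> FFx
print(die_bond_type([True, True, True, False], [False] * 4))  # 7 and 0   -> 70x
```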
Figure 14 shows the storage of the predicted classification results from the tests in a separate folder. Subsequently, this study compiles the actual and predicted classifications to create a confusion matrix for calculating precision. Additionally, we can quickly assess the integrity of each chip's bonding and continuously monitor the machine's operational status. If any issues arise, the engineer can adjust the machine's parameters to prevent an excessive number of poorly bonded chips.
Figure 14. Prediction classification with type of die bond.

3.4. Model Improvement

Following the DSG-Yolov7 model, this study also improves the Yolov7-tiny model through an ablation study that applies DSGConv, by trial and error, to different modules in the backbone, neck, and head to achieve a lightweight enhancement; the resulting model is referred to as DSG-Yolov7-tiny, as shown in Figure 15. Figure 15 illustrates the approach of using depthwise separable convolution combined with Ghost convolution to replace traditional convolution, achieving the goal of a lightweight model. Some designs replace specific traditional convolution layers with GhostNet modules, which can effectively reduce the model's parameter count and enhance overall execution performance [20]. Although GhostConv can significantly decrease the computational load, using traditional convolution at specific critical locations of the model is crucial for maintaining inference precision. Therefore, at the critical nodes of the backbone and prediction sections, traditional convolution is retained to preserve the stability of the network structure and the precision of inference, as shown in Figure 15.
Figure 15. DSG-Yolov7-tiny architecture. Note: k represents kernel size, and s stands for stride.
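For readers unfamiliar with the two building blocks combined in DSGConv, the following is a minimal PyTorch sketch of a depthwise separable convolution and a Ghost convolution in the spirit of GhostNet [17]. The channel ratio, kernel sizes, and LeakyReLU slope are assumptions and only approximate the modules shown in Figure 15, not the authors' exact layer definitions.

```python
import torch
import torch.nn as nn

class DepthwiseSeparableConv(nn.Module):
    """Depthwise convolution per channel followed by a 1x1 pointwise convolution."""
    def __init__(self, c_in, c_out, k=3, s=1):
        super().__init__()
        self.depthwise = nn.Conv2d(c_in, c_in, k, s, k // 2, groups=c_in, bias=False)
        self.pointwise = nn.Conv2d(c_in, c_out, 1, 1, 0, bias=False)
        self.bn = nn.BatchNorm2d(c_out)
        self.act = nn.LeakyReLU(0.1, inplace=True)  # slope value is an assumption

    def forward(self, x):
        return self.act(self.bn(self.pointwise(self.depthwise(x))))

class GhostConv(nn.Module):
    """Ghost convolution: a cheap depthwise operation generates half of the
    output feature maps from the primary convolution's output (ratio 2);
    assumes c_out is even."""
    def __init__(self, c_in, c_out, k=1, s=1):
        super().__init__()
        c_half = c_out // 2
        self.primary = nn.Conv2d(c_in, c_half, k, s, k // 2, bias=False)
        self.cheap = nn.Conv2d(c_half, c_half, 5, 1, 2, groups=c_half, bias=False)

    def forward(self, x):
        y = self.primary(x)
        return torch.cat([y, self.cheap(y)], dim=1)
```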
However, the DSGConv architecture for lightweight models decreases the complexity of inference calculations, which may lead to a slight reduction in precision. Equation (1) computes the Rectified Linear Unit (ReLU) activation function, where x represents the input and relu(x) stands for the output. In the DSG-Yolov7-tiny architecture, the activation function used is the Leaky Rectified Linear Unit (LeakyReLU), an improvement on the ReLU activation function. Equation (2) calculates LeakyReLU, where x represents the input, Lrelu(x) stands for the output, and α denotes the slope of the function's output when the input is negative; it is a small positive number, usually around 0.01. The ReLU function outputs zero for negative input values, which can cause a neuron to stop learning. LeakyReLU addresses this issue by introducing a slight negative slope while retaining ReLU's computational simplicity and sparse activation characteristics.
$$\mathrm{relu}(x) = \begin{cases} x & \text{if } x \ge 0 \\ 0 & \text{if } x < 0 \end{cases} \qquad (1)$$
$$\mathrm{Lrelu}(x) = \begin{cases} x & \text{if } x \ge 0 \\ \alpha x & \text{if } x < 0 \end{cases} \qquad (2)$$
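As a quick numerical check of Equations (1) and (2), the snippet below evaluates both activations on a few sample inputs with PyTorch; the α = 0.01 slope matches the typical value quoted above.

```python
import torch
import torch.nn.functional as F

x = torch.tensor([-2.0, -0.5, 0.0, 1.5])
print(F.relu(x))                             # tensor([0.0000, 0.0000, 0.0000, 1.5000])
print(F.leaky_relu(x, negative_slope=0.01))  # tensor([-0.0200, -0.0050, 0.0000, 1.5000])
```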

3.5. Model Enhancement

As mentioned in the previous section, the LeakyReLU activation function has a fixed slope for negative inputs, so it may not optimally adapt to different data distributions or network layers. On the other hand, the negative outputs introduced by LeakyReLU can, in some instances, affect the overall learning effectiveness of the network, leading to a decline in model performance. Improving the activation function within the architecture can enhance inference precision [21,22]. Therefore, this study modifies the activation function in the DSG-Yolov7-tiny architecture, specifically replacing LeakyReLU with the Sigmoid Linear Unit (SiLU). SiLU is a smooth activation function that helps reduce jitter during training and fosters more stable gradient updates. Following the DSG-Yolov7 model, this study also enhances the Yolov7-tiny model through an ablation study that applies SiLU, ModifiedSiLU, and AdaptiveSiLU, by trial and error, to different modules in the backbone, neck, and head to improve the prediction precision of the proposed model, as shown in Figure 15.
Because SiLU lies between linear and nonlinear behavior, it is more flexible than activation functions such as ReLU and LeakyReLU in capturing subtle changes in the input data. Additionally, incorporating SiLU improves the overall stability of the gradients, and later experimental results confirmed a slight increase in precision and accuracy.
Like LeakyReLU, SiLU produces negative outputs for negative inputs. Equation (3) evaluates the Sigmoid function, where x represents the input and σ(x) denotes the output. When x is very large, e^{−x} approaches zero, making σ(x) approach 1. When x is very small (strongly negative), e^{−x} becomes very large, and σ(x) approaches 0. The Sigmoid function thus maps any real input x to the interval [0, 1]. Equation (4) estimates the SiLU activation function, where x represents the input, σ(x) stands for the Sigmoid function, and silu(x) is the output.
$$\sigma(x) = \frac{1}{1 + e^{-x}} \qquad (3)$$
$$\mathrm{silu}(x) = x \cdot \sigma(x) \qquad (4)$$
In some cases, using SiLU alone yielded only slight improvements in precision, which led to the development of two SiLU variations: ModifiedSiLU and AdaptiveSiLU. The purpose of adopting these revised versions of SiLU is to enhance inference speed while maintaining high precision. Equation (5) calculates ModifiedSiLU, where β represents a learnable parameter and Mosilu(x) stands for the output. ModifiedSiLU introduces an adjustable β parameter to control the steepness of the curve. When x is large, σ(βx) approaches 1, and when x is small, σ(βx) approaches 0. The initial β value is 1.0, which limits the flexibility of function adjustment and results in a fixed curve slope. Therefore, β is adjusted to optimize the activation effect in different layers: it is first increased to 1.5 to achieve more robust nonlinear feature extraction and then gradually reduced to 1.25 for a flatter activation curve, resulting in better gradient flow.
$$\mathrm{Mosilu}(x) = x \cdot \sigma(\beta x) \qquad (5)$$
β_a(x) is produced by a small neural network, and Equation (6) computes β_a(x), where x represents the input. The initial value of the first layer's bias b_1 is set to 0, and the initial value of the output layer's bias b_2 is also set to 0. The initial values of the weight matrix W_1 for the first layer and W_2 for the output layer are randomly generated. β_a(x) behaves similarly to a fixed-β learning mechanism but provides greater flexibility and adaptability, enabling it to handle input-dependent scaling. Equation (7) calculates AdaptiveSiLU, where x represents the input, and the initial value of the bias b is −1. When b = −1, the function behaves more like ReLU near x = 0, causing σ(β_a(x)·x + b) to approach 0.
$$\beta_a(x) = W_2 \cdot \mathrm{relu}(W_1 x + b_1) + b_2 \qquad (6)$$
$$\mathrm{Apsilu}(x) = x \cdot \sigma(\beta_a(x) \cdot x + b) \qquad (7)$$
According to Xavier initialization [23,24], Equation (8) evaluates the initial values of W_1 and W_2 as uniformly distributed random numbers, where W represents the initialized weight matrix and U(a, b) stands for the uniform distribution ranging from a to b. n_i indicates the input dimension of the layer, which corresponds to the size of the input data (480 × 480), and n_out represents the output dimension of the layer, corresponding to the number of classes (n_out = 4). The constant 6 under the square root is the coefficient used in Xavier initialization to ensure the weights maintain a stable variance during forward propagation. After initializing the weight matrices W_1 and W_2, weight updates through backpropagation change their values. First, the network makes predictions and computes the error (loss). Then, backpropagation calculates the gradient of the loss with respect to the weights. Finally, the optimizer (such as gradient descent) updates W_1 and W_2 by adjusting them in a direction that reduces the error. Training repeats this process to gradually enhance the performance of the network. Therefore, the input dimension of W_1 is the output dimension of the previous layer (n_i), and its output dimension is the number of neurons in the hidden layer (n_out). The input dimension of W_2 is the output dimension of the hidden layer, and its output dimension is one because it represents the scaling factor β_a(x).
$$W \sim U\!\left(-\sqrt{\frac{6}{n_i + n_{out}}},\ \sqrt{\frac{6}{n_i + n_{out}}}\right) \qquad (8)$$
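A minimal PyTorch sketch of Equations (3) through (8) is given below: SiLU, ModifiedSiLU with a learnable β initialized to 1.0 (adjusted per layer to 1.5 or 1.25 as described above), and AdaptiveSiLU, whose β_a(x) is produced by a tiny two-layer network with Xavier-initialized weights and an outer bias b = −1. The hidden width and the channel-averaged pooling that reduces β_a(x) to one scalar per sample are assumptions where the paper leaves details open.

```python
import torch
import torch.nn as nn

class ModifiedSiLU(nn.Module):
    """Eq. (5): Mosilu(x) = x * sigmoid(beta * x), with beta learnable."""
    def __init__(self, beta=1.0):
        super().__init__()
        self.beta = nn.Parameter(torch.tensor(float(beta)))

    def forward(self, x):
        return x * torch.sigmoid(self.beta * x)

class AdaptiveSiLU(nn.Module):
    """Eqs. (6)-(7): Apsilu(x) = x * sigmoid(beta_a(x) * x + b), where
    beta_a(x) = W2 @ relu(W1 x + b1) + b2 is a tiny two-layer network."""
    def __init__(self, channels, hidden=8):  # hidden width is an assumption
        super().__init__()
        self.fc1 = nn.Linear(channels, hidden)
        self.fc2 = nn.Linear(hidden, 1)
        # Xavier uniform initialization of W1 and W2 (Eq. (8)); biases start at 0.
        nn.init.xavier_uniform_(self.fc1.weight)
        nn.init.xavier_uniform_(self.fc2.weight)
        nn.init.zeros_(self.fc1.bias)
        nn.init.zeros_(self.fc2.bias)
        self.b = nn.Parameter(torch.tensor(-1.0))  # outer bias b initialized to -1

    def forward(self, x):  # x: (N, C, H, W)
        # beta_a is computed from channel-averaged features of each sample (assumption).
        pooled = x.mean(dim=(2, 3))                       # (N, C)
        beta_a = self.fc2(torch.relu(self.fc1(pooled)))   # (N, 1)
        beta_a = beta_a.view(-1, 1, 1, 1)
        return x * torch.sigmoid(beta_a * x + self.b)

silu = nn.SiLU()  # Eq. (4): silu(x) = x * sigmoid(x)
```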
In Figure 16, the DSGβSI-Yolov7-tiny architecture is modified primarily in the backbone section. The network avoids setting all β values to 1.0, which would fix the slope of the curve and reduce the flexibility of function adjustment; it uses ModifiedSiLU in the early layers (0, 1, 5, and 10), with β starting at 1.5 and decreasing to 1.25 after two layers. This approach allows the model to achieve better gradient flow. The later layers maintain the original SiLU to ensure gradient stability, helping the model quickly extract prominent features while balancing feature extraction and information transfer.
Figure 16. DSGβSI-Yolov7-tiny architecture. Note: k represents kernel size, and s stands for stride.
In the neck section, the network employs AdaptiveSiLU to enable the model to handle complex feature combinations in the middle part of the network. Finally, the network utilizes AdaptiveSiLU again in the head (prediction) section, allowing the model to autonomously determine the optimal activation function.

3.6. Build a Model

This study trained eight models, Yolov4-tiny, Yolov5n, Yolov7, Yolov7-tiny, DSG-Yolov7, DSG-Yolov7-tiny, DSGSI-Yolov7-tiny, and DSGβSI-Yolov7-tiny, for prediction. Each model's parameters were randomly initialized, and during the training phase, the hyperparameters were gradually adjusted through trial and error, as shown in Table 2. Table 2 presents the optimized parameter settings for each model in this experimental case. Figure 13 illustrates how the models take images as input signals and use detection and prediction to determine the bonding conditions of each die. The output signals indicate the die bond quality and the die bond type, which show the bonding status of the four sides and four corners of each die.
Table 2. Hyperparameter settings.

3.7. The Workflow of the System

First, the system performs data preprocessing on the 3145 images provided by the semiconductor manufacturer. Data preprocessing involves organizing the data, using LabelImg to label each image, and converting the XML files into the text label format that YOLO accepts. Next, the dataset is divided into training, validation, and testing sets, with approximate proportions of 70%, 15%, and 15%, respectively. During the training phase, the epoch parameter is set to 120, the batch size to 16, and the input image size to 1024 × 1024. The models selected for training include Yolov4-tiny, Yolov5n, Yolov7, and Yolov7-tiny, along with the modified models DSG-Yolov7, DSG-Yolov7-tiny, DSGSI-Yolov7-tiny, and DSGβSI-Yolov7-tiny. After training, the testing phase involves inference and recognition on the test set. Finally, we examine each model's training and testing results to evaluate its performance, as shown in Figure 17.
Figure 17. The workflow of the system.

4. Experimental Results and Discussion

4.1. Experimental Environment

Table 3 displays the hardware configuration used in this experiment. Table 4 outlines the software packages utilized in the experiment. Data preprocessing involved the use of LabelImg to label each image. The next step was to use Jupyter Notebook to convert the XML files into text files and divide the dataset. Then, the experiment used Anaconda Prompt, Python, and PyTorch to execute training and recognition. The experiment utilized TensorFlow to monitor the training progress. Finally, based on the recognition results for each image, this study categorized chips with complete bonding as bond_good and those with incomplete bonding as bond_bad using Jupyter Notebook. Furthermore, this study evaluated the performance of multiple models and used the Anaconda Prompt to export the best model in TensorFlow Lite format for use in production line machinery.
Table 3. Hardware specifications.
Table 4. List of packages.

4.2. Data Collection and Model Evaluation

We primarily used Anaconda 3 to train the Yolov5 series, Yolov4-tiny, Yolov7, and Yolov7-tiny on a PC and then deployed them on a Jetson Nano embedded system to compare the results of each model. Next, we also deployed the improved models on the Jetson Nano for a comprehensive performance evaluation of all models. The data source comprised 3145 images from a well-known semiconductor manufacturer in southern Taiwan, where the first dataset collected 2671 training and validation images from machine #1 and the second dataset 474 test images from machine #2. This data arrangement trains and tests the models so as to achieve a cross-dataset evaluation. This study annotated each image using the LabelImg software, followed by data preprocessing. The preprocessing divided the machine #1 dataset into 2204 images (approximately 70% of the total) for training and 467 images (approximately 15%) for validation. The experiment tested various object detection models on a workstation platform. During the training and validation process, this study adopted k-fold cross-validation [25], selecting k = 5 to evaluate the Yolo-related models comprehensively. The training time for the same set of training data on the workstation was recorded, and using Equation (9), we calculated the total inference time IT_i required by the various object detection models on the 474 test images (approximately 15%). In Equation (9), i represents the i-th object detection model used for image inference, I stands for the total number of object detection models, x denotes the x-th test image, X indicates the total number of test images, and EIT_{i,x} is the time taken to complete the inference for each test image.
$$IT_i = \sum_{x=1}^{X} EIT_{i,x}, \quad \text{where } i = 1, 2, \ldots, I, \quad x = 1, 2, \ldots, X \qquad (9)$$
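In code, Equation (9) is simply a per-model sum of per-image inference times. The sketch below times a hypothetical `model(img)` call for each test image; the timing helper, model interface, and image list are placeholder names, not the authors' measurement code.

```python
import time
import torch

@torch.no_grad()
def total_inference_time(model, test_images):
    """Eq. (9): IT_i = sum over the X test images of the per-image time EIT_{i,x}."""
    model.eval()
    total = 0.0
    for img in test_images:           # X = 474 test images in the paper
        t0 = time.perf_counter()
        _ = model(img)                # single-image inference
        if img.is_cuda:
            torch.cuda.synchronize()  # ensure GPU work finishes before timing
        total += time.perf_counter() - t0
    return total

# IT = {name: total_inference_time(m, test_images) for name, m in models.items()}
```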
The experimental setup defined the test image size as 1024 × 1024, with a batch size set to 16 and the number of iterations set to 120. In Table 5, the first row shows the time required for training different object detection models based on the same parameter settings. The second row calculates the time to infer 474 images within the same test set. The experimental results indicate that the proposed DSGβSI-Yolov7-tiny model has a shorter inference time than the other models.
Table 5. Training and inference times.
Table 6 lists each object detection model’s parameters and FLOPs (Gflops). In Table 6, before applying DSGConv for making the models lightweight, the Yolov4-tiny model had the highest number of parameters, while the Yolov5n model had the fewest. After the DSG modifications, the models DSG-Yolov7, DSG-Yolov7-tiny, DSGSI-Yolov7-tiny, and DSGβSI-Yolov7-tiny all showed a significant reduction in the number of parameters, achieving the goal of being lightweight.
Table 6. Parameters and FLOPs.

4.3. Experimental Results

In Figure 18, the PR curve plots recall on the X-axis and precision on the Y-axis, with each point representing different threshold values leading to varying recall and precision results for the mean Average Precision (mAP) calculation. The mAP result is obtained by summing the AP values calculated from recall and precision across all classes and dividing by the total number of classes. Before applying DSGConv for lightweight Yolo-related models, the Yolov7-tiny achieved the best precision, while Yolov4-tiny had the lowest precision. After the DSG modifications, both DSG-Yolov7 and DSG-Yolov7-tiny showed improved speed, but the omission of some complex calculations led to a slight decrease in precision. However, with further refinement in the DSGSI-Yolov7-tiny model, there was a slight improvement in model precision, and an additional enhancement in the DSGβSI-Yolov7-tiny model resulted in even higher precision.
Figure 18. The precision–recall curve for the object detection model. (a) Yolov4-tiny; (b) Yolov5n; (c) Yolov7; (d) Yolov7-tiny; (e) DSG-Yolov7; (f) DSG-Yolov7-tiny; (g) DSGSI-Yolov7-tiny; (h) DSGβSI-Yolov7-tiny.
After the model training is complete, we can use visualization tools to present the loss plots. Under a batch size of 16, the loss plot for the DSGβSI-Yolov7-tiny model after 120 training epochs is shown in Figure 19. The figure contains six plots displaying the loss curves recorded over the 120 training epochs. In Figure 19, the first, second, and third rows represent the localization (box) loss, the confidence (objectness) loss, and the classification loss between predictions and ground truth, while the first and second columns represent the training loss and validation loss, respectively. In the second column, the abbreviation "val" indicates validation.
Figure 19. Loss plot of DSGβSI-Yolov7-tiny. (a) Box loss; (b) objectness loss; (c) classification loss; (d) validation box loss; (e) validation objectness loss; (f) validation classification loss.

4.4. Performance Evaluation

Equation (10) calculates the frames per second (FPS) to show the execution speed of object detection. According to Equation (10), FPS_j is the FPS obtained with the j-th object detection model, where IRAIT_j represents the inference time required per image for real-time video input using that model, J stands for the total number of object detection models, and j denotes the index of the model being evaluated.
$$FPS_j = \frac{1}{IRAIT_j}, \quad \text{where } j = 1, 2, \ldots, J \qquad (10)$$
Equation (11) evaluates object detection precision using the mean Average Precision (mAP), calculated by finding the average precision for each category and then computing the overall average. Equation (11) estimates mAP_l for the different object detection models, where l represents the l-th object detection model used for calculating mAP_l, L stands for the total number of object detection models, C_l indicates the number of categories that the specific model needs to identify, k_l denotes a specific category of the model, and AP_{k_l} refers to the average precision for that specific category within the model.
$$mAP_l = \frac{1}{C_l}\sum_{k_l=1}^{C_l} AP_{k_l}, \quad \text{where } k_l = 1, 2, \ldots, C_l, \quad l = 1, 2, \ldots, L \qquad (11)$$
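Both metrics reduce to simple averages once the per-image times and per-class AP values are available. A minimal sketch with placeholder inputs (not the paper's measured values) follows.

```python
def fps(per_image_time_seconds):
    """Eq. (10): FPS_j = 1 / IRAIT_j, the reciprocal of the per-image inference time."""
    return 1.0 / per_image_time_seconds

def mean_average_precision(ap_per_class):
    """Eq. (11): mAP_l = (sum of AP over the C_l classes) / C_l."""
    return sum(ap_per_class) / len(ap_per_class)

# Placeholder numbers for illustration only.
print(fps(0.0052))                                       # ~192.3 FPS
print(mean_average_precision([0.99, 0.97, 0.98, 0.96]))  # 0.975
```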
Next, we evaluate the execution speed and precision of different object detection models. After training various object detection models with the same parameters, we tested them using a set of 474 images and plotted the results of the PR curve. Based on Equations (10) and (11), we calculated the execution speed (FPS) and precision (mAP) from the tests, as shown in Table 7.
Table 7. Performance indexes.
Table 7 compares the performance of different versions of Yolo-related models, including Yolov4-tiny, Yolov5n, Yolov7, and Yolov7-tiny, as well as the lightweight models DSG-Yolov7, DSG-Yolov7-tiny, DSGSI-Yolov7-tiny, and DSGβSI-Yolov7-tiny. The performance evaluation metrics include FPS, precision, recall, F1-score, and accuracy. Table 7 also displays the performance of these models across these metrics. The experimental results indicate that the DSGβSI-Yolov7-tiny model achieved the best FPS, precision, recall, F1-score, and accuracy performance.

4.5. Discussion

First, regarding the speed metric (FPS), the Yolov7-tiny model achieved a fast inference speed of 54.9 FPS. However, the inference speeds of the other four models, DSG-Yolov7, DSG-Yolov7-tiny, DSGSI-Yolov7-tiny, and DSGβSI-Yolov7-tiny, were significantly higher than that of the Yolov7-tiny model, with speed-up ratios of 1.12, 3.25, 3.19, and 3.5, respectively. These results indicate that the DSG approach can significantly enhance inference speed. Secondly, regarding the precision metric, the results for the four models mentioned above were almost identical, which suggests that the DSG approach did not significantly alter the precision levels.
Additionally, regarding the accuracy metric, there were only slight differences among the four models, indicating that the DSG method does not negatively impact image recognition accuracy. Thus, the lightweight improvements made through DSGConv significantly enhance the inference speed of the models while maintaining a consistent level of prediction precision and image recognition accuracy.
The DSGSI-Yolov7-tiny model, which employs the SiLU activation function, achieved one of the fastest inference speeds, considerably boosting overall performance. However, to ensure that the model maintains precision above 99%, the DSGβSI-Yolov7-tiny model was developed, achieving optimal yield rates and reducing loss costs in the die bond process.
Hsu et al. [10] mention depthwise separable convolution layers, which can increase the model’s computation speed and achieve better prediction results while maintaining a smaller model size. Additionally, Zhang et al. [11] highlight the challenges of deploying convolutional neural networks (CNNs) on embedded devices with limited memory and computational resources. Therefore, combining the smaller neural networks of Yolov5 and GhostNet is more suitable for deployment on FPGAs and other memory-constrained embedded devices. This experiment integrates the Yolo-related models with GhostNet to replace traditional convolution, eliminating redundant blocks in the original architecture or replacing them with improved convolution layers. This approach significantly reduces convolution calculations, leading to a marked increase in computing speed while sustaining high predictive precision.
Lang et al. [13] pointed out that the Yolov7-tiny architecture is streamlined from the original Yolov7’s backbone, neck, and head to achieve a lighter design. This experiment realizes a more lightweight DSG-Yolov7-tiny, making the model more suitable for operation on resource-constrained devices and highlighting the significance of Yolov7-tiny. Both Sun et al. [12] and Lang et al. [13] mentioned that combining Yolov7 or Yolov7-tiny with the Ghost module allows the improved Yolov7-Ghost to maintain the original Yolov7 detection precision while also increasing inference speed, which is even more pronounced in Yolov7-tiny. This experiment implements DSG-Yolov7-tiny following the same approach to significantly enhance execution speed. Yang et al. [14] proposed improving the activation function of Yolov7-tiny to FReLU, explaining its benefits. This experiment introduces the SiLU activation function to enhance the precision of the DSG-Yolov7-tiny model, resulting in DSGSI-Yolov7-tiny. Furthermore, to improve model precision, DSGβSI-Yolov7-tiny was introduced for real-time and efficient application in die bond production machines.
This experiment also has some limitations. A single image containing multiple die bonds may include several die bond types. During the training phase, the model may not effectively learn the features of each die bond type, which affects inference precision. The best approach would be to cut a single image containing multiple die bonds into separate die bond images for independent inference, but this is time-consuming. Furthermore, although the improved DSGSI-Yolov7-tiny model is very fast, its precision and accuracy decline slightly; pushing speed well beyond that of the original Yolov7-tiny model may affect these metrics.
Additionally, the performance of the GPU is a crucial factor influencing precision, accuracy, and speed. Currently, the GPU model used is the NVIDIA GeForce RTX 4070 Ti. Upgrading to a higher model, such as the NVIDIA RTX 4090, which has 24 GB of GDDR6X video memory, would enable it to handle larger deep learning models, making it particularly suitable for high utilization and throughput tasks in rapid die bond detection and prediction.

5. Conclusions

The DSGβSI-Yolov7-tiny model proposed in this study achieves the highest performance metrics for real-time and efficient prediction, making it an optimal solution for die bond detection and prediction applications. The fab can rapidly deploy our method on factory production machines for die bond detection and prediction. By analyzing the model's classification of die bond images, we can identify the most frequently occurring defects and subsequently adjust critical machine parameters in real time. This process enhances the yield of the die bond manufacturing process while significantly reducing manufacturing cost losses. For semiconductor manufacturers, this increases production yield and minimizes production losses.
Future work will extend the proposed approach to other applications, such as chip contour detection, street view analysis, mask-wearing detection, operator attire compliance, and product detection on factory production lines. Moreover, we will continue to seek a better prediction model, for example, the Yolov11n model, to replace or modify the DSGβSI-Yolov7-tiny architecture. This improvement will allow for optimizing or enhancing our proposed model, facilitating the development of more efficient die bond detection and prediction methods.

Author Contributions

B.R.C. and W.-S.C. conceived and designed the experiments; H.-F.T. collected the dataset and proofread the manuscript; and B.R.C. wrote the paper. All authors have read and agreed to the published version of the manuscript.

Funding

This work was fully supported by the National Science and Technology Council, Taiwan, Republic of China, under grant numbers NSTC 113-2622-E-390-003 and NSTC 113-2221-E-390-015.

Data Availability Statement

The Sample Programs used to support the findings of this study can be found at the following link: https://drive.google.com/file/d/1--JXy7tgrUj0fjAphe3dsi5tTjs0Ee6j/view?usp=sharing (accessed on 22 October 2024).

Conflicts of Interest

The authors declare no conflicts of interest.

References

  1. Chen, M.-F.; Chen, C.-W.; Chen, C.-Y.; Hwang, C.-H.; Hwang, L.-Y. An AOI system development for inspecting defects on 6 surfaces of chips. In Proceedings of the 2016 IEEE International Instrumentation and Measurement Technology Conference Proceedings, Taipei, Taiwan, 23–26 May 2016; pp. 1–6. [Google Scholar]
  2. Yang, Y.; Wang, X. An improved YOLOv7-tiny-based lightweight network for the identification of fish species. In Proceedings of the 2023 5th International Conference on Robotics and Computer Vision (ICRCV), Nanjing, China, 15–17 September 2023; pp. 188–192. [Google Scholar]
  3. Chang, B.R.; Tsai, H.-F.; Chang, F.-Y. Applying advanced lightweight architecture DSGSE-Yolov5 to rapid chip contour detection. Electronics 2024, 13, 10. [Google Scholar] [CrossRef]
  4. Chang, B.R.; Tsai, H.-F.; Chang, F.-Y. Boosting the response of object detection and steering angle prediction for self-driving control. Electronics 2023, 12, 4281. [Google Scholar] [CrossRef]
  5. Li, C.; Tan, G.; Wu, C.; Li, M. YOLOv7-VD: An algorithm for vehicle detection in complex environments. In Proceedings of the 2023 4th International Conference on Computer, Big Data and Artificial Intelligence (ICCBD+AI), Guiyang, China, 15–17 December 2023; pp. 743–747. [Google Scholar]
  6. Alam, L.; Kehtarnavaz, N. A survey of detection methods for die attachment and wire bonding defects in integrated circuit manufacturing. IEEE Access 2022, 10, 83826–83840. [Google Scholar] [CrossRef]
  7. Phan, Q.-H.; Nguyen, V.-T.; Lien, C.-H.; Duong, T.-P.; Hou, M.T.-K.; Le, N.-B. Classification of tomato fruit using YOLOv5 and convolutional neural network models. Plants 2023, 12, 790. [Google Scholar] [CrossRef] [PubMed]
  8. Hsiao, S.-F.; Tsai, B.-C. Efficient computation of depthwise separable convolution in MobileNet deep neural network models. In Proceedings of the 2021 IEEE International Conference on Consumer Electronics-Taiwan (ICCE-TW), Penghu, Taiwan, 15–17 September 2021; pp. 1–2. [Google Scholar]
  9. Howard, A.G.; Zhu, M.; Chen, B.; Kalenichenko, D.; Wang, W.; Weyand, T.; Andreetto, M.; Adam, H. MobileNets: Efficient convolutional neural networks for mobile vision applications. arXiv 2017, arXiv:1704.04861. [Google Scholar]
  10. Wang, T.; Ray, N. Compact Depth-Wise Separable Precise Network for Depth Completion. IEEE Access 2023, 11, 72679–72688. [Google Scholar] [CrossRef]
  11. Zhang, Y.; Cai, W.; Fan, S.; Song, R.; Jin, J. Object detection based on YOLOv5 and GhostNet for orchard pests. Information 2022, 13, 548. [Google Scholar] [CrossRef]
  12. Sun, D.; Zhang, L.; Wang, J.; Liu, X.; Wang, Z.; Hui, Z.; Wang, J. Efficient and accurate detection of herd pigs based on Ghost-YOLOv7-SIoU. Neural Comput. Appl. 2024, 36, 2339–2352. [Google Scholar] [CrossRef]
  13. Lang, C.; Yu, X.; Rong, X. LSDNet: A lightweight ship detection network with improved YOLOv7. J. Real-Time Image Process. 2024, 21, 60. [Google Scholar] [CrossRef]
  14. Wu, Y.; Tang, Y.; Yang, T. An improved nighttime people and vehicle detection algorithm based on YOLOv7. In Proceedings of the 2023 3rd International Conference on Neural Networks, Information and Communication Engineering (NNICE), Guangzhou, China, 24–26 February 2023; pp. 266–270. [Google Scholar]
  15. Shafiee Sarvestani, A.; Zhou, W.; Wang, Z. Perceptual Crack Detection for Rendered 3D Textured Meshes. arXiv 2024, arXiv:2405.06143. [Google Scholar]
  16. Song, X.; Chen, L.; Zhang, L.; Luo, J. Optimal proposal learning for deployable end-to-end pedestrian detection. In Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition (CVPR), Vancouver, BC, Canada, 17–24 June 2023; pp. 3250–3260. [Google Scholar]
  17. Han, K.; Wang, Y.; Tian, Q.; Guo, J.; Xu, C.; Xu, C. GhostNet: More features from cheap operations. In Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition, Seattle, WA, USA, 13–19 June 2020; pp. 1580–1589. [Google Scholar]
  18. Madhuri, P.; Akhter, N.; Raj, A.B. Digital implementation of depthwise separable convolution network for AI applications. In Proceedings of the 2023 IEEE Pune Section International Conference (PuneCon), Pune, India, 14–16 December 2023; pp. 1–5. [Google Scholar]
  19. Lee, J.H.; Kong, J.; Munir, A. Arithmetic coding-based 5-bit weight encoding and hardware decoder for CNN inference in edge devices. IEEE Access 2021, 9, 166736–166749. [Google Scholar] [CrossRef]
  20. Mou, C.; Zhu, C.; Liu, T.; Cui, X. A novel efficient wildlife detecting method with lightweight deployment on UAVs based on YOLOv7. IET Image Process. 2024, 18, 1296–1314. [Google Scholar] [CrossRef]
  21. Li, B.; Liu, L.; Wang, S.; Liu, X. Research on object detection algorithm based on improved YOLOv7. In Proceedings of the 9th International Conference on Computer and Communication (ICCC), Chengdu, China, 8–11 December 2023; pp. 1724–1727. [Google Scholar]
  22. Sun, H.-R.; Shi, B.-J.; Zhou, Y.-T.; Chen, J.-H.; Hu, Y.-L. A Smoke Detection Algorithm Based on Improved YOLO v7 Lightweight Model for UAV Optical Sensors. IEEE Sens. J. 2024, 24, 26136–26147. [Google Scholar] [CrossRef]
  23. Sai, T.A.; Lee, H. Weight initialization on neural network for neuro pid controller-case study. In Proceedings of the 2018 International Conference on Information and Communication Technology Robotics (ICT-ROBOT), Busan, Republic of Korea, 6–8 September 2018; pp. 1–4. [Google Scholar]
  24. Wong, K.; Dornberger, R.; Hanne, T. An analysis of weight initialization methods in connection with different activation functions for feedforward neural networks. Evol. Intell. 2024, 17, 2081–2089. [Google Scholar] [CrossRef]
  25. Ioffe, S.; Szegedy, C. Batch normalization: Accelerating deep network training by reducing internal covariate shift. arXiv 2015, arXiv:1502.03167. [Google Scholar]
Disclaimer/Publisher’s Note: The statements, opinions and data contained in all publications are solely those of the individual author(s) and contributor(s) and not of MDPI and/or the editor(s). MDPI and/or the editor(s) disclaim responsibility for any injury to people or property resulting from any ideas, methods, instructions or products referred to in the content.
