1. Introduction
The automotive industry is one of the cornerstone sectors of modern society, in which safety and reliability are of paramount importance. In particular, the quality of body components is directly linked to vehicle safety, making its management a crucial factor in ensuring product reliability. Defects in body parts can cause accidents, leading to significant human casualties and economic losses. Consequently, the development of technologies for effectively detecting and identifying defects during the manufacturing process has become an essential task within the automotive industry [1,2]. Defects in body components can arise from various causes, and such defects not only degrade product performance but can also lead to safety incidents.
Therefore, defect detection technology plays a crucial role in the manufacturing process by preemptively identifying and eliminating defective products, thereby enhancing the final product’s quality and ensuring safety. This study aims to demonstrate that fake images of parts exhibiting various defects can be generated from a set of defective-part images using the generative adversarial network (GAN) algorithm. By supplementing manually collected defect images with GAN-generated images, which can reproduce fine defects that are hard to identify, this approach offers a novel avenue for enhancing defect detection methodologies [2,3,4].
In the field of object detection, the YOLO model, particularly its recent versions YOLO v7 and v8, has garnered significant attention due to its fast processing speed and high accuracy. These versions improve on the accuracy and processing speed of their predecessors, greatly expanding the applicability of real-time object detection. This progress represents a substantial advancement in the application of object detection technologies across various industrial sectors.
In this study, the aim is to compare and analyze the performance of GAN-based technologies and object detection technologies utilizing the YOLO v7 and v8 models for detecting defects in automotive body parts. Thanks to its ability to generate images that closely resemble real ones, a GAN can be used to enhance the diversity of a training dataset by generating images of defective automotive body parts. This is particularly useful in situations in which the availability of training data is limited [3,5].
As a first step, a process utilizing GAN models is developed to accurately detect defects in automotive body parts. In this phase, the images generated by the GAN are used to augment the training data for the defect detection model, ensuring the model can learn various forms and sizes of defects [6].
In the second phase, a direct comparison and analysis of the performance of YOLO v7 and v8 models are conducted. This comparison focuses on accuracy, processing speed, and practical applicability. Given the nature of detecting defects in automotive body parts, a high accuracy and rapid processing speed are crucial. Through this comparison, a superior model for detecting defects in automotive body parts can be identified, and optimization strategies for a defect detection system based on the chosen model can be explored.
This study aims to significantly enhance the accuracy and efficiency of manufacturing quality control by effectively detecting and classifying various defects that may occur in the automotive manufacturing process. Given the automotive industry’s demand for high levels of precision and safety, advancements in defect detection technology directly lead to production cost reductions, product quality improvements, and an increase in consumer satisfaction.
Artificial-intelligence-based image processing technologies, particularly defect detection techniques utilizing GAN and YOLO models, play a crucial role in meeting these requirements. These technologies are capable of processing a vast amount of image data at a high speed and accurately identifying defects of various shapes and sizes. This enables the real-time detection of defects during the manufacturing process and immediate corrective actions to be taken, thereby improving overall manufacturing efficiency. The findings of this study are not limited to the automotive industry; AI-based image processing technologies can be applied to quality control in various other industries, including medical, aviation, and electronics manufacturing.
Customized defect detection systems can be developed by learning the specific requirements and types of defects that occur in each industry sector. This will significantly contribute to maintaining consistent product quality and minimizing losses that may occur during the production process.
4. Experiments
This study aims to identify defects in vehicle body parts by capturing photographs of actual defective parts and creating a dataset based on these images to train a deep learning model. By directly capturing and labeling photographs of defective body parts, the model is enabled to accurately identify various defects that may occur in real-world scenarios.
A total of 49 photographs were utilized to train the model, with 29 designated for training, 10 for testing, and 10 for validation purposes. This division follows the common practice of splitting the dataset into training, testing, and validation sets in a 6:2:2 ratio, which aids in accurately evaluating the model’s performance in real-world conditions and preventing overfitting.
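A minimal sketch of such a 6:2:2 split is shown below; it assumes the photographs reside in a single directory named images/, a name chosen here purely for illustration.
```python
import random
from pathlib import Path

# Shuffle the labeled photographs and split them roughly 6:2:2,
# mirroring the 29/10/10 division used in this study.
random.seed(0)
files = sorted(Path("images").glob("*.jpg"))
random.shuffle(files)

n = len(files)                       # 49 photographs in this study
n_train = round(n * 0.6)             # ~29 images for training
n_test = round(n * 0.2)              # ~10 images for testing
train_set = files[:n_train]
test_set = files[n_train:n_train + n_test]
val_set = files[n_train + n_test:]   # remaining ~10 images for validation
print(len(train_set), len(test_set), len(val_set))
```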
Additionally, to facilitate the training of the model for defect identification in vehicle body parts, videos of both good and defective parts were prepared. The recorded videos, each less than five minutes in length, focus on demonstrating how vehicle body parts move and develop defects in real working environments. This approach enables the model to develop the capability to identify defects not only in static images but also in dynamic environments. The types of defects prepared for the model training include nut defects and tears. Nut defects encompass missing or damaged nuts used to secure vehicle body parts, while tears include rips or cracks in body parts.
Figure 4 is presented to compare and analyze the appearance of part 1 and part 2. In (a) and (b) of Figure 4, the defects are not marked separately. In Figure 5 and Figure 6, the defective regions of the car body are marked. What is noteworthy in this study is the experimental approach using YOLO and GANs. In the YOLO experiment, nut defect images were labeled, and a performance comparison between the YOLO v7 and YOLO v8 algorithms was then conducted. This allowed us to evaluate how accurately each version of the YOLO algorithm can detect and classify nut defects.
For part 1 and part 2, 1999 images were generated through the GAN over 20,000 epochs, and among them, 12 well-learned images per part were selected and labeled, giving a total of 24 labeled GAN images. In this experiment, two types of loss functions were employed: BCELoss and CrossEntropyLoss. BCELoss is suitable for binary classification problems, whereas CrossEntropyLoss is used for multi-class classification problems. The labeling of both the real images and the synthetic images generated by the GAN was conducted using the LabelImg program.
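The practical difference between the two loss functions can be illustrated with a short PyTorch sketch; the batch size, output shapes, and class count below are illustrative assumptions rather than the exact configuration used in the experiments.
```python
import torch
import torch.nn as nn

batch = 8  # illustrative batch size

# (1) BCELoss: the discriminator emits one probability per image (real vs. fake).
bce = nn.BCELoss()
probs = torch.sigmoid(torch.randn(batch, 1))   # sigmoid output of the discriminator
real_labels = torch.ones(batch, 1)             # 1 = real, 0 = fake
loss_bce = bce(probs, real_labels)

# (2) CrossEntropyLoss: raw logits over several classes (multi-class case).
ce = nn.CrossEntropyLoss()
logits = torch.randn(batch, 2)                 # e.g. two classes: real / fake
targets = torch.randint(0, 2, (batch,))
loss_ce = ce(logits, targets)

print(loss_bce.item(), loss_ce.item())
```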
Subsequently, by training the YOLO algorithm on these real and synthetic images, an artificial intelligence model for detecting defects was developed. The effectiveness and accuracy of the developed model were assessed through experiments using video footage of defective and non-defective parts. This evaluation aimed to determine the model’s ability to accurately identify and classify various types of defects that could occur in actual manufacturing and inspection processes.
The artificial intelligence model developed in this study was trained using the YOLO algorithm, and the trained weights were exported in the .pt file format and integrated into a GUI program. This GUI program is designed to enable users to easily utilize the AI model for identifying defective parts. To accommodate cases in which the trained model may not perfectly identify all types of defects, specific logic was implemented to allow the software to make defect determinations.
While acknowledging the limitations of its object detection capabilities, it was crucial to design the system with various conditional variables to ensure its effectiveness. For instance, it includes a comparative operation that classifies a part as good (or normal) or defective based on a specific threshold on the number of object detections: if the number of detections falls within a predefined limit and does not meet the criteria for a fault, the part is deemed normal. This design allows the system to adapt to a wide range of scenarios that may arise in actual manufacturing environments, such as detecting defects or controlling defects in electrical systems on the factory floor.
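A minimal sketch of this count-based judgment is given below; it assumes the per-frame detection counts have already been extracted, and the threshold value is an illustrative assumption.
```python
# Illustrative threshold: e.g. two detected defect boxes in a frame => defective part.
DEFECT_COUNT_THRESHOLD = 2

def judge_part(defect_counts_per_frame, threshold=DEFECT_COUNT_THRESHOLD):
    """Return 'defective' if any frame reaches the detection threshold,
    otherwise 'normal' (the count stays within the predefined limit)."""
    for count in defect_counts_per_frame:
        if count >= threshold:
            return "defective"
    return "normal"

# Example: numbers of defect boxes found in consecutive video frames.
print(judge_part([0, 1, 0, 2, 1]))   # -> "defective"
print(judge_part([0, 1, 1, 0]))      # -> "normal"
```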
In detail, the system was tested on photographs and videos of defective parts taken under various lighting conditions and backgrounds. The computer used for the experiments was equipped with a high-performance GPU, enabling complex image processing and rapid defect detection. The software environment utilized the Python programming language and the PyTorch framework, while the GUI was developed using PyQt to provide a user-friendly interface.
This experimental environment is designed to maximize the performance of the artificial intelligence model, and it is expected to be highly effective when applied to defect detection processes in manufacturing industries. Through the GUI, users can easily identify defective parts, and the implementation of defect judgment logic will enhance quality control in manufacturing processes.
4.1. Experimental Environment
The experimental setup for this study is detailed in Table 1, utilizing PyTorch 2.12 for the training and development of the artificial intelligence model. PyTorch, a widely used open-source machine learning library, offers excellent performance for developing complex AI models and computations. Furthermore, CUDA 12.1 was used for GPU acceleration. Additionally, the environment was configured as follows to facilitate the training of the GANs and YOLO.
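As a quick sanity check of this environment, the installed PyTorch and CUDA builds can be queried directly; the following sketch is illustrative and not part of the training pipeline itself.
```python
import torch

# Report the PyTorch build, the CUDA version it was compiled against,
# and whether a GPU is visible, matching the configuration in Table 1.
print("PyTorch:", torch.__version__)
print("CUDA build:", torch.version.cuda)
print("CUDA available:", torch.cuda.is_available())
if torch.cuda.is_available():
    print("GPU:", torch.cuda.get_device_name(0))
```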
4.2. Results
To assess the progression of the GAN training, we conducted visual inspections instead of using traditional metrics, such as accuracy. This approach was chosen because it is challenging to evaluate the performance of generative models solely through objective metrics. By visually inspecting the generated images, we could directly assess how similar they are to real images. Such inspections require no specialist expertise and can be performed by an average observer.
Additionally, Table 2 summarizes the metrics representing the competitive relationship between the generator (G) and the discriminator (D) throughout the training process. As training progresses, the generator increasingly produces images that are difficult to distinguish from real ones, prompting the discriminator to continuously learn to more accurately identify these images. The loss values of G and D exhibit constant fluctuations during this process, indicating an ongoing advancement of both networks through their competition.
In this study, experiments were conducted on the datasets for part 1 and part 2 using the basic structure of a GAN, known as a simple GAN, which consists of two parts: a generator and a discriminator. These two networks compete and learn from each other, with the generator attempting to create fake images that resemble real images, and the discriminator striving to distinguish between real and fake images.
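A minimal sketch of such a simple GAN, using fully connected generator and discriminator networks trained adversarially with BCELoss, is shown below; the latent dimension, image size, layer widths, and learning rates are illustrative assumptions rather than the exact architecture trained in this study.
```python
import torch
import torch.nn as nn

latent_dim, img_dim = 100, 64 * 64   # flattened grayscale image (illustrative size)

# Generator maps random noise to a fake image; discriminator scores real vs. fake.
G = nn.Sequential(nn.Linear(latent_dim, 512), nn.ReLU(),
                  nn.Linear(512, img_dim), nn.Tanh())
D = nn.Sequential(nn.Linear(img_dim, 512), nn.LeakyReLU(0.2),
                  nn.Linear(512, 1), nn.Sigmoid())

criterion = nn.BCELoss()
opt_G = torch.optim.Adam(G.parameters(), lr=2e-4)
opt_D = torch.optim.Adam(D.parameters(), lr=2e-4)

def train_step(real_batch):
    b = real_batch.size(0)
    real, fake = torch.ones(b, 1), torch.zeros(b, 1)

    # Discriminator step: learn to separate real images from generated ones.
    fake_imgs = G(torch.randn(b, latent_dim)).detach()
    loss_D = criterion(D(real_batch), real) + criterion(D(fake_imgs), fake)
    opt_D.zero_grad(); loss_D.backward(); opt_D.step()

    # Generator step: produce images the discriminator scores as real.
    loss_G = criterion(D(G(torch.randn(b, latent_dim))), real)
    opt_G.zero_grad(); loss_G.backward(); opt_G.step()
    return loss_D.item(), loss_G.item()

# One training step on a random batch standing in for flattened real part images.
print(train_step(torch.rand(8, img_dim) * 2 - 1))
```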
Figure 5 and Figure 6 visually demonstrate the qualitative changes in the images generated from the part 1 and part 2 datasets through this learning process.
The visual inspection results indicate that, over time, the generated images reach a qualitative level at which they are difficult to distinguish from real images. This suggests that, despite its basic structure, the simple GAN is capable of generating highly realistic images given sufficient training and an appropriate dataset. However, improving pixel-level detail or resolution may require more advanced GAN models or modifications to the neural network architecture.
Subsequently, the generated fake images were labeled alongside the real images. Figure 7 demonstrates the labeling process conducted using the LabelImg program, formatted to comply with the YOLO standards. Training was then carried out with both real and fake images using YOLO v7 and v8.
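For reference, a YOLO-format label file produced by LabelImg contains one line per bounding box: the class index followed by the normalized box center and size. The values in the sketch below are illustrative and not taken from the actual dataset.
```python
# Example YOLO annotation line: "class x_center y_center width height",
# with all coordinates normalized to the image dimensions (values are illustrative).
example_label = "0 0.512 0.431 0.087 0.064"   # e.g. class 0 = defect_nut

cls, xc, yc, w, h = example_label.split()
print(int(cls), float(xc), float(yc), float(w), float(h))
```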
This paper introduces a novel training methodology that integrates generative adversarial networks (GANs) with the YOLO v7 and YOLO v8 object detection frameworks to enhance the system’s detection accuracy. The proposed approach is divided into six distinct stages, each contributing to the overall efficacy of the object detection system.
In the initial stage, real images are fed into a GAN for training. This process allows the GAN to learn the distribution of the input data, enabling it to generate new images that are visually similar to the original dataset.
Upon completion of the GAN training, the second stage involves the generation of synthetic images. These images are produced at various epochs, allowing for the selection of the most realistic outputs for further processing.
The third stage focuses on the selection and enlargement of certain synthetic images. This step is crucial for ensuring that the generated images are of sufficient quality and resolution for object detection tasks.
In the fourth stage, the real images, along with the selected synthetic images, undergo a labeling process using the LabelImg tool. This manual annotation step is essential for creating accurate ground truth data for training the object detection models.
The fifth stage entails the training of either the YOLO v7 or YOLO v8 framework using the labeled images and corresponding annotation files. Both YOLO versions are renowned for their efficiency and accuracy in detecting objects across various domains.
Finally, the sixth stage involves the evaluation of object detection performance.
The images used for labeling had a resolution of 640 × 480 or higher for the real images and 240 × 240 for the fake images; the resolution of the fake images could be increased depending on the memory of the GPU used.
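As an illustration of the enlargement step applied to selected synthetic images, a 240 × 240 GAN output can be upsampled before labeling; the file names and target resolution in this sketch are assumptions.
```python
from PIL import Image

# Upsample a 240 x 240 synthetic image to a higher resolution prior to labeling.
fake = Image.open("gan_fake_part1.png")            # 240 x 240 GAN output (assumed name)
enlarged = fake.resize((480, 480), Image.BICUBIC)  # bicubic interpolation
enlarged.save("gan_fake_part1_480.png")
```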
Due to the limited number of images in the automotive body defect dataset, obtaining statistically meaningful training graphs was challenging.
In this study, training, testing, and validation were conducted under the same conditions using YOLO v7 and v8. The results indicated that while YOLO v7 failed to detect objects, YOLO v8 successfully recognized them.
Figure 8 illustrates a scenario in which YOLO v7 was unable to identify objects.
In this study, the performance measurement results for defect_nut1 and defect_nut2 using YOLO v8 are summarized in Table 3.
Although performance was diminished by the scarcity of training images, the system was still able to partially perform the detection function for the defective parts.
One of the most widely used performance evaluation metrics in object detection and instance segmentation tasks is mAP50, the mean average precision computed at an IoU threshold of 0.50. The intersection over union (IoU) quantifies the overlap between predicted and ground-truth bounding boxes and is therefore a critical factor in assessing detection accuracy. A stricter variant, mAP50-95, calculates the average precision at each IoU threshold from 0.50 to 0.95 in steps of 0.05 and then averages these values; it assesses how precisely a model can localize objects across a range of overlap requirements.
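The IoU underlying both metrics can be computed directly from two bounding boxes; the following sketch, with boxes given as (x1, y1, x2, y2) corner coordinates, is for illustration.
```python
def iou(box_a, box_b):
    """Intersection over union of two boxes given as (x1, y1, x2, y2)."""
    ix1, iy1 = max(box_a[0], box_b[0]), max(box_a[1], box_b[1])
    ix2, iy2 = min(box_a[2], box_b[2]), min(box_a[3], box_b[3])
    inter = max(0, ix2 - ix1) * max(0, iy2 - iy1)
    area_a = (box_a[2] - box_a[0]) * (box_a[3] - box_a[1])
    area_b = (box_b[2] - box_b[0]) * (box_b[3] - box_b[1])
    return inter / (area_a + area_b - inter)

# Two 100 x 100 boxes offset by 20 pixels overlap with IoU ~0.47,
# so the prediction would not count as a hit at the 0.50 threshold used by mAP50.
print(iou((10, 10, 110, 110), (30, 30, 130, 130)))
```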
Figure 9 illustrates the results when training with fewer than 50 images using YOLO v8. It can be observed that the mAP50 and mAP50-95 curves are suboptimal. The performance depicted in Figure 9 could be improved by increasing the quantity of training images and conducting additional tasks such as labeling.
In the implementation described earlier, Figure 2 outlines the design of a GUI that incorporates video processing technology with a PyTorch-based YOLO v8 learning model for defect detection.
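A hedged sketch of the video-processing path behind such a GUI is given below, using the ultralytics YOLO v8 API and OpenCV; the weight and video file names and the class index assumed for defect_nut2 are illustrative.
```python
import cv2
from ultralytics import YOLO

# Load the trained weights (.pt) and count defect detections frame by frame.
model = YOLO("defect_detector.pt")            # assumed file name for the trained model
cap = cv2.VideoCapture("defective_part.mp4")  # assumed test video

while cap.isOpened():
    ok, frame = cap.read()
    if not ok:
        break
    result = model(frame, verbose=False)[0]         # YOLO v8 inference on one frame
    n_defects = int((result.boxes.cls == 1).sum())  # assumes class index 1 = defect_nut2
    # The GUI layer would combine n_defects with the threshold logic described earlier.
    print("defect boxes in this frame:", n_defects)

cap.release()
```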
Figure 10 is not directly related to defect detection but shows an experiment to evaluate object detection performance assuming that a sufficient dataset is secured for YOLO v8.
Figure 11 presents the results of an experiment using YOLO v8, conducted with a dataset of 2550 images provided by Roboflow. It demonstrates the variation in confidence values for the static images identified as ‘crush’ defects, which range from 0.3 to 0.9. This variability underscores the model’s capacity to discern defects with varying degrees of confidence across a broad spectrum of test images [12].
Figure 12 illustrates the results of using YOLO v8 for object detection in videos of defective products; the sequence from (a) to (d) captures the error detection process. Initially, malfunctions occurred due to insufficient training data. However, as the model was further trained, it progressively demonstrated an improved ability to detect defective areas.
Two nut holes were accurately detected, with the classification “defect_nut2” showing confidence scores of 0.69 and 0.85 in Figure 12a. When the video closely matched the training images, “defect_nut2” exhibited confidence scores of 0.73 and 0.84 in case (b). In instances (c) and (d), the confidence scores were 0.73 and 0.84, and 0.81 and 0.90, respectively. One method to enhance these confidence scores is to increase the quantity of training images.
Another option is an ensemble technique, which combines multiple models to perform predictions and can enhance overall performance by leveraging the strengths of the different models. The excessive detection of defect_nut2, marked as number 1 in Figure 12, can be considered a case of underfitting. To address this issue, it is necessary to increase the number of images and, concurrently, extend the labeling work to cover the additional images.
In Figure 12, the instances marked as (b), (c), and (d) were considered defective due to the detection of two holes.
Figure 13 presents the results of using YOLO v8 for object detection in videos of non-defective products. While some occurrences of defect_nut2 were detected, they did not exceed a predetermined count. By comparing the number of detected occurrences against the count expected for a flaw-free part, it was verified whether the nuts were correctly attached [13].
4.3. Future Research
This study particularly utilizes a GAN to generate images of vehicle body parts with realistic defects. Utilizing the generated data, the object detection performances of the latest versions of the YOLO algorithm, namely YOLO v7 and YOLO v8, are compared and analyzed. The first aspect of this research focuses on the use of GANs for data augmentation. Obtaining images of actual defective vehicle body parts is challenging, and datasets are often small in size. By leveraging a GAN to create high-quality synthetic images of defective parts, the aim is to expand the training dataset and enhance the generalization ability of the model.
Secondly, the augmented image data obtained allow for a performance comparison between YOLO v7 and v8. This comparison is conducted across various aspects including accuracy, detection speed, and real-time processing capabilities.
Thirdly, the system’s feasibility in real-world applications is evaluated. Beyond just a performance assessment under laboratory conditions, its potential for deployment in actual manufacturing settings is examined. This necessitates an experimental design that takes into account the complexity and diversity of real-world environments.
Fourthly, system integration and optimization are pursued. Considering factors such as system stability, scalability, and user-friendliness, solutions that can be easily implemented in actual manufacturing processes are developed. This research aims to enhance the accuracy and efficiency of defect detection in vehicle body parts, ultimately contributing to improved quality control in manufacturing and reducing the costs and time associated with defects.