Automotive Scratch Detection: A Lightweight Convolutional Network Approach Augmented by Generative Adversarial Learning

Qu, Guojie; Liao, Jiaying; Liu, Kai; Xu, Bin; Qian, Yuwen

doi:10.3390/machines13121107

by Guojie Qu ¹,², Jiaying Liao ³, Kai Liu ¹, Bin Xu ⁴ and Yuwen Qian ³,*

¹ College of Electrical Engineering, Sichuan University, Chengdu 610065, China
² FAW-Volkswagen Automotive Co., Ltd., Changchun 130011, China
³ School of Electronic and Optical Engineering, Nanjing University of Science and Technology, Nanjing 210094, China
⁴ College of Mechanical Engineering, Sichuan University, Chengdu 610065, China
* Author to whom correspondence should be addressed.

Machines 2025, 13(12), 1107; https://doi.org/10.3390/machines13121107

Submission received: 23 October 2025 / Revised: 13 November 2025 / Accepted: 21 November 2025 / Published: 29 November 2025

(This article belongs to the Section Machines Testing and Maintenance)

Abstract

The growing demand for high-precision machining and inspection in modern manufacturing has positioned machine vision as a key technology for surface defect detection. However, identifying subtle surface scratches on automotive components remains a challenging task due to the stringent requirements on sensitivity, precision, and robustness against complex background interference. In this paper, we propose an automated detection system with a Convolutional Neural Network (CNN) architecture. To address data scarcity, we construct a large-scale, high-quality dataset using both data augmentation and Generative Adversarial Network (GAN)-based synthesis. Furthermore, the proposed lightweight CNN replaces traditional fully connected layers with one-dimensional convolutional layers to reduce parameter complexity and model size, while a Dropout mechanism is incorporated to mitigate overfitting and enhance generalization. Experimental results demonstrate that the proposed model achieves superior detection accuracy and robustness across diverse imaging conditions. Moreover, the developed system effectively addresses the limitations of data insufficiency and model complexity, offering an efficient and automated solution for surface quality inspection in industrial manufacturing.

Keywords:

scratch detection; industrial surface inspection; lightweight convolutional neural network; generative adversarial learning

1. Introduction

In modern industrial manufacturing, high-precision machining and its associated inspection technologies form the foundation of product quality assurance. With the continuous advancement of manufacturing processes, the increasing demand for superior surface quality, and the diversification of product lines, inspection systems are required to achieve simultaneous improvements in sensitivity, accuracy, and robustness against complex background interference [1]. To meet these stringent requirements, machine vision has emerged as a transformative technology for surface defect detection, offering rapid, non-contact, and repeatable inspection capabilities [2].

In the early development of automated inspection systems, traditional image processing techniques were predominantly employed for surface scratch detection. These pipelines typically began with filter-based noise reduction, such as Gaussian and median filtering, followed by edge detection operators, such as Sobel [3] and Canny [4], to delineate scratch contours. Morphological transformations, including erosion and dilation, were then applied to refine the defect regions. Such methods extract scratch characteristics through predefined image transformations and analysis sequences. As a result, they are inherently constrained by their dependence on manually designed feature extractors and parameter tuning. Feature-based approaches, such as grayscale statistical methods [5], transform-domain techniques [6], and spatial mapping strategies [7], proved effective in specific scenarios but share the same limitation. Their dependence on a priori knowledge restricts adaptability to varying lighting conditions, material textures, and defect morphologies, which limits their applicability in increasingly complex industrial inspection environments.

Distinct from conventional manual visual inspection methods that rely on human observation, machine vision provides an objective and intelligent framework for automated inspection through the synergistic integration of optical imaging, image processing, and intelligent algorithms [2]. By employing a non-contact measurement strategy, machine vision effectively eliminates the risk of secondary damage to precision components while overcoming the inherent subjectivity and inconsistency of human perception. Consequently, this technology enables stable, repeatable, and high-fidelity defect assessment and has been widely recognized as a promising solution for achieving efficient and accurate surface defect detection in industrial applications.

Beyond scratch detection specifically, deep learning has achieved remarkable breakthroughs in the broader fields of object detection and image segmentation. These advances provide a rich foundation of methodologies and architectures that can be leveraged to enhance defect detection performance [8,9]. In object detection, the seminal two-stage detectors, Regions with CNN features (R-CNN) and its optimized successors Fast R-CNN [10] and Faster R-CNN [11], established the region-proposal paradigm. Faster R-CNN introduced the Region Proposal Network (RPN) for end-to-end target localization. Meanwhile, single-stage detectors, including You Only Look Once (YOLO) [12] and the Single Shot MultiBox Detector (SSD) [13], have advanced real-time detection by directly regressing object positions and categories. These developments treat objects as spatially localizable targets. This principle extends naturally to scratch detection, where the objective is to identify and localize fine surface defects with high spatial precision.

Following this paradigm, He et al. [14] employed a baseline CNN to generate feature maps and fused them through a multistage feature fusion network to achieve detailed localization of steel plate defects. Similarly, Zhang et al. [15] proposed a deep convolutional neural network for surface defect detection that incorporates a dense cross-stage partial (CSP) Darknet backbone and an efficient channel attention mechanism. Jiang et al. [16] proposed a joint attention-guided feature fusion network that adaptively fuses low-level and high-level features. Li et al. [17] developed a lightweight convolutional neural network for automatic scratch detection on components in sliding contact, such as those in metal forming.
Although these object detection-based approaches can effectively extract scratch regions from complex backgrounds, they inherently lack pixel-level segmentation capability, which is crucial for quantitative surface analysis and accurate defect measurement.

In the domain of semantic segmentation, the objective is to assign a categorical label to each pixel within an image, thereby enabling precise delineation of object contours and facilitating fine-grained structural analysis [18,19]. This pixel-level inference capability makes semantic segmentation particularly suitable for detecting subtle surface features such as the shape, length, and width of scratches. The introduction of the Fully Convolutional Network (FCN) represented a seminal advancement, as it replaced the fully connected layers in conventional CNNs with convolutional operations, allowing inputs of arbitrary dimensions while producing dense pixel-wise prediction maps [20]. Subsequent developments, exemplified by the DeepLab series, incorporated atrous convolutions to enlarge receptive fields without increasing model complexity and employed Conditional Random Fields (CRFs) to refine segmentation boundaries [21]. Moreover, the integration of CRFs into Recurrent Neural Network (RNN) frameworks has facilitated end-to-end optimization pipelines for enhanced boundary precision [22].

Building upon the above foundational advancements, numerous studies have explored domain-specific adaptations of semantic segmentation for surface defect detection. Li et al. adopted an improved FCN architecture based on DenseNet121 for crack identification [23]. Additionally, Shi et al. proposed a semi-supervised semantic segmentation method using perturbation invariance and cross-pseudo-supervision to improve real-time industrial defect detection [24]. In contrast, Beyene et al. proposed an unsupervised domain adaptation method with DAFormer to improve surface crack segmentation and generalization to new datasets [25].

Despite their promising results, these segmentation approaches still rely, to varying degrees, on labor-intensive pixel-wise ground-truth annotations. Nevertheless, the progressive evolution of semantic segmentation frameworks provides a robust algorithmic foundation and architectural inspiration for developing more accurate, scalable, and domain-adapted scratch detection systems.

Despite the remarkable advances of deep learning in image recognition and defect detection, its deployment in practical industrial surface scratch detection remains fraught with significant challenges [26,27]. One of the foremost limitations is the scarcity of high-quality, large-scale, and precisely annotated datasets [28,29,30]. As supervised deep learning models fundamentally depend on abundant and accurately labeled samples, the absence of standardized datasets severely restricts both performance benchmarking and generalization capability. Furthermore, the intrinsic variability of scratch morphology—manifested through variations in shape, size, depth, color, and contrast—combined with contextual diversity across different materials, surface curvatures, and textures, renders reliable defect discrimination exceedingly difficult [31,32,33]. Designing models capable of robustly adapting to such heterogeneity while effectively distinguishing genuine defects from background interference continues to be a formidable research problem. Beyond algorithmic accuracy, practical deployment further necessitates addressing critical engineering concerns, including model compression, hardware acceleration, and system-level integration, to ensure real-time responsiveness, operational stability, and ease of industrial implementation [34]. These challenges collectively underscore that, although deep learning offers substantial potential, its application to industrial scratch detection demands continued research and innovation.

To address the aforementioned issues, this paper proposes a lightweight convolutional neural network (CNN) architecture designed to capture the fine-grained characteristics of scratches in industrial applications such as automotive manufacturing. To alleviate the scarcity of annotated training data, a systematic data augmentation strategy is combined with generative modeling to enhance both the scale and the diversity of the dataset. The network is designed to detect minute, low-contrast scratches accurately while remaining lightweight. This keeps it computationally efficient and robust, and therefore suitable for industrial deployment. By constructing high-quality datasets, introducing innovative training methodologies, and optimizing network design, we provide a practical solution for automated quality control that reduces reliance on manual inspection, lowers operational costs, and enhances overall production efficiency. The main contributions of this paper can be summarized as follows:

We construct a novel, high-quality surface scratch dataset for automotive components, covering diverse real-world industrial scenarios. This dataset provides a solid benchmark for training and evaluating scratch detection models, supporting both current applications and future research in automotive defect detection.
To alleviate the critical bottleneck of scarce annotated samples in surface scratch detection, we propose a comprehensive data augmentation framework leveraging a Generative Adversarial Network (GAN) for synthesizing high-fidelity defect images, thereby significantly enhancing the diversity and size of the training dataset.
We propose a lightweight CNN for surface scratch detection, replacing traditional parameter-heavy FC layers with 1D convolutions, significantly reducing model size while maintaining high accuracy for real-time industrial deployment.
Extensive experimental evaluations conducted on practical industrial datasets validate the superior accuracy, robustness, and generalization capability of the proposed data-driven strategies and network architecture compared with existing approaches.

The rest of this article is organized as follows. In Section 2, we describe the dataset construction process and the GAN-based data enhancement strategy developed to address the issue of limited annotated samples. Section 3 presents the proposed lightweight convolutional neural network architecture. In Section 4, we conduct experiments to evaluate the performance of the proposed method. Section 5 highlights the advantages of the proposed method and discusses its potential applications in industrial quality inspection.

2. Dataset Construction and Preprocessing

2.1. Problem Statement and Dataset Establishment

The detection of surface scratches on automotive components presents substantial challenges due to pronounced variations in their morphology, size, and color, compounded by interference from complex background textures. Traditional image processing techniques are generally inadequate for addressing such variability, highlighting the necessity for robust deep learning frameworks capable of ensuring both high detection accuracy and strong generalization across diverse surface conditions. To this end, we develop a lightweight CNN to accurately identify and localize surface scratches while maintaining computational efficiency, thereby meeting the stringent requirements of industrial quality inspection.

To facilitate the training and evaluation of the proposed model, a dedicated dataset is constructed and publicly released to ensure reproducibility. The dataset comprises a total of 5000 high-resolution images, including 2000 real images and 3000 synthetic images generated by a GAN. These images cover diverse automotive components, such as body surfaces, engine blocks, and gear surfaces, and include both scratch-free samples and specimens with scratches of varying lengths, widths, and orientations, ensuring data diversity and representative coverage of real manufacturing scenarios. The dataset is available at https://huggingface.co/datasets/Ying-II/ASDE (accessed on 22 October 2025). Detailed descriptions of preprocessing procedures and data augmentation strategies are provided in the subsequent sections to illustrate how data quality, scalability, and robustness are systematically enhanced.

2.2. Data Acquisition and Processing

2.2.1. Data Acquisition

To ensure effective training of the neural network, the availability of a comprehensive and diverse dataset is indispensable. In this work, the dataset contains both scratch and non-scratch images, covering various scenarios such as body surface scratches, engine block scratches, and gear surface scratches as shown in Figure 1. Since deep learning is inherently data-driven, the robustness of scratch recognition largely depends on the scale and heterogeneity of the dataset.

To enhance diversity, the collected images undergo data augmentation using rotation, scaling, flipping, and color jittering. These operations expand the dataset and introduce additional variability, which improves the generalization capability of the model and reduces the risk of overfitting.

From the 2000 high-resolution real images, 1000 images are selected to train a generative adversarial network (GAN), which generates an additional 3000 synthetic images replicating realistic scratch patterns observed in manufacturing scenarios. The synthetic images are then combined with the original 1000 real training images to form a mixed dataset. This dataset is randomly shuffled to mitigate ordering bias and to ensure a uniform distribution of scratch characteristics across subsets. Subsequently, it is partitioned into training and validation subsets with a 4:1 ratio. The training subset is used to optimize model parameters by enabling the network to learn discriminative representations of scratch-related features, whereas the validation subset serves as an independent benchmark for evaluating the model’s generalization capability.
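The split sizes above can be checked with a short sketch. It assumes that images are referenced by integer ids and that the 1000 GAN-training images are drawn at random. The function name `build_splits` and the seed are illustrative and are not taken from the authors' released code.

```python
import numpy as np

def build_splits(num_real=2000, num_gan_train=1000, num_synthetic=3000,
                 train_ratio=0.8, seed=0):
    """Reproduce the split sizes of Section 2.2.1 using index arrays.

    Real images are indexed 0..num_real-1; synthetic images get negative ids
    so that the two sources cannot be confused (a hypothetical convention).
    """
    rng = np.random.default_rng(seed)
    real = rng.permutation(num_real)
    gan_train_real = real[:num_gan_train]       # trains the GAN and joins the mixed set
    test_real = real[num_gan_train:]            # held out: never seen by the GAN or the CNN
    synthetic = -1 - np.arange(num_synthetic)   # placeholder ids for GAN outputs
    mixed = rng.permutation(np.concatenate([gan_train_real, synthetic]))  # shuffle
    n_train = int(round(train_ratio * len(mixed)))  # 4:1 ratio -> 80 % for training
    return mixed[:n_train], mixed[n_train:], test_real

train_ids, val_ids, test_ids = build_splits()
print(len(train_ids), len(val_ids), len(test_ids))  # 3200 800 1000
```

Keeping the test ids disjoint from everything the GAN saw is what makes the later evaluation on real data unbiased.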

The remaining 1000 real images, which are excluded from GAN training, are reserved as a separate test set to assess the model’s performance on previously unseen real data, providing an unbiased evaluation of detection accuracy. Standard performance metrics, including accuracy, precision, and recall, are computed on the validation set to quantitatively evaluate detection performance. These metrics further guide hyperparameter optimization—such as tuning the learning rate, batch size, and number of epochs—thereby facilitating stable convergence and optimal model performance.
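The three metrics named above can be computed from the confusion counts, as in the following sketch. It treats the scratch class as positive. Returning 0.0 when a denominator is zero is our own convention.

```python
import numpy as np

def binary_metrics(y_true, y_pred):
    """Accuracy, precision and recall for labels in {0, 1} (1 = scratch)."""
    y_true = np.asarray(y_true)
    y_pred = np.asarray(y_pred)
    tp = int(np.sum((y_pred == 1) & (y_true == 1)))  # scratches correctly flagged
    tn = int(np.sum((y_pred == 0) & (y_true == 0)))  # clean images correctly passed
    fp = int(np.sum((y_pred == 1) & (y_true == 0)))  # false alarms
    fn = int(np.sum((y_pred == 0) & (y_true == 1)))  # missed scratches
    accuracy = (tp + tn) / len(y_true)
    precision = tp / (tp + fp) if tp + fp else 0.0   # share of flagged images that are scratched
    recall = tp / (tp + fn) if tp + fn else 0.0      # share of scratched images that are flagged
    return accuracy, precision, recall

print(binary_metrics([1, 1, 0, 0, 1, 0], [1, 0, 0, 1, 1, 0]))  # all three are about 0.667
```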

2.2.2. Image Pre-Processing

Before applying deep learning for image recognition or target detection, image preprocessing is typically required. Neural networks operate based on the statistical distribution of input samples, and constraining data within a certain range facilitates subsequent computations and accelerates the convergence of the training process.

Network parameters are generally initialized from a zero-mean normal distribution or from a uniform distribution within a small range. Without preprocessing, input features may differ greatly in scale, which can drive the inputs of the activation functions to excessively large or small values. For activation functions such as sigmoid or tanh, the derivative is largest near zero. When the input moves far from this region, the output saturates and the derivative approaches zero. In gradient descent-based optimization, this can lead to vanishing gradients and slow convergence. In practice, therefore, sigmoid-type activations provide useful gradients only within a limited interval (approximately −2 to 2).
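The saturation argument can be checked numerically. The sketch below evaluates the sigmoid derivative $\sigma'(x) = \sigma(x)(1 - \sigma(x))$. It peaks at 0.25 for $x = 0$, falls to about 0.105 at $x = 2$, and is below 0.007 at $x = 5$.

```python
import math

def sigmoid(x):
    return 1.0 / (1.0 + math.exp(-x))

def sigmoid_grad(x):
    # d/dx sigmoid(x) = sigmoid(x) * (1 - sigmoid(x)); maximal (0.25) at x = 0
    s = sigmoid(x)
    return s * (1.0 - s)

for x in (0.0, 2.0, 5.0, 10.0):
    print(f"x = {x:4.1f}  sigmoid'(x) = {sigmoid_grad(x):.6f}")
```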

To keep inputs within this range, the images are normalized to zero mean and unit variance. Normalization centers the data by subtracting the mean and rescales it by the standard deviation. This places the inputs in the high-gradient region of the activation functions and reduces the effect of uneven illumination across images, thereby improving model generalization.
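A minimal per-image version of this normalization is sketched below, assuming grayscale arrays. The text does not say whether the statistics are computed per image or over the whole training set, so per-image statistics are an assumption here. The small `eps` is likewise our addition, to handle perfectly flat images.

```python
import numpy as np

def normalize(image, eps=1e-8):
    """Per-image z-score normalization: subtract the mean, divide by the std."""
    image = image.astype(np.float64)
    return (image - image.mean()) / (image.std() + eps)  # eps avoids division by zero

rng = np.random.default_rng(0)
img = rng.integers(0, 256, size=(128, 128)).astype(np.uint8)  # stand-in grayscale crop
z = normalize(img)
print(round(float(z.mean()), 6), round(float(z.std()), 6))  # approximately 0 and 1
```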

2.3. Data Augmentation and Generation

2.3.1. Data Augmentation

In the task of detecting scratches on automotive components, an inadequate sample size poses a significant challenge. The morphology and spatial distribution of scratches exhibit high variability, and a limited dataset fails to encompass the full spectrum of potential scratch characteristics. Consequently, this deficiency results in erroneous classifications when the model encounters novel scratch patterns, thereby compromising its performance and generalization capability.

To address the issue of insufficient samples, data augmentation is employed to increase the volume and diversity of the image data. Random combinations of transformations are applied, so that the model rarely encounters exactly the same image twice during training. These transformations include scaling, rotation, flipping, translation, shearing, noise perturbation, and contrast adjustment.
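Most of the transformations above can be approximated with plain NumPy, as in the following sketch. It assumes square grayscale images scaled to [0, 1]. It covers flipping, 90-degree rotation, translation, contrast adjustment and Gaussian noise. Scaling and shearing require interpolation and are omitted. The function names and parameter ranges are illustrative, not the values used in the paper.

```python
import numpy as np

def shift_image(img, dy, dx):
    """Translate a 2D image by (dy, dx) pixels, filling vacated pixels with zeros."""
    out = np.zeros_like(img)
    h, w = img.shape
    src_y = slice(max(0, -dy), min(h, h - dy))
    dst_y = slice(max(0, dy), min(h, h + dy))
    src_x = slice(max(0, -dx), min(w, w - dx))
    dst_x = slice(max(0, dx), min(w, w + dx))
    out[dst_y, dst_x] = img[src_y, src_x]
    return out

def augment(img, rng, max_shift=8, noise_std=0.02, contrast_range=(0.8, 1.2)):
    """Return one randomly transformed copy of a square grayscale image in [0, 1]."""
    out = img.astype(np.float64)
    if rng.random() < 0.5:
        out = np.fliplr(out)                                  # horizontal flip
    if rng.random() < 0.5:
        out = np.flipud(out)                                  # vertical flip
    out = np.rot90(out, k=int(rng.integers(0, 4)))            # 0/90/180/270-degree rotation
    dy, dx = rng.integers(-max_shift, max_shift + 1, size=2)
    out = shift_image(out, int(dy), int(dx))                  # translation
    mean = out.mean()
    out = (out - mean) * rng.uniform(*contrast_range) + mean  # contrast adjustment
    out = out + rng.normal(0.0, noise_std, size=out.shape)    # noise perturbation
    return np.clip(out, 0.0, 1.0)

rng = np.random.default_rng(0)
sample = rng.random((64, 64))                     # stand-in for a real image patch
views = [augment(sample, rng) for _ in range(4)]  # four different random views
```

Drawing fresh random parameters on every call means each epoch sees different variants of the same underlying image.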

By employing augmentation methods, the dataset can be effectively expanded, providing richer training material for the neural network. This process enables the model to capture target features better, mitigates overfitting, and reduces generalization error.

2.3.2. Data Generation

Although conventional data augmentation methods can increase the number of training samples, the generated images are merely simple transformations of the original data and fail to capture the full variability of scratch patterns on automotive components. To further enhance the generalization capability of the model, Generative Adversarial Networks (GANs) [35,36] are employed to synthesize additional realistic samples. GANs are generative models that learn the underlying distribution of a limited set of real images and produce high-quality synthetic samples, thereby enriching the dataset [37]. By incorporating these synthetic images, the diversity of scratch features is increased, enabling the model to better recognize novel and complex scratch patterns and to mitigate the limitations imposed by the small sample size, as illustrated in Figure 2. Therefore, this approach can improve the robustness and accuracy of scratch detection on automotive parts.

A GAN consists of two competing networks: a generator $G$ and a discriminator $D$. The generator aims to transform random noise $z$ into images $G(z)$ that resemble real automotive component images. The discriminator attempts to distinguish between real images $x$ and generated images. The adversarial training process is formulated as:

$$\min_{G} \max_{D} V(D, G) = \mathbb{E}_{x \sim p_{\mathrm{data}}(x)}[\log D(x)] + \mathbb{E}_{z \sim p_{z}(z)}[\log(1 - D(G(z)))] \tag{1}$$

where $p_{\mathrm{data}}(x)$ denotes the distribution of real images and $p_{z}(z)$ represents the distribution of the input noise. The discriminator maximizes the probability of correctly classifying real and synthetic images, while the generator minimizes the probability that its outputs are identified as fake.

During training, the generator and discriminator are updated in an alternating fashion. Specifically, for a fixed generator, the discriminator is first optimized to maximize the objective function $V(D, G)$:

$$D^{*} = \arg\max_{D} \; \mathbb{E}_{x \sim p_{\mathrm{data}}(x)}[\log D(x)] + \mathbb{E}_{z \sim p_{z}(z)}[\log(1 - D(G(z)))] \tag{2}$$

Subsequently, with the discriminator fixed, the generator is optimized to minimize the same objective. Because the first term of $V(D, G)$ does not depend on $G$, this reduces to:

$$G^{*} = \arg\min_{G} \; \mathbb{E}_{z \sim p_{z}(z)}[\log(1 - D(G(z)))] \tag{3}$$
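The inner maximization in Eq. (2) has a well-known closed form. Let $p_{g}$ denote the distribution of generated images, a symbol introduced here for illustration. For a fixed generator, the optimal discriminator is $D^{*}(x) = p_{\mathrm{data}}(x) / (p_{\mathrm{data}}(x) + p_{g}(x))$, and $V(D^{*}, G) = -\log 4 + 2\,\mathrm{JSD}(p_{\mathrm{data}} \,\|\, p_{g})$. The sketch below checks this on a toy three-point sample space with made-up probabilities. It illustrates Eqs. (1) to (3) only and is not FastGAN's training procedure.

```python
import numpy as np

def value_fn(d, p_data, p_g):
    """V(D, G) from Eq. (1) for discrete distributions on a finite sample space."""
    return np.sum(p_data * np.log(d)) + np.sum(p_g * np.log(1.0 - d))

def optimal_discriminator(p_data, p_g):
    # Pointwise maximizer of V for a fixed generator
    return p_data / (p_data + p_g)

def jsd(p, q):
    """Jensen-Shannon divergence (natural log) of two strictly positive distributions."""
    m = 0.5 * (p + q)
    return 0.5 * np.sum(p * np.log(p / m)) + 0.5 * np.sum(q * np.log(q / m))

p_data = np.array([0.5, 0.3, 0.2])   # toy "real" distribution
p_g = np.array([0.2, 0.3, 0.5])      # toy "generated" distribution
d_star = optimal_discriminator(p_data, p_g)
v_star = value_fn(d_star, p_data, p_g)

# D* is at least as good as any randomly drawn discriminator ...
rng = np.random.default_rng(0)
assert all(value_fn(rng.uniform(0.01, 0.99, 3), p_data, p_g) <= v_star for _ in range(1000))
# ... and V(D*, G) = -log 4 + 2 * JSD(p_data || p_g)
print(v_star, -np.log(4) + 2 * jsd(p_data, p_g))
# A perfect generator (p_g = p_data) gives D* = 1/2 everywhere and V = -log 4
print(value_fn(optimal_discriminator(p_data, p_data), p_data, p_data), -np.log(4))
```

This is why the alternating optimization is said to drive the generator toward the real data distribution. Once $p_{g} = p_{\mathrm{data}}$, the best discriminator can do no better than guessing.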

This alternating optimization continues until the discriminator can no longer reliably distinguish generated images from real automotive component images. At that point, the generator has approximately learned the distribution of the real data. The generated images are then used to augment the training dataset, providing high-quality samples that improve the model's ability to detect scratches with diverse shapes, sizes, and positions on automotive parts.

Among the numerous GAN variants, we select FastGAN [38] to generate scratch images of automotive components because of its fast training and high generation quality. FastGAN uses a lightweight generator and discriminator and incorporates an attention mechanism that enhances the network's ability to learn critical image features. It converges rapidly, producing high-quality images in a short training time. Its outputs also show the detail and diversity needed to replicate the varied morphologies of scratches on automotive parts. To assess the realism of the generated images, we conduct a user study in which five human evaluators each assess 200 randomly selected GAN-generated samples. Overall, 95.4% of the images are judged to be realistic and consistent with genuine scratch patterns. This indicates high visual fidelity and supports the use of the synthetic data for training the scratch recognition model.

3. Scratch Recognition with CNN

3.1. System Model

The automotive scratch recognition and detection system is a deep learning-based computer vision framework. It comprises an image acquisition module, a feature extraction module, a classification decision module, and a results output module, as shown in Figure 3. The system operates on an image set defined as

$$X = \{x_{1}, x_{2}, \dots, x_{i}, \dots, x_{I}\}, \quad x_{i} \in \mathbb{R}^{H \times W \times C}, \tag{4}$$

where $x_{i}$ denotes the $i$-th input image with spatial resolution $H \times W$ and channel dimension $C$, and $I$ represents the total number of images. The corresponding label set is

$$Y = \{y_{1}, y_{2}, \dots, y_{i}, \dots, y_{I}\}, \quad y_{i} \in \{0, 1\}, \tag{5}$$

where $y_{i} = 0$ indicates that the image has no scratch and $y_{i} = 1$ indicates the presence of a scratch. Each image $x_{i}$ undergoes preprocessing operations such as grayscale conversion and size normalization. It is then fed into a convolutional neural network (CNN) for feature learning and classification. The CNN functions as a classifier defined by

$$\hat{y}_{i} = \mathrm{Classify}(x_{i}; \theta), \quad \hat{y}_{i} \in \{0, 1\}, \tag{6}$$

where $\theta$ denotes the trainable parameters of the network and $\hat{y}_{i}$ represents the predicted label of image $x_{i}$. The model applies multiple layers of nonlinear transformations to extract discriminative scratch features and outputs a binary decision distinguishing scratched from non-scratched images. Because features are learned directly from the raw images through the network parameters, no manual feature engineering is required, and the detection workflow is fully end-to-end.

Deep learning is a subset of machine learning based on neural networks. It stacks multiple layers of artificial neurons to learn complex mappings between input and output data. Loosely inspired by the way the human brain processes information, it uncovers inherent patterns and relationships within data. This has made it a cornerstone technology in modern image processing and recognition tasks.

Training a deep learning model involves two fundamental processes: forward propagation and backpropagation. During forward propagation, input data are processed sequentially through the network's layers to produce an output. During backpropagation, the gradients of the loss function with respect to the network parameters are computed layer by layer. The parameters are then updated by gradient descent to minimize the loss and improve predictive accuracy. Each iteration therefore involves computing the activation values and error terms of every layer, followed by updates to the weights and biases. The loss function, such as mean squared error or cross-entropy loss, quantifies the deviation between the model's predictions and the true labels. The optimizer determines how the parameters are adjusted to speed up convergence. Over successive training iterations, the model fits the training data and, when overfitting is controlled, generalizes to unseen samples.
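The forward and backward cycle described above can be illustrated on the smallest possible model: a single sigmoid unit (logistic regression) trained with cross-entropy by plain gradient descent on synthetic two-class data. This is a generic sketch, not the paper's CNN. The data, learning rate and epoch count are arbitrary.

```python
import numpy as np

rng = np.random.default_rng(0)
# Toy features: two Gaussian blobs standing in for "no scratch" (0) and "scratch" (1)
x = np.vstack([rng.normal(-1.0, 0.5, (100, 2)), rng.normal(1.0, 0.5, (100, 2))])
y = np.concatenate([np.zeros(100), np.ones(100)])

w = np.zeros(2)
b = 0.0
lr = 0.5
losses = []
for epoch in range(200):
    # Forward propagation: linear layer followed by a sigmoid
    p = 1.0 / (1.0 + np.exp(-(x @ w + b)))
    p_clipped = np.clip(p, 1e-12, 1 - 1e-12)
    loss = -np.mean(y * np.log(p_clipped) + (1 - y) * np.log(1 - p_clipped))
    losses.append(loss)
    # Backpropagation: for sigmoid + cross-entropy, dL/dlogit = (p - y) / n
    grad_logit = (p - y) / len(y)
    grad_w = x.T @ grad_logit
    grad_b = grad_logit.sum()
    # Gradient-descent update of the weights and bias
    w -= lr * grad_w
    b -= lr * grad_b

print(f"initial loss {losses[0]:.3f}, final loss {losses[-1]:.3f}")
```

With all-zero initial weights every prediction is 0.5, so the first loss equals $\ln 2 \approx 0.693$. The loss then falls steadily as the weights align with the class separation.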

3.2. Lightweight Convolutional Neural Network Architecture

The model proposed in this paper follows the fundamental structure of a convolutional neural network (CNN) with one notable modification: the fully connected layer is replaced by a one-dimensional convolutional layer. The architecture of the model is detailed in Table 1. There, $K \times K / S / P$ denotes the size of the convolution or pooling window, the stride, and the padding of the feature map, respectively, and $N$ denotes the number of convolutional kernels.

The model extracts features through three convolution and pooling stages, each consisting of a two-dimensional convolutional layer followed by a pooling layer. The feature extraction performed by a conventional 2D convolutional layer can be expressed mathematically as

$$h^{(l)} = f\left(W^{(l)} * h^{(l-1)} + b^{(l)}\right), \tag{7}$$

where $h^{(l-1)}$ is the input feature map of the $l$-th layer, $W^{(l)}$ and $b^{(l)}$ are the kernel and bias parameters, $*$ denotes convolution, and $f(\cdot)$ is the nonlinear activation function.
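Eq. (7) can be written out directly for a single layer, as in the sketch below. It assumes stride 1, no padding and a ReLU activation for $f(\cdot)$; the activation actually used is specified in Table 1, which is not reproduced here. Like most deep-learning libraries, it implements $*$ as cross-correlation. Applied to a one-pixel-wide vertical line, a simple hand-made line-detector kernel responds only where the line lies.

```python
import numpy as np

def relu(t):
    return np.maximum(t, 0.0)

def conv2d_layer(h_prev, W, b, f=relu):
    """Eq. (7) for one layer: h = f(W * h_prev + b), stride 1, no padding.

    h_prev: (C_in, H, W_in) input feature map
    W:      (C_out, C_in, K, K) kernels
    b:      (C_out,) biases
    """
    c_out, _, k, _ = W.shape
    _, height, width = h_prev.shape
    out = np.empty((c_out, height - k + 1, width - k + 1))
    for o in range(c_out):
        for i in range(height - k + 1):
            for j in range(width - k + 1):
                patch = h_prev[:, i:i + k, j:j + k]         # local receptive field
                out[o, i, j] = np.sum(W[o] * patch) + b[o]  # same kernel at every position
    return f(out)

h = np.zeros((1, 5, 5))
h[0, :, 2] = 1.0                                        # a one-pixel-wide vertical "scratch"
W = np.tile(np.array([-1.0, 2.0, -1.0]), (1, 1, 3, 1))  # (1, 1, 3, 3) vertical-line detector
print(conv2d_layer(h, W, np.zeros(1))[0])               # each row is [0. 6. 0.]
```

Weight sharing is visible in the inner loop: the same nine weights are reused at every position, which is why a convolutional layer needs far fewer parameters than a fully connected one.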

Conventional CNNs typically employ multiple fully connected layers after feature extraction. This approach instead uses a single one-dimensional convolutional layer to minimize the number of parameters. The 1D convolutional substitution can be formalized as

$$z_{j,i} = \sum_{k=1}^{K} w_{j,k} \, x_{i+k-1} + b_{j}, \quad j = 1, 2, \dots, M, \tag{8}$$

where $x$ is the flattened input vector obtained from the last feature map and $i$ indexes the position of the sliding window along $x$. The weights $w_{j,k}$ belong to the $j$-th of $M$ one-dimensional kernels of size $K$, $b_{j}$ is the corresponding bias, and $z_{j,i}$ is the feature output of the 1D convolution. Each kernel has only $K$ weights shared across all positions, so this design significantly reduces the parameter count compared with a traditional fully connected layer.
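Reading $j$ as the kernel index and $i$ as the window position, Eq. (8) amounts to sliding $M$ short kernels along the flattened vector. The sketch below implements it with NumPy's `sliding_window_view`. It then compares the parameter count of such a layer with that of a fully connected layer on the same input. The feature-map size (64 × 16 × 16), the 128 hidden units, and $M = 16$, $K = 9$ are hypothetical values, and only the replaced layer itself is counted.

```python
import numpy as np

def conv1d_head(x, w, b):
    """Eq. (8): z[j, i] = sum_k w[j, k] * x[i + k] + b[j] (0-based, stride 1, no padding).

    x: flattened feature vector of length D
    w: (M, K) weights of M one-dimensional kernels of size K
    b: (M,) biases
    """
    _, k = w.shape
    windows = np.lib.stride_tricks.sliding_window_view(x, k)  # (D - K + 1, K)
    return w @ windows.T + b[:, None]                         # (M, D - K + 1)

def fc_params(d_in, d_out):
    return d_in * d_out + d_out  # one weight per input-output pair, plus biases

def conv1d_params(m, k):
    return m * k + m             # M kernels of K shared weights, plus biases

# Toy check of Eq. (8): a difference kernel and a 3-tap averaging kernel
z = conv1d_head(np.arange(6.0), np.array([[1.0, 0.0, -1.0], [1/3, 1/3, 1/3]]), np.zeros(2))
print(z)  # row 0: all -2; row 1: approximately [1, 2, 3, 4]

d = 64 * 16 * 16  # hypothetical flattened size of the last feature map
print(fc_params(d, 128), conv1d_params(16, 9))  # 2097280 160
```

The fully connected count grows with the product of input and output sizes. The 1D convolution count depends only on $M$ and $K$, independent of the input length.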

To mitigate overfitting, the model adopts the Dropout technique, with a dropout rate of 0.25 applied after the convolution-pooling stages and 0.5 applied to the fully connected layer. Pooling reduces the dimensionality of the features, and max-pooling is the operation used. Scratches appear in images as distinctive texture patterns. Max-pooling lets the network retain the most salient response within each receptive field, enhancing its capacity to capture the discriminative representations and global variations critical for accurate binary classification.

In traditional model designs, fully connected layers follow the stack of 2D convolutional and pooling layers. Every neuron in a fully connected layer is connected to every neuron in the next layer, so these layers account for the vast majority of the network's parameters. For example, as shown in Table 2, the AlexNet model has about 61 M parameters, of which about 59 M (96% of the total) belong to the fully connected layers. The VGG-16 model has 138 M parameters, of which 123 M (89% of the total) belong to the fully connected layers. Such a heavy concentration of parameters suggests substantial redundancy in the fully connected weights.
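The percentages above can be reproduced from the layer shapes. The sketch below assumes the single-stream AlexNet layout used by torchvision (the original two-GPU model differs slightly) and the standard VGG-16. Both take 224 × 224 RGB inputs and produce 1000 output classes.

```python
def conv_params(c_in, c_out, k):
    return c_in * c_out * k * k + c_out  # k x k weights per channel pair, plus biases

def fc_params(d_in, d_out):
    return d_in * d_out + d_out

# Single-stream AlexNet: (in_channels, out_channels, kernel) per conv layer
alexnet_conv = [(3, 64, 11), (64, 192, 5), (192, 384, 3), (384, 256, 3), (256, 256, 3)]
alexnet_fc = [(256 * 6 * 6, 4096), (4096, 4096), (4096, 1000)]

# VGG-16: thirteen 3x3 conv layers with the channel widths below
vgg_channels = [3, 64, 64, 128, 128, 256, 256, 256, 512, 512, 512, 512, 512, 512]
vgg_conv = [(vgg_channels[i], vgg_channels[i + 1], 3) for i in range(13)]
vgg_fc = [(512 * 7 * 7, 4096), (4096, 4096), (4096, 1000)]

def summarize(name, convs, fcs):
    conv_total = sum(conv_params(*c) for c in convs)
    fc_total = sum(fc_params(*f) for f in fcs)
    total = conv_total + fc_total
    print(f"{name}: total {total:,}, FC {fc_total:,} ({fc_total / total:.1%})")
    return total, fc_total

summarize("AlexNet", alexnet_conv, alexnet_fc)  # total 61,100,840, FC 58,631,144 (96.0%)
summarize("VGG-16", vgg_conv, vgg_fc)           # total 138,357,544, FC 123,642,856 (89.4%)
```

In both networks the first fully connected layer dominates, because it connects every element of the flattened feature map to 4096 units.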

In the image domain, convolutional layers usually refer to 2D convolutional layers, whose weight sharing and local connectivity are the main strengths of CNNs. One-dimensional convolution is mainly used to process one-dimensional signals, such as frequency signals. It serves the same purpose that two-dimensional convolution serves in image processing, namely feature extraction. In this work, one-dimensional convolution is introduced into image processing in place of the fully connected layer. This exploits the local connectivity of convolution to eliminate most of the fully connected layer's parameters.

After feature extraction, the nodes of the last feature map are rearranged into a one-dimensional vector. A sliding window is then applied to perform the convolution operation, as in a two-dimensional convolutional layer, except that the window moves in only one direction.

With Dropout added, the forward computation from layer $l$ to layer $l+1$ becomes:

$$r_{j}^{(l)} \sim \mathrm{Bernoulli}(p), \tag{9}$$

$$\tilde{y}^{(l)} = r^{(l)} \odot y^{(l)}, \tag{10}$$

$$z_{i}^{(l+1)} = w_{i}^{(l+1)} \tilde{y}^{(l)} + b_{i}^{(l+1)}, \tag{11}$$

and

$$y_{i}^{(l+1)} = f\left(z_{i}^{(l+1)}\right), \tag{12}$$

where $\mathrm{Bernoulli}(p)$ is the Bernoulli distribution. Each element $r_{j}^{(l)}$ of the mask $r^{(l)}$ equals 1 (unit retained) with probability $p$ and 0 (unit dropped) otherwise, and $\odot$ denotes element-wise multiplication. Here $p$ is the retention probability, so a dropout rate of 0.25, for example, corresponds to $p = 0.75$. During testing, no neurons are dropped. Instead, the outputs of the layer are multiplied by $p$ so that their expected value is consistent with the training phase. Dropout is computationally simple but very effective, making it an important means of suppressing overfitting in CNNs.
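A minimal NumPy sketch of Eqs. (9) and (10) and of the test-time scaling follows, with $p$ as the retention probability. The function names are illustrative.

```python
import numpy as np

def dropout_train(y, p, rng):
    """Eqs. (9)-(10): keep each unit with probability p, i.e. r ~ Bernoulli(p)."""
    r = rng.binomial(1, p, size=y.shape)  # 1 = keep, 0 = drop
    return r * y

def dropout_test(y, p):
    """Test time: keep every unit and scale by p to match the training-time expectation."""
    return p * y

rng = np.random.default_rng(0)
y = np.ones(100_000)
p = 0.75  # a dropout rate of 0.25 corresponds to a retention probability of 0.75
print(dropout_train(y, p, rng).mean(), dropout_test(y, p).mean())  # both close to 0.75
```

Many frameworks instead use "inverted" dropout. This divides the retained activations by $p$ during training and applies no scaling at test time. PyTorch's `nn.Dropout`, for instance, works this way, and its argument is the drop probability rather than $p$. Both variants give the same expected activations.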

Finally, the network is trained with the binary cross-entropy loss, which measures the difference between the predicted scratch probabilities $\hat{y}_{i}$ and the ground-truth labels $y_{i} \in \{0, 1\}$. The loss function is defined as

$$L = -\frac{1}{I} \sum_{i=1}^{I} \left[ y_{i} \log(\hat{y}_{i}) + (1 - y_{i}) \log(1 - \hat{y}_{i}) \right], \tag{13}$$

where $I$ denotes the total number of training images. Minimizing this loss drives the predictions $\hat{y}_{i}$ toward the true labels $y_{i}$, training the CNN for accurate scratch detection.
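Equation (13) translates directly into code. In this minimal sketch the clipping constant `eps` is an assumption added to avoid evaluating log(0); it is not part of Eq. (13).

```python
import math
from typing import Sequence


def binary_cross_entropy(y_true: Sequence[int], y_pred: Sequence[float],
                         eps: float = 1e-7) -> float:
    """Eq. (13): mean over I images of -[y log y_hat + (1 - y) log(1 - y_hat)]."""
    total = 0.0
    for y, y_hat in zip(y_true, y_pred):
        y_hat = min(max(y_hat, eps), 1.0 - eps)   # clip to avoid log(0)
        total += y * math.log(y_hat) + (1 - y) * math.log(1.0 - y_hat)
    return -total / len(y_true)


print(binary_cross_entropy([1, 0], [0.9, 0.1]))   # -ln(0.9), about 0.105
print(binary_cross_entropy([1, 0], [0.1, 0.9]))   # confident mistakes: -ln(0.1), about 2.303
```

Confident wrong predictions are penalized far more heavily than confident correct ones are rewarded, which is what pushes $\hat{y}_{i}$ toward the labels during training.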

3.3. Recognition Algorithm

To ensure reproducibility and provide a clear overview of the model implementation, this section presents the detailed recognition algorithm of the proposed CNN-based scratch detection framework. The complete training and testing procedures are summarized in Algorithm 1 and Algorithm 2, respectively, including data preprocessing, model optimization, and inference strategies. The source code of the proposed method is publicly available at https://github.com/liaojiaying11/ASDE (accessed on 22 October 2025).

Algorithm 1 outlines the complete training pipeline of the proposed CNN-based scratch detection framework. The process begins with data preprocessing: each training sample undergoes random geometric and photometric transformations to increase data diversity, and normalization is then applied to keep gradient propagation stable during training. Next, the lightweight CNN is initialized; in this model, a one-dimensional convolutional layer replaces the traditional fully connected layer, which greatly reduces the number of parameters and improves computational efficiency. In each epoch, mini-batches of augmented data are forwarded through the network to produce predictions, and the binary cross-entropy loss measures the deviation between the predictions and the ground-truth labels. Model parameters are then updated iteratively by gradient descent with the RMSProp optimizer, which promotes stable convergence. Over multiple epochs, the model gradually learns discriminative features of surface scratches, while Dropout regularization suppresses overfitting and improves generalization. As a result, the network balances detection accuracy against a lightweight footprint, making it practical for industrial inspection tasks.

Algorithm 1 Training Algorithm of the CNN-Based Scratch Detection Model for Automotive Components.

Input: Dataset $D = \{(x_{i}, y_{i})\}_{i = 1}^{N}$, number of epochs $E$, batch size $B$
Output: Trained model $M$
Split dataset $D$ into training subset $D_{train}$ and validation subset $D_{val}$
Initialize augmented training dataset $D_{train}^{aug} \leftarrow \emptyset$
for each $(x_{i}, y_{i}) \in D_{train}$ do
$x_{i}^{aug} \leftarrow \mathrm{ApplyRandomTransformations}(x_{i})$
$x_{i}^{norm} \leftarrow \mathrm{NormalizeImage}(x_{i}^{aug})$
Append $(x_{i}^{norm}, y_{i})$ to $D_{train}^{aug}$
end for
$D_{val} \leftarrow \{(\mathrm{NormalizeImage}(x), y) \mid (x, y) \in D_{val}\}$
Initialize lightweight CNN model $M$ with parameters $\theta$
Define per-sample loss: $\ell(\hat{y}, y) = -[y \log(\hat{y}) + (1 - y) \log(1 - \hat{y})]$
Initialize optimizer $Opt$ (RMSProp)
for each epoch $e = 1, \dots, E$ do
Initialize cumulative loss $L_{total} \leftarrow 0$
for each batch $b \in \mathrm{CreateBatches}(D_{train}^{aug}, B)$ do
Extract $\{(x_{j}, y_{j})\}_{j = 1}^{B}$ from batch $b$
Compute predictions $\{\hat{y}_{j}\}_{j = 1}^{B} \leftarrow M.\mathrm{Forward}(\{x_{j}\}_{j = 1}^{B})$
Compute batch loss $L_{batch} \leftarrow \frac{1}{B} \sum_{j = 1}^{B} \ell(\hat{y}_{j}, y_{j})$
Accumulate loss $L_{total} \leftarrow L_{total} + L_{batch}$
Compute gradients $\nabla_{\theta} L_{batch}$
Update model parameters $M \leftarrow Opt.\mathrm{Update}(M, \nabla_{\theta} L_{batch})$
end for
end for
return $M$
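The loop structure of Algorithm 1 can be exercised end to end with a toy model. This is a minimal runnable sketch, assuming a logistic-regression model in place of the CNN (no convolution or Dropout) and synthetic 2-D feature vectors instead of images. It shows mini-batching, the per-sample BCE loss averaged over each batch, and an RMSProp update with the learning rate 0.001 and decay rate 0.9 from Section 4.1; the ε = 1e-8 in the denominator, the data, and all function names are illustrative assumptions.

```python
import math
import random
from typing import Iterator, List, Sequence, Tuple

Sample = Tuple[List[float], int]


def sigmoid(z: float) -> float:
    return 1.0 / (1.0 + math.exp(-z))


def create_batches(data: Sequence[Sample], batch_size: int,
                   rng: random.Random) -> Iterator[List[Sample]]:
    """Shuffle once per epoch and yield mini-batches (the last one may be smaller)."""
    order = list(range(len(data)))
    rng.shuffle(order)
    for start in range(0, len(order), batch_size):
        yield [data[i] for i in order[start:start + batch_size]]


def train(data: Sequence[Sample], epochs: int, batch_size: int, lr: float = 0.001,
          rho: float = 0.9, eps: float = 1e-8, seed: int = 0):
    """Algorithm 1's loop with a toy logistic-regression model standing in for the CNN."""
    rng = random.Random(seed)
    n = len(data[0][0]) + 1                  # weights plus one bias
    theta = [0.0] * n
    sq_avg = [0.0] * n                       # RMSProp running mean of squared gradients
    history = []
    for _ in range(epochs):
        total, n_batches = 0.0, 0
        for batch in create_batches(data, batch_size, rng):
            grad = [0.0] * n
            batch_loss = 0.0
            for x, y in batch:
                features = x + [1.0]
                y_hat = sigmoid(sum(w * v for w, v in zip(theta, features)))
                y_hat = min(max(y_hat, 1e-7), 1.0 - 1e-7)
                batch_loss -= y * math.log(y_hat) + (1 - y) * math.log(1.0 - y_hat)
                for k, v in enumerate(features):
                    grad[k] += (y_hat - y) * v   # gradient of BCE through a sigmoid output
            batch_loss /= len(batch)
            for k in range(n):                   # Opt.Update: one RMSProp step
                g = grad[k] / len(batch)
                sq_avg[k] = rho * sq_avg[k] + (1.0 - rho) * g * g
                theta[k] -= lr * g / (math.sqrt(sq_avg[k]) + eps)
            total += batch_loss
            n_batches += 1
        history.append(total / n_batches)        # mean batch loss (L_total / number of batches)
    return theta, history


def predict(theta: Sequence[float], x: List[float]) -> float:
    return sigmoid(sum(w * v for w, v in zip(theta, x + [1.0])))


# Hypothetical toy data: two 2-D Gaussian blobs standing in for image features.
data_rng = random.Random(1)
data = [([data_rng.gauss(c, 0.5), data_rng.gauss(c, 0.5)], label)
        for label, c in ((1, 1.0), (0, -1.0)) for _ in range(100)]
theta, history = train(data, epochs=30, batch_size=16)
acc = sum((predict(theta, x) >= 0.5) == (y == 1) for x, y in data) / len(data)
print(f"loss {history[0]:.3f} -> {history[-1]:.3f}, training accuracy {acc:.2f}")
```

Swapping the toy model for a real CNN changes only the forward pass and the gradient computation; the batching, loss averaging, and update steps keep the same shape.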

Algorithm 2 presents the procedure for evaluating the proposed CNN-based scratch detection model on automotive components. For each test image, the algorithm first resizes the image to the network input dimensions ($112 \times 112 \times 3$) and normalizes the pixel intensities. The preprocessed image is then passed through the trained lightweight CNN to produce an output probability vector. The confidence score of the “Scratch” class is compared with a predefined threshold $\tau$ to determine the predicted label. The algorithm outputs both the predicted class label and the corresponding confidence score, providing a consistent and reproducible procedure for processing the test set and assessing model performance.

Algorithm 2 Model Testing and Scratch Detection for Automotive Components

Input: Trained lightweight CNN model $M_{trained}$, test image $I_{test}$
Output: Predicted label $L_{pred}$, confidence score $P_{conf}$
Preprocessing:
$I_{resized} \leftarrow \mathrm{Resize}(I_{test}, 112 \times 112)$
$I_{processed} \leftarrow \mathrm{NormalizeImage}(I_{resized})$
Model Inference:
$P_{vector} \leftarrow M_{trained}.\mathrm{Forward}(I_{processed})$
$P_{conf} \leftarrow P_{vector}[\text{Scratch}]$
Classification Decision:
if $P_{conf} \geq \tau$ then
$L_{pred} \leftarrow$ “Scratch”
else
$L_{pred} \leftarrow$ “No Scratch”
end if
Return: $L_{pred}, P_{conf}$
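Algorithm 2 maps onto a short inference function. In this sketch the trained network is any callable that returns the two-class probability vector. The nearest-neighbour resize, division by 255 as `NormalizeImage`, index 1 for the “Scratch” class, and the default τ = 0.5 are all illustrative assumptions, since the paper does not specify them; a dummy model stands in for the trained CNN.

```python
from typing import Callable, List, Sequence, Tuple

Image = List[List[List[float]]]  # H x W x C nested lists of 8-bit intensities


def resize_nearest(img: Image, out_h: int, out_w: int) -> Image:
    """Nearest-neighbour resize (assumed; the paper does not name the interpolation)."""
    in_h, in_w = len(img), len(img[0])
    return [[list(img[r * in_h // out_h][c * in_w // out_w]) for c in range(out_w)]
            for r in range(out_h)]


def normalize_image(img: Image) -> Image:
    """Scale 8-bit intensities to [0, 1] (assumed normalization scheme)."""
    return [[[v / 255.0 for v in px] for px in row] for row in img]


def detect_scratch(model: Callable[[Image], Sequence[float]], img: Image,
                   tau: float = 0.5, scratch_index: int = 1) -> Tuple[str, float]:
    x = normalize_image(resize_nearest(img, 112, 112))     # preprocessing
    p_vector = model(x)                                    # model inference
    p_conf = p_vector[scratch_index]                       # confidence of "Scratch"
    label = "Scratch" if p_conf >= tau else "No Scratch"   # threshold decision
    return label, p_conf


def dummy_model(x: Image) -> List[float]:
    """Hypothetical stand-in: checks the input shape and returns fixed probabilities."""
    assert (len(x), len(x[0]), len(x[0][0])) == (112, 112, 3)
    return [0.3, 0.7]


test_image = [[[128, 128, 128] for _ in range(224)] for _ in range(224)]
print(detect_scratch(dummy_model, test_image))             # ('Scratch', 0.7)
print(detect_scratch(dummy_model, test_image, tau=0.8))    # ('No Scratch', 0.7)
```

Raising τ trades recall for precision: fewer images are flagged as scratched, but those that are flagged carry higher confidence.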

4. Numerical Results

4.1. Parameter Settings

In this experiment, we utilize the PyTorch deep learning framework (version 2.5.1) for model construction and training. The experiments are conducted on an NVIDIA GeForce RTX 4090 GPU, with Python version 3.10.

To enhance the diversity of training samples and improve the model’s generalization ability, multiple data augmentation strategies are implemented. These include random scaling within the range $[0.8, 1.2]$, random rotation within $[-15^{\circ}, 15^{\circ}]$, random flipping (horizontal or vertical) with a probability of 0.5, random translation with horizontal and vertical shifts of up to $\pm 10\%$, random shearing with an intensity of 0.2, addition of Gaussian noise with a probability of 0.2 and a standard deviation of 0.05, and contrast adjustment within the range $[0.5, 1.5]$. Together, these methods simulate image features under various conditions, providing richer data for model training.
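The augmentation settings above can be expressed as a per-image parameter sampler. This minimal sketch rests on several assumptions not stated in the paper: when a flip occurs its direction is chosen uniformly, shear is drawn uniformly from [−0.2, 0.2], pixel values are normalized to [0, 1], and contrast is scaled about the image mean. The geometric transforms (scaling, rotation, translation, shear) would be applied by an image library and are not shown; only the photometric steps are implemented, on a flat list of pixel values.

```python
import random
from dataclasses import dataclass
from typing import List, Optional


@dataclass
class AugmentationPlan:
    scale: float            # zoom factor in [0.8, 1.2]
    rotation_deg: float     # rotation angle in [-15, 15] degrees
    flip: Optional[str]     # None, "horizontal" or "vertical"
    shift_x: float          # horizontal shift as a fraction of width, in [-0.1, 0.1]
    shift_y: float          # vertical shift as a fraction of height, in [-0.1, 0.1]
    shear: float            # shear intensity in [-0.2, 0.2] (assumed symmetric range)
    noise_std: float        # 0.05 with probability 0.2, otherwise 0.0 (no noise)
    contrast: float         # contrast factor in [0.5, 1.5]


def sample_plan(rng: random.Random) -> AugmentationPlan:
    return AugmentationPlan(
        scale=rng.uniform(0.8, 1.2),
        rotation_deg=rng.uniform(-15.0, 15.0),
        flip=rng.choice(["horizontal", "vertical"]) if rng.random() < 0.5 else None,
        shift_x=rng.uniform(-0.1, 0.1),
        shift_y=rng.uniform(-0.1, 0.1),
        shear=rng.uniform(-0.2, 0.2),
        noise_std=0.05 if rng.random() < 0.2 else 0.0,
        contrast=rng.uniform(0.5, 1.5),
    )


def apply_photometric(pixels: List[float], plan: AugmentationPlan,
                      rng: random.Random) -> List[float]:
    """Contrast adjustment about the mean, then optional Gaussian noise, clipped to [0, 1]."""
    mean = sum(pixels) / len(pixels)
    out = [(v - mean) * plan.contrast + mean for v in pixels]
    if plan.noise_std > 0:
        out = [v + rng.gauss(0.0, plan.noise_std) for v in out]
    return [min(1.0, max(0.0, v)) for v in out]


rng = random.Random(0)
plans = [sample_plan(rng) for _ in range(1000)]
print(sum(p.flip is not None for p in plans) / len(plans))   # roughly 0.5
print(apply_photometric([0.0, 0.5, 1.0], plans[0], rng))
```

Sampling a plan first and applying it second keeps each transformation's parameters inspectable, which makes it easy to log exactly what was done to each training image.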

For the generative model, FastGAN [38] is selected as the primary tool due to its outstanding performance in training speed and generation quality. The generative model is trained for 50,000 iterations to ensure that the data distribution is adequately learned and high-quality synthetic images are produced. During training, the learning rates for both the generator and discriminator are set to 0.0002, with a batch size of 32.

The 2D convolutional layers use a kernel size of $3 \times 3$, the smallest size that captures the eight-neighborhood of each pixel and thus represents local feature variations in both axial and diagonal directions. Stacking multiple $3 \times 3$ convolutional layers enlarges the receptive field without enlarging the kernel: with stride 1, two stacked $3 \times 3$ layers have the same receptive field as a single $5 \times 5$ layer, and three stacked layers match a $7 \times 7$ layer. Compared with a single large kernel, stacked small kernels introduce additional nonlinear activation functions, enhancing the network’s capacity to learn complex mappings and improving class discrimination. Stacked small kernels also use fewer parameters: for layers with $C$ input and output channels, three $3 \times 3$ layers contain $27C^{2}$ weights, whereas a single $7 \times 7$ layer contains $49C^{2}$, improving computational efficiency while maintaining model capacity.
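The receptive-field and parameter arithmetic above can be checked directly. This sketch assumes stride-1 convolutions and ignores biases, as the $27C^{2}$ and $49C^{2}$ figures do; $C = 64$ is chosen to match the channel width of the later layers in Table 1.

```python
def stacked_receptive_field(n_layers: int, k: int = 3) -> int:
    """Receptive field (side length) of n stacked k x k convolutions with stride 1."""
    return 1 + n_layers * (k - 1)


def conv_weights(kernel: int, channels: int, n_layers: int = 1) -> int:
    """Weights (biases ignored) of n_layers kernel x kernel convs with C in/out channels."""
    return n_layers * kernel * kernel * channels * channels


for n in (1, 2, 3):
    rf = stacked_receptive_field(n)
    print(f"{n} stacked 3x3 layer(s) -> {rf}x{rf} receptive field")

C = 64
print(conv_weights(3, C, n_layers=3), conv_weights(7, C))   # 110592 200704
```

Each extra stride-1 layer widens the receptive field by k − 1 pixels, while the weight count grows only linearly in the number of layers rather than quadratically in the kernel size.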

ReLU is used as the activation function throughout the network except in the output layer. Unlike the sigmoid and tanh functions, which are more expensive to compute and prone to gradient saturation in deep networks, ReLU is simple and efficient and produces sparse activations, which can also help reduce overfitting. The output layer applies the softmax function to the two output units (Scratch and No Scratch) of the final Dense layer.

The loss function is the cross-entropy loss, which measures the difference between the predicted probability distribution and the true label. For the two-class softmax output used here, it is equivalent to the binary cross-entropy of Eq. (13).
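For a two-unit softmax output, the cross-entropy loss coincides with the binary cross-entropy of Eq. (13): softmax over two logits reduces to a sigmoid of their difference, so the loss depends only on the “Scratch” probability. The numerical check below uses hypothetical logit values and assumes index 1 is the “Scratch” class.

```python
import math
from typing import List, Sequence


def softmax(logits: Sequence[float]) -> List[float]:
    m = max(logits)                                   # subtract the max for numerical stability
    exps = [math.exp(z - m) for z in logits]
    s = sum(exps)
    return [e / s for e in exps]


def softmax_cross_entropy(logits: Sequence[float], label: int) -> float:
    return -math.log(softmax(logits)[label])


def bce_single(y: int, y_hat: float) -> float:
    """Single-sample form of Eq. (13)."""
    return -(y * math.log(y_hat) + (1 - y) * math.log(1 - y_hat))


logits = [0.2, 1.5]                 # hypothetical [No Scratch, Scratch] Dense-layer outputs
p_scratch = softmax(logits)[1]      # equals sigmoid(1.5 - 0.2)
for label in (0, 1):
    print(label, softmax_cross_entropy(logits, label), bce_single(label, p_scratch))
```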

Network optimization is performed using RMSProp [39], with hyperparameters set to an initial learning rate of 0.001, a decay rate of 0.9, and a smoothing factor of 1.0. This optimization strategy facilitates stable and efficient convergence during training.
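For reference, the standard RMSProp update with these hyperparameters is written below, where $g_t$ is the mini-batch gradient, $v_t$ the running mean of squared gradients, $\eta$ the learning rate and $\rho$ the decay rate. Interpreting the “smoothing factor” as the $\varepsilon$ term in the denominator is an assumption, since the text does not define it; note that PyTorch’s `torch.optim.RMSprop` calls its decay parameter `alpha` a “smoothing constant”, so the mapping onto a specific implementation should be checked against the released code.

```latex
\begin{aligned}
v_t &= \rho\, v_{t-1} + (1 - \rho)\, g_t^{2}, \\
\theta_{t+1} &= \theta_t - \frac{\eta}{\sqrt{v_t} + \varepsilon}\, g_t,
\qquad \eta = 0.001,\quad \rho = 0.9,\quad \varepsilon = 1.0 \ (\text{assumed}).
\end{aligned}
```

Under that reading, $\varepsilon = 1.0$ keeps the denominator at or above 1, so each step has magnitude at most $\eta |g_t|$, no larger than a plain gradient-descent step with the same learning rate.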

The weights of the network are initialized from a truncated normal distribution with a mean of 0.0 and a standard deviation of 0.003. This keeps the initial weights small and bounded, which is intended to prevent exploding gradients and to ensure stable training at the beginning of optimization.
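Such initial weights can be drawn by rejection sampling. Truncating at two standard deviations (±0.006 here) is an assumption: it matches the convention of TensorFlow’s `tf.random.truncated_normal`, but the paper does not state the cut-off. In PyTorch, `torch.nn.init.trunc_normal_` takes absolute bounds `a` and `b` (default ±2.0), so a ±2σ cut-off at σ = 0.003 would require passing `a=-0.006, b=0.006` explicitly.

```python
import random
from typing import List


def truncated_normal(n: int, mean: float = 0.0, std: float = 0.003,
                     bound: float = 2.0, seed: int = 0) -> List[float]:
    """Rejection sampling: redraw any value more than `bound` std devs from the mean."""
    rng = random.Random(seed)
    samples: List[float] = []
    while len(samples) < n:
        v = rng.gauss(mean, std)
        if abs(v - mean) <= bound * std:
            samples.append(v)
    return samples


w = truncated_normal(3 * 3 * 3 * 32)   # e.g. the 864 weights of the first Conv2D layer in Table 1
print(min(w), max(w))                 # both lie within [-0.006, 0.006]
```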

4.2. Experimental Results and Analysis

To provide a fair and meaningful comparison, we evaluate our method against several classification models, including MobileNetV2 [40], EfficientNet-Lite [41], SqueezeNet [42], EfficientDet-D1 [43], DETR [44], and IDD-Net [45], all trained on the same dataset for image-level scratch detection. We assess performance using standard classification metrics, namely Precision, Recall, and Accuracy, which are appropriate for this binary classification task.

Table 3 presents a comparison of several state-of-the-art models for binary scratch classification in automotive component images, evaluated using Precision, Recall, and Accuracy. The proposed model achieves the highest performance, with a Precision of 0.927, a Recall of 0.950, and an Accuracy of 0.938, demonstrating its superior capability in distinguishing scratched from non-scratched images.
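The three metrics in Table 3 follow directly from the confusion counts on the test set, with “Scratch” as the positive class. The counts below are hypothetical and only illustrate the arithmetic.

```python
from typing import Dict


def classification_metrics(tp: int, fp: int, fn: int, tn: int) -> Dict[str, float]:
    """Precision, recall and accuracy with "Scratch" as the positive class."""
    return {
        "precision": tp / (tp + fp),                  # share of predicted scratches that are real
        "recall": tp / (tp + fn),                     # share of real scratches that are found
        "accuracy": (tp + tn) / (tp + fp + fn + tn),  # share of all images classified correctly
    }


# Hypothetical counts for illustration only; the paper does not report its confusion matrix.
print(classification_metrics(tp=95, fp=5, fn=10, tn=90))
```

Recall matters most for inspection, since a missed scratch (false negative) lets a defective part through, whereas a false positive only triggers a manual re-check.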

The model’s lightweight CNN architecture captures fine-grained scratch features while keeping computational cost low. By replacing conventional fully connected layers with one-dimensional convolutions, it reduces the parameter count, and hence the inference cost, without sacrificing classification accuracy. In comparison, every baseline model in Table 3 has both lower accuracy and a larger parameter count, indicating that the proposed approach offers a favorable balance of accuracy and efficiency for real-time scratch detection.

To select appropriate settings, experiments are conducted to evaluate how each parameter of the 1D convolutional layer affects model accuracy. In 2D convolution, the stride is usually set to 1: this lets the layer sense the relationship between any neighboring pixels in the feature map while keeping the feature map size unchanged, leaving downsampling to the pooling layers. In contrast, the one-dimensional convolutional layer in this model operates on the flattened feature vector, where local perception in the 2D sense no longer applies, so its stride need not be 1. Setting the stride equal to the window length makes the windows non-overlapping, which minimizes the number of neurons in the next layer and therefore the number of parameters in the following Dense layer. The convolution kernel size is the window length × 1. Table 4 reports the final accuracy of the model for different window lengths and numbers of convolution kernels.
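The effect of the window length and stride on layer sizes can be checked with simple arithmetic. This sketch assumes the Table 1 input of 14 × 14 × 64 = 12,544 values and a two-class Dense output. With a window of 8 and 4 kernels it reproduces the 1568 × 4 output, 36 Conv1D parameters, and 12,546 Dense parameters listed in Table 1, and the stride-1 comparison shows how much a stride equal to the window length saves.

```python
from typing import Optional, Tuple


def conv1d_head(window: int, n_kernels: int, stride: Optional[int] = None,
                in_len: int = 14 * 14 * 64, n_classes: int = 2) -> Tuple[int, int, int]:
    """(output length, Conv1D parameters, Dense parameters) for the Table 1 head."""
    stride = window if stride is None else stride   # default: non-overlapping windows
    out_len = (in_len - window) // stride + 1
    conv_params = window * n_kernels + n_kernels    # each kernel is window x 1, plus a bias
    dense_params = out_len * n_kernels * n_classes + n_classes
    return out_len, conv_params, dense_params


# The grid explored in Table 4.
for window in (4, 8, 16, 32):
    for n_kernels in (1, 2, 4, 6, 8):
        print(window, n_kernels, conv1d_head(window, n_kernels))

print(conv1d_head(8, 4))             # (1568, 36, 12546): stride = window, as in Table 1
print(conv1d_head(8, 4, stride=1))   # (12537, 36, 100298): overlapping windows
```

The Conv1D layer itself stays tiny in every configuration; the stride choice matters because it sets the length of the vector that the final Dense layer must connect to.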

Table 4 shows that model accuracy is highest with a window length of 8 and 4 convolution kernels. Figure 4 plots the recognition accuracy on the training and test sets against the number of iterations when the window length and stride of the one-dimensional convolutional layer are both set to 16 and the number of convolution kernels is set to 6. As training proceeds, test-set accuracy rises gradually and stabilizes at about 85.9%. After about 75 iterations, however, training-set accuracy keeps rising while test-set accuracy begins to oscillate and decline, indicating overfitting. We therefore save the model parameters at step 70 as the optimal model.
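A checkpoint choice like this can be automated by tracking held-out accuracy and stopping once it stops improving. The sketch below is a generic early-stopping rule, not the authors’ procedure; the patience value and the accuracy curve are hypothetical, and in practice the curve would come from a held-out split such as the validation subset in Algorithm 1.

```python
from typing import Sequence, Tuple


def select_checkpoint(held_out_acc: Sequence[float], patience: int = 4) -> Tuple[int, float]:
    """Return (best_step, best_accuracy), stopping after `patience` steps without improvement."""
    best_step, best_acc, since_best = 0, float("-inf"), 0
    for step, acc in enumerate(held_out_acc, start=1):
        if acc > best_acc:
            best_step, best_acc, since_best = step, acc, 0
        else:
            since_best += 1
            if since_best >= patience:
                break                       # accuracy has stalled or started to fall
    return best_step, best_acc


# Hypothetical curve: rises, plateaus, then oscillates downward as overfitting sets in.
curve = [0.60, 0.70, 0.78, 0.83, 0.85, 0.86, 0.859, 0.855, 0.86, 0.85, 0.84, 0.845]
print(select_checkpoint(curve))   # (6, 0.86)
```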

Class Activation Mapping (CAM) [46] is a widely adopted visualization technique for interpreting the decision-making process of convolutional neural networks (CNNs). CAM constructs a class-specific heatmap by computing a weighted sum of the feature maps from the final convolutional layer, using the weights associated with the target class in the fully connected layer. This visualization highlights the feature regions that contribute most to the model’s prediction, thereby providing interpretable insights into the learned discriminative representations within a binary classification framework.
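Following CAM [46], the heatmap for class $c$ is $M_c(x, y) = \sum_k w_k^c A_k(x, y)$, where $A_k$ is the $k$-th feature map of the last convolutional layer. A minimal sketch of that weighted sum follows, with min-max normalization for display as an added assumption. How the class weights $w_k^c$ are derived for this model’s Conv1D and Dense head is not specified here, so the sketch takes them as input, and upsampling to image resolution is omitted.

```python
from typing import List, Sequence

Map2D = List[List[float]]


def class_activation_map(feature_maps: Sequence[Map2D],
                         class_weights: Sequence[float]) -> Map2D:
    """M_c(x, y) = sum_k w_k^c * A_k(x, y), min-max normalized to [0, 1] for display."""
    h, w = len(feature_maps[0]), len(feature_maps[0][0])
    cam = [[sum(wk * fm[i][j] for wk, fm in zip(class_weights, feature_maps))
            for j in range(w)] for i in range(h)]
    lo = min(min(row) for row in cam)
    hi = max(max(row) for row in cam)
    span = (hi - lo) or 1.0                      # avoid division by zero on flat maps
    return [[(v - lo) / span for v in row] for row in cam]


A = [[[1.0, 0.0], [0.0, 0.0]],   # feature map k = 0
     [[0.0, 0.0], [0.0, 2.0]]]   # feature map k = 1
print(class_activation_map(A, class_weights=[0.5, 1.0]))   # [[0.25, 0.0], [0.0, 1.0]]
```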

Qualitative analysis of the CAM visualizations in Figure 5 indicates that, for images containing scratches, the model tends to emphasize feature regions associated with surface irregularities. This suggests that the model captures discriminative texture cues that distinguish scratched from non-scratched images. By revealing how these feature patterns contribute to the final decision, CAM enhances the interpretability of the classification model.

In contrast, for images without scratches, the model directs its attention more toward the overall texture regions of the image. In scratch-free images, the texture features are typically more consistent, and these global texture characteristics help the model recognize the normal state of the image, classifying it as “unscratched.” By analyzing the overall texture, the model can effectively determine whether an image is in a normal state, avoiding misjudgments caused by local noise or minor variations.

This contrasting attention pattern, focusing on fine-grained features in scratched images and on global texture in unscratched ones, enables the model to classify both kinds of images effectively: it accurately identifies scratches while remaining stable when classifying unscratched images, which improves the reliability of the classification results in practical applications.

Table 5 compares fully connected (FC) layers, Global Average Pooling (GAP), and 1D convolutional layers. The FC and 1D convolutional configurations achieve nearly equivalent classification performance, with only marginal differences in precision, recall, and accuracy. Notably, the 1D convolutional design attains this performance with substantially fewer trainable parameters, owing to weight sharing along the feature dimension. In contrast, GAP reduces the parameter count markedly but discards fine-grained spatial information, lowering accuracy to 0.880 compared with 0.938 for 1D convolutions. These observations suggest that 1D convolutions offer a favorable trade-off between model compactness and the preservation of discriminative feature representations.

As shown in Table 6, incorporating synthetic images generated by FastGAN into the training dataset leads to a substantial improvement in model performance. Compared with the model trained solely on real data, the inclusion of synthetic samples increases precision from 0.787 to 0.927, recall from 0.810 to 0.950, and accuracy from 0.796 to 0.938. These results demonstrate that the synthetic data effectively complement the limited real dataset, enhancing the model’s generalization capability and robustness in scratch recognition tasks.

This section has presented a convolutional neural network-based method for recognizing scratches on automotive parts, together with a deep learning model designed for this task. Because training data are limited, the algorithm must explicitly suppress overfitting and improve generalization. A one-dimensional convolutional layer replaces the fully connected layer of traditional CNN designs, which greatly reduces the number of parameters and the model size without decreasing accuracy, making the model better suited to automotive hardware platforms. A Dropout mechanism is introduced to suppress overfitting. Experiments on the self-collected dataset with image data augmentation show that the model can effectively identify scratches on automotive parts.

5. Conclusions

In this paper, we have proposed an automatic surface scratch detection framework for automotive components based on convolutional neural networks. To mitigate the issue of limited annotated data, a high-quality dataset was constructed and further expanded through data augmentation and Generative Adversarial Network (GAN)-based synthesis. A lightweight CNN architecture was developed in which traditional fully connected layers were replaced with one-dimensional convolutional layers to reduce the number of parameters and overall model complexity. Furthermore, a Dropout mechanism was incorporated to mitigate overfitting and enhance generalization capability. Experimental results demonstrate that the proposed model delivers superior performance in surface scratch detection, effectively addressing the challenges of limited samples and model complexity. Overall, the developed system offers a practical and scalable solution for automated visual inspection in industrial manufacturing, significantly reducing reliance on manual inspection.

Author Contributions

Conceptualization, Y.Q. and G.Q.; methodology, G.Q.; software, J.L.; validation, G.Q., J.L. and K.L.; formal analysis, B.X.; investigation, K.L.; resources, K.L.; data curation, J.L.; writing—original draft preparation, G.Q.; writing—review and editing, Y.Q.; visualization, J.L.; supervision, Y.Q.; project administration, B.X.; funding acquisition, Y.Q. All authors have read and agreed to the published version of the manuscript.

Funding

This research was funded by the Hainan Province Science and Technology Special Fund grant number ZDYF2024GXJS292.

Data Availability Statement

The original contributions presented in the study are included in the article. Further inquiries can be directed to the corresponding author(s).

Conflicts of Interest

Author Guojie Qu was employed by the company FAW-Volkswagen Automotive Co., Ltd. The remaining authors declare that the research was conducted in the absence of any commercial or financial relationships that could be construed as a potential conflict of interest.

References

1. Yang, H.; Zheng, H.; Zhang, T. A review of artificial intelligent methods for machined surface roughness prediction. Tribol. Int. 2024, 199, 109935.
2. Ren, Z.; Fang, F.; Yan, N.; Wu, Y. State of the art in defect detection based on machine vision. Int. J. Precis. Eng. Manuf.-Green Technol. 2022, 9, 661–691.
3. Gao, W.; Zhang, X.; Yang, L.; Liu, H. An improved Sobel edge detection. In Proceedings of the 2010 3rd International Conference on Computer Science and Information Technology, Chengdu, China, 9–11 July 2010; IEEE: Piscataway, NJ, USA, 2010; Volume 5, pp. 67–71.
4. Ding, L.; Goshtasby, A. On the Canny edge detector. Pattern Recognit. 2001, 34, 721–725.
5. Shi, T.; Kong, J.Y.; Wang, X.D.; Liu, Z.; Zheng, G. Improved Sobel algorithm for defect detection of rail surfaces with enhanced efficiency and accuracy. J. Cent. South Univ. 2016, 23, 2867–2875.
6. Fathabadi, H. Novel filter based ANN approach for short-circuit faults detection, classification and location in power transmission lines. Int. J. Electr. Power Energy Syst. 2016, 74, 374–383.
7. Huangpeng, Q.; Zhang, H.; Zeng, X.; Huang, W. Automatic visual defect detection using texture prior and low-rank representation. IEEE Access 2018, 6, 37965–37976.
8. Deshpande, S.; Venugopal, V.; Kumar, M.; Anand, S. Deep learning-based image segmentation for defect detection in additive manufacturing: An overview. Int. J. Adv. Manuf. Technol. 2024, 134, 2081–2105.
9. Qian, Y.; Rao, L.; Ma, C.; Wei, K.; Ding, M.; Shi, L. Toward efficient and secure object detection with sparse federated training over internet of vehicles. IEEE Trans. Intell. Transp. Syst. 2024, 25, 14507–14520.
10. Girshick, R. Fast R-CNN. In Proceedings of the IEEE International Conference on Computer Vision, Santiago, Chile, 7–13 December 2015; pp. 1440–1448.
11. Ren, S.; He, K.; Girshick, R.; Sun, J. Faster R-CNN: Towards real-time object detection with region proposal networks. Adv. Neural Inf. Process. Syst. 2015, 28.
12. Redmon, J.; Divvala, S.; Girshick, R.; Farhadi, A. You only look once: Unified, real-time object detection. In Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition, Las Vegas, NV, USA, 27–30 June 2016.
13. Liu, W.; Anguelov, D.; Erhan, D.; Szegedy, C.; Reed, S.; Fu, C.Y.; Berg, A.C. SSD: Single shot multibox detector. In Proceedings of the European Conference on Computer Vision, Amsterdam, The Netherlands, 11–14 October 2016; Springer: Cham, Switzerland, 2016; pp. 21–37.
14. He, Y.; Song, K.; Meng, Q.; Yan, Y. An end-to-end steel surface defect detection approach via fusing multiple hierarchical features. IEEE Trans. Instrum. Meas. 2019, 69, 1493–1504.
15. Zhang, D.; Hao, X.; Liang, L.; Liu, W.; Qin, C. A novel deep convolutional neural network algorithm for surface defect detection. J. Comput. Des. Eng. 2022, 9, 1616–1632.
16. Jiang, X.; Yan, F.; Lu, Y.; Wang, K.; Guo, S.; Zhang, T.; Pang, Y.; Niu, J.; Xu, M. Joint Attention-Guided Feature Fusion Network for Saliency Detection of Surface Defects. IEEE Trans. Instrum. Meas. 2022, 71, 1–12.
17. Li, W.; Zhang, L.; Wu, C.; Cui, Z.; Niu, C. A new lightweight deep neural network for surface scratch detection. Int. J. Adv. Manuf. Technol. 2022, 123, 1999–2015.
18. Guo, Y.; Nie, G.; Gao, W.; Liao, M. 2D Semantic segmentation: Recent developments and future directions. Future Internet 2023, 15, 205.
19. Ayala, C.; Aranda, C.; Galar, M. Guidelines to compare semantic segmentation maps at different resolutions. IEEE Trans. Geosci. Remote Sens. 2024, 62, 1–16.
20. Long, J.; Shelhamer, E.; Darrell, T. Fully convolutional networks for semantic segmentation. In Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition, Boston, MA, USA, 7–12 June 2015; pp. 3431–3440.
21. Chen, L.C.; Papandreou, G.; Kokkinos, I.; Murphy, K.; Yuille, A.L. Semantic image segmentation with deep convolutional nets and fully connected CRFs. arXiv 2014, arXiv:1412.7062.
22. Zheng, S.; Jayasumana, S.; Romera-Paredes, B.; Vineet, V.; Su, Z.; Du, D.; Huang, C.; Torr, P.H. Conditional random fields as recurrent neural networks. In Proceedings of the IEEE International Conference on Computer Vision, Santiago, Chile, 7–13 December 2015; pp. 1529–1537.
23. Li, S.; Zhao, X.; Zhou, G. Automatic pixel-level multiple damage detection of concrete structure using fully convolutional network. Comput.-Aided Civ. Infrastruct. Eng. 2019, 34, 616–634.
24. Shi, C.; Wang, K.; Zhang, G.; Li, Z.; Zhu, C. Efficient and accurate semi-supervised semantic segmentation for industrial surface defects. Sci. Rep. 2024, 14, 21874.
25. Beyene, D.A.; Maru, M.B.; Kim, T.; Park, S.; Park, S. Unsupervised domain adaptation-based crack segmentation using transformer network. J. Build. Eng. 2023, 80, 107889.
26. Liu, Y.; Qin, Y.; Lin, Z.; Xia, H.; Wang, C. Detection of scratch defects on metal surfaces based on MSDD-UNet. Electronics 2024, 13, 3241.
27. Wang, L.; Zhang, G.; Wang, W.; Chen, J.; Jiang, X.; Yuan, H.; Huang, Z. A defect detection method for industrial aluminum sheet surface based on improved YOLOv8 algorithm. Front. Phys. 2024, 12, 1419998.
28. Zajec, P.; Rožanec, J.M.; Theodoropoulos, S.; Fontul, M.; Koehorst, E.; Fortuna, B.; Mladenić, D. Few-shot learning for defect detection in manufacturing. Int. J. Prod. Res. 2024, 62, 6979–6998.
29. Qian, Y.; Qiu, T.; Ma, C.; Ni, Y.; Yuan, L.; Zhou, X.; Li, J. On Traffic Prediction with Knowledge-Driven Spatial-Temporal Graph Convolutional Network aided by Selected Attention Mechanism. IEEE Trans. Mach. Learn. Commun. Netw. 2025, 3, 369–380.
30. Wang, R.; Hong, T. Few-shot defect detection in industrial scenarios: A comprehensive review of challenges, advances, and frontier trends. In Proceedings of the MATEC Web of Conferences; EDP Sciences: Ulis, France, 2025; Volume 413, p. 04005.
31. De Noni, L.; Marjuban, S.M.H.; Andena, L.; Noh, K.; Li, Y.; Vollenberg, P.; Sue, H.J. Effect of color on scratch and mar visibility of polymers. J. Appl. Polym. Sci. 2023, 140, e53699.
32. Chen, Y.; Ding, Y.; Zhao, F.; Zhang, E.; Wu, Z.; Shao, L. Surface defect detection methods for industrial products: A review. Appl. Sci. 2021, 11, 7657.
33. Yu, Z.; Wang, D.; Wu, H. Defect Detection Method for Large-Curvature and Highly Reflective Surfaces Based on Polarization Imaging and Improved YOLOv11. Photonics 2025, 12, 368.
34. Lu, K.; Pan, X.; Mi, C.; Wang, W.; Zhang, J.; Chen, P.; Wang, B. RDDPA: Real-time Defect Detection via Pruning Algorithm on Steel Surface. ISIJ Int. 2024, 64, 1019–1028.
35. Goodfellow, I.; Pouget-Abadie, J.; Mirza, M.; Xu, B.; Warde-Farley, D.; Ozair, S.; Courville, A.; Bengio, Y. Generative adversarial networks. Commun. ACM 2020, 63, 139–144.
36. Qian, Y.; Yang, C.; Mei, Z.; Zhou, X.; Shi, L.; Li, J. On joint optimization of trajectory and phase shift for IRS-UAV assisted covert communication systems. IEEE Trans. Veh. Technol. 2023, 72, 12873–12883.
37. Qian, Y.; Bai, Y.; Mei, Z.; Zhang, S.; Ni, Y.; Shi, L.; Shu, F. Adversarial Machine Learning Assisted Hybrid Chaotic Covert Communication in OFDM with Subcarrier Index Modulation. IEEE Trans. Commun. 2025, 73, 11154–11169.
38. Zhong, J.; Liu, X.; Hsieh, C.J. Improving the speed and quality of GAN by adversarial training. arXiv 2020, arXiv:2008.03364.
39. Zou, F.; Shen, L.; Jie, Z.; Zhang, W.; Liu, W. A sufficient condition for convergences of Adam and RMSProp. In Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition, Long Beach, CA, USA, 15–20 June 2019; pp. 11127–11135.
40. Sandler, M.; Howard, A.; Zhu, M.; Zhmoginov, A.; Chen, L.C. MobileNetV2: Inverted residuals and linear bottlenecks. In Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition, Salt Lake City, UT, USA, 18–23 June 2018; pp. 4510–4520.
41. Tan, M.; Le, Q. EfficientNet: Rethinking model scaling for convolutional neural networks. In Proceedings of the International Conference on Machine Learning, PMLR, Long Beach, CA, USA, 9–15 June 2019; pp. 6105–6114.
42. Koonce, B. SqueezeNet. In Convolutional Neural Networks with Swift for Tensorflow: Image Recognition and Dataset Categorization; Springer: Berlin/Heidelberg, Germany, 2021; pp. 73–85.
43. Tan, M.; Pang, R.; Le, Q.V. EfficientDet: Scalable and efficient object detection. In Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition, Seattle, WA, USA, 13–19 June 2020; pp. 10781–10790.
44. Carion, N.; Massa, F.; Synnaeve, G.; Usunier, N.; Kirillov, A.; Zagoruyko, S. End-to-end object detection with transformers. In Proceedings of the European Conference on Computer Vision, Glasgow, UK, 23–28 August 2020; Springer: Cham, Switzerland, 2020; pp. 213–229.
45. Sun, T.; Li, Z.; Xiao, X.; Guo, Z.; Ning, W.; Ding, T. Cascaded detection method for surface defects of lead frame based on high-resolution detection images. J. Manuf. Syst. 2024, 72, 180–195.
46. Zhou, B.; Khosla, A.; Lapedriza, A.; Oliva, A.; Torralba, A. Learning deep features for discriminative localization. In Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition, Las Vegas, NV, USA, 27–30 June 2016; pp. 2921–2929.

Figure 1. Representative samples from the scratch detection dataset.

Figure 2. Evolution of synthetic scratch images across Generative Adversarial Network (GAN) training. The top row illustrates defect-free surface samples, whereas the bottom row depicts the corresponding samples with artificially introduced scratches. The images are arranged in order of increasing training iterations. The generator demonstrates noticeable qualitative improvements as training advances, with later epochs producing highly realistic scratches that closely replicate the morphological characteristics of actual surface defects, including texture, orientation, and local contrast.

Figure 3. Framework of the proposed method, which comprises two primary modules: a data enhancement module and a lightweight convolutional network. The data enhancement module improves the model’s generalization capacity for scratch images by applying a variety of preprocessing techniques. The lightweight convolutional network, optimized in its architectural design, enables efficient and accurate scratch detection.

Figure 4. Accuracy Evolution in Model Training.

Figure 5. Activation maps of the scratch detection model. The top row shows defect-free surfaces, and the bottom row shows scratched surfaces. Colors indicate activation intensity, with red representing high activation and blue representing low activation. Class activation maps highlight regions identified by the model.

Table 1. Parameter Configuration for Surface Scratch Detector.

Feature Input	Operation Type	Kernel $K \times K$ / Stride $S$ / Padding $P$ / Filters $N$	Feature Output	Parameter Quantity
112 × 112 × 3	Conv2D	3 × 3/1/1/32	112 × 112 × 32	896
112 × 112 × 32	MaxPooling	2 × 2/2/0/-	56 × 56 × 32	0
56 × 56 × 32	Conv2D	3 × 3/1/1/64	56 × 56 × 64	18,496
56 × 56 × 64	MaxPooling	2 × 2/2/0/-	28 × 28 × 64	0
28 × 28 × 64	Dropout (0.25)	—	28 × 28 × 64	0
28 × 28 × 64	Conv2D	3 × 3/1/1/64	28 × 28 × 64	36,928
28 × 28 × 64	MaxPooling	2 × 2/2/0/-	14 × 14 × 64	0
14 × 14 × 64	Dropout (0.25)	—	14 × 14 × 64	0
14 × 14 × 64	Reshape	—	12,544 × 1	0
12,544 × 1	Conv1D	8 × 1/8/0/4	1568 × 4	36
1568 × 4	Flatten	—	6272	0
6272	Dropout (0.5)	—	6272	0
6272	Dense	—	2	12,546

Table 2. Parameters of fully connected layers of classical CNNs.

Model	FC Layer Parameters	Total Parameters	Percentage of FC Layers
LeNet	59 K	62 K	95%
AlexNet	59 M	61 M	96%
VGG-16	123 M	138 M	89%

Table 3. Performance Comparison of Classification Models.

Model	Precision	Recall	Accuracy	Parameters
MobileNetV2	0.858	0.850	0.855	4.3 M
EfficientNet-Lite	0.855	0.838	0.848	4.7 M
SqueezeNet	0.817	0.822	0.819	1.24 M
EfficientDet-D1	0.879	0.874	0.877	6.6 M
DETR	0.863	0.888	0.874	8.5 M
IDD-Net	0.899	0.930	0.913	13.1 M
Ours	0.927	0.950	0.938	0.7 M

Table 4. Effect of Convolutional Kernel Parameters on Model Accuracy. The optimal configuration, corresponding to a window length of 8 and 4 convolutional kernels, yields a maximum average precision of 0.927.

Kernel Number \ Window Length	4	8	16	32
1	0.874	0.828	0.845	0.846
2	0.906	0.885	0.868	0.874
4	0.918	0.927	0.917	0.886
6	0.925	0.918	0.874	0.898
8	0.918	0.908	0.908	0.876

Table 5. Comparison of FC layers, Global Average Pooling (GAP), and 1D convolutions.

Method	Precision	Recall	Accuracy
FC layers	0.936	0.942	0.939
GAP	0.875	0.886	0.880
1D convolutions	0.927	0.950	0.938
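The three heads in Table 5 differ mainly in how many parameters they place after the 14 × 14 × 64 feature map. This section does not give the width of the FC head, so the sketch below uses a hypothetical hidden layer of 128 units; the GAP head is assumed to be global average pooling followed by a two-class dense layer, and the 1D-convolution head follows Table 1.

```python
FEATURE_MAP = (14, 14, 64)  # output of the last pooling stage in Table 1
N_CLASSES = 2


def dense(n_in: int, n_out: int) -> int:
    return n_in * n_out + n_out


def fc_head(hidden: int) -> int:
    """Hypothetical FC head: Flatten -> Dense(hidden) -> Dense(2)."""
    h, w, c = FEATURE_MAP
    return dense(h * w * c, hidden) + dense(hidden, N_CLASSES)


def gap_head() -> int:
    """Global average pooling -> Dense(2); the pooling itself has no parameters."""
    return dense(FEATURE_MAP[2], N_CLASSES)


def conv1d_head(window: int = 8, kernels: int = 4) -> int:
    """Reshape -> Conv1D(window, stride = window) -> Flatten -> Dense(2), as in Table 1."""
    h, w, c = FEATURE_MAP
    out_len = (h * w * c - window) // window + 1
    return (window * kernels + kernels) + dense(out_len * kernels, N_CLASSES)


for name, n in [("FC (hidden=128)", fc_head(128)), ("GAP", gap_head()), ("1D conv", conv1d_head())]:
    print(f"{name}: {n:,} parameters")
```

Under these assumptions the 1D-convolution head needs roughly two orders of magnitude fewer parameters than even a narrow FC head, while still keeping 1568 positional outputs that GAP collapses into one value per channel.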

Table 6. Performance comparison of models trained with real-only and real + synthetic data. The Real-only model is trained on 1000 real images, while the Real + Synthetic model uses the same 1000 real images combined with 3000 GAN-generated samples.

Training Data	Precision	Recall	Accuracy
Real-only	0.787	0.810	0.796
Real + Synthetic	0.927	0.950	0.938
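The Real + Synthetic setting of Table 6 pools 1000 real images with 3000 GAN-generated ones, a 1:3 ratio. A minimal sketch of that pooling is shown below. Each sample is tagged with its source so that results can later be broken down by origin. The file names, the source tags, and shuffling with a fixed seed are illustrative assumptions rather than details given in this section, and class labels are omitted for brevity.

```python
import random
from typing import List, Sequence, Tuple


def build_training_set(real: Sequence[str], synthetic: Sequence[str], seed: int = 0) -> List[Tuple[str, str]]:
    """Pool real and GAN-generated samples, tag each with its source, then shuffle."""
    samples = [(path, "real") for path in real] + [(path, "synthetic") for path in synthetic]
    random.Random(seed).shuffle(samples)
    return samples


# Hypothetical file names matching the counts in Table 6
real_images = [f"real_{i:04d}.png" for i in range(1000)]
gan_images = [f"gan_{i:04d}.png" for i in range(3000)]

real_only = build_training_set(real_images, [])
real_plus_synthetic = build_training_set(real_images, gan_images)
```

Keeping the real images identical across both settings, as the caption specifies, isolates the contribution of the synthetic samples.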

Disclaimer/Publisher’s Note: The statements, opinions and data contained in all publications are solely those of the individual author(s) and contributor(s) and not of MDPI and/or the editor(s). MDPI and/or the editor(s) disclaim responsibility for any injury to people or property resulting from any ideas, methods, instructions or products referred to in the content.

© 2025 by the authors. Licensee MDPI, Basel, Switzerland. This article is an open access article distributed under the terms and conditions of the Creative Commons Attribution (CC BY) license (https://creativecommons.org/licenses/by/4.0/).

Share and Cite

MDPI and ACS Style

Qu, G.; Liao, J.; Liu, K.; Xu, B.; Qian, Y. Automotive Scratch Detection: A Lightweight Convolutional Network Approach Augmented by Generative Adversarial Learning. Machines 2025, 13, 1107. https://doi.org/10.3390/machines13121107

AMA Style

Qu G, Liao J, Liu K, Xu B, Qian Y. Automotive Scratch Detection: A Lightweight Convolutional Network Approach Augmented by Generative Adversarial Learning. Machines. 2025; 13(12):1107. https://doi.org/10.3390/machines13121107

Chicago/Turabian Style

Qu, Guojie, Jiaying Liao, Kai Liu, Bin Xu, and Yuwen Qian. 2025. "Automotive Scratch Detection: A Lightweight Convolutional Network Approach Augmented by Generative Adversarial Learning" Machines 13, no. 12: 1107. https://doi.org/10.3390/machines13121107

APA Style

Qu, G., Liao, J., Liu, K., Xu, B., & Qian, Y. (2025). Automotive Scratch Detection: A Lightweight Convolutional Network Approach Augmented by Generative Adversarial Learning. Machines, 13(12), 1107. https://doi.org/10.3390/machines13121107

Note that from the first issue of 2016, this journal uses article numbers instead of page numbers.
