PCB-Faster-RCNN: An Improved Object Detection Algorithm for PCB Surface Defects

He, Zhige; Wu, Yuezhou; Lv, Yang; He, Yuanqing

doi:10.3390/app152412881

¹ School of Computer Science and Artificial Intelligence, Civil Aviation Flight University of China, Guanghan 618307, China

² College Office, Civil Aviation Flight University of China, Guanghan 618307, China

* Author to whom correspondence should be addressed.

Appl. Sci. 2025, 15(24), 12881; https://doi.org/10.3390/app152412881

Submission received: 24 October 2025 / Revised: 28 November 2025 / Accepted: 3 December 2025 / Published: 5 December 2025

(This article belongs to the Special Issue Deep Learning Techniques for Object Detection and Tracking)

Abstract

As a fundamental and indispensable component of modern electronic devices, the printed circuit board (PCB) has a complex structure and highly integrated functions, with its manufacturing quality directly affecting the stability and reliability of electronic products. However, during large-scale automated PCB production, its surfaces are prone to various defects and imperfections due to uncontrollable factors, such as diverse manufacturing processes, stringent machining precision requirements, and complex production environments, which not only compromise product functionality but also pose potential safety hazards. At present, PCB defect detection in industry still predominantly relies on manual visual inspection, the efficiency and accuracy of which fall short of the automation and intelligence demands in modern electronics manufacturing. To address this issue, in this paper, we have made improvements based on the classical Faster-RCNN object detection framework. Firstly, ResNet-101 is employed to replace the conventional VGG-16 backbone, thereby enhancing the ability to perceive small objects and complex texture features. Then, we extract features from images by using deformable convolution in the backbone network to improve the model’s adaptive modeling capability for deformed objects and irregular defect regions. Finally, the Convolutional Block Attention Module is incorporated into the backbone, leveraging joint spatial and channel attention mechanisms to improve the effectiveness and discriminative power of feature representations. The experimental results demonstrate that the improved model achieves a 4.5% increase in mean average precision compared with the original Faster-RCNN. Moreover, the proposed method exhibits superior detection accuracy, robustness, and adaptability compared with mainstream object detection models, indicating strong potential for engineering applications and industrial deployment.

Keywords: printed circuit board; Faster-RCNN; ResNet-101; deformable convolution; convolutional block attention module

1. Introduction

With the rapid iteration and continuous development of information technology, the types and functions of electronic products have become increasingly diverse [1]. As one of the most critical foundational elements in electronic devices, the PCB provides interconnections among electronic components and enables efficient electrical signal transmission [2]. The manufacturing quality of PCBs not only determines the overall performance, stability, and service life of electronic products but also directly affects user safety [3,4]. However, owing to limitations in current manufacturing processes, production equipment precision, and environmental conditions, various surface defects inevitably arise during PCB fabrication, including membrane corrosion, membrane oxidation, abnormal pores, wire breakage, and foreign object contamination. These defects may lead to reduced electrical performance, signal transmission errors, and even short circuits or complete functional failures, resulting in economic losses and potential safety hazards [5,6]. Therefore, high-precision, real-time automated detection of PCB surface defects is essential not only for improving product quality and production efficiency but also for ensuring the safety and reliability of electronic systems [7].

At present, the detection of PCB surface defects can be broadly classified into two categories: manual inspection and instrument-based inspection [8]. Manual inspection relies on human visual observation to identify PCB surface defects, with the accuracy and consistency of the results being heavily influenced by the inspector’s professional experience, level of attention, and emotional state, resulting in poor stability and inspection inconsistency [9]. In contrast, instrument-based inspection utilizes specialized optical equipment to detect defects of PCBs. However, such equipment is often structurally complex, requiring frequent adjustments to the angle and position of PCBs during the detection process to maintain accuracy, which increases operational difficulty and prolongs detection time, limiting overall efficiency [10,11]. In general, traditional detection methods rely too much on human intervention and fixed types of electrical or optical sensors, which can easily lead to instability, low efficiency, insufficient precision, and high costs, even possibly causing physical damage to the PCB itself in certain cases [12]. In addition, different detection strategies are often required for different types of defects, which can reduce universality and scalability. These limitations frequently result in poor generalization ability of detection models in real-world production scenarios, making it challenging to maintain consistent detection performance across varying production batches, manufacturing processes, and complex environmental conditions.

With the rapid development of artificial intelligence, computer vision, and image recognition technologies, deep learning-based object detection algorithms have been widely applied in various fields, including industrial manufacturing, traffic surveillance, medical image analysis, and so on [13,14]. These algorithms can easily extract features and localize objects from raw image data by leveraging the end-to-end detection concepts in deep learning, which can achieve automatic recognition and tracking of multi-class objects without human intervention while offering strong feature learning and generalization capabilities. Introducing object detection algorithms into PCB surface defect detection tasks enables automatically learning the visual characteristics of various defect types from training datasets, thereby achieving accurate predictions of defect categories and locations, which significantly reduces reliance on manual inspection, avoiding the low accuracy and inefficiency issues caused by human subjectivity [15]. Furthermore, deep learning-based object detection algorithms enable high-precision defect recognition without the use of complex and specialized detection instruments, which not only improves both detection efficiency and accuracy but also avoids potential physical damage to the PCB surface during inspection. In addition, deep learning-based models can greatly enhance adaptability and generalization across different production batches, manufacturing processes, and varying environmental conditions, providing an effective and practical pathway toward fully automated and intelligent PCB defect detection [16].

However, several issues remain in current deep learning-based detection of PCB surface defects, as listed below.

1. The current industrial sector lacks a precise and standardized classification system for PCB surface defects, resulting in vague categorization of defects and making it difficult to identify and distinguish different types accurately.
2. Certain defects on the PCB surface are small, making them difficult to detect accurately with traditional deep learning-based object detection algorithms, which leads to high false negative rates and poor detection accuracy and significantly affects the overall performance of the model.
3. Some PCB surface defects are influenced by camera angles, which cause them to appear deformed or changed in shape, making them challenging to identify accurately with traditional detection algorithms and thus compromising the overall detection rate.
4. Some different types of PCB defects exhibit highly similar visual features, which can lead deep learning models to misclassify them, thereby reducing the recognition accuracy and robustness of the model.

To address the challenges in PCB surface defect detection, in this paper, we first organize and classify defects on the surface of PCBs based on datasets from real-world scenarios, dividing the commonly encountered PCB surface defects into eight distinct types. A high-quality dataset is constructed through detailed labeling. On the basis of this dataset, we optimize and improve the Faster-RCNN framework and propose PCB-Faster-RCNN to identify defects on PCB surfaces precisely. Firstly, ResNet-101 is employed to replace VGG-16 in the backbone network, improving the ability of the model to detect small objects, thereby reducing false negatives of small-sized defects effectively. Then, the deformable convolution module is introduced to optimize the feature extraction process, enhancing the ability of the model to detect deformed objects accurately. Finally, the CBAM is introduced into the backbone, which refines the feature extraction process by jointly focusing on spatial and channel dimensions, enhancing the ability of the model to distinguish similar objects. The contributions of this paper are as follows.

We use the real-world PCB images collected from practical manufacturing processes to construct the dataset, and the surface defects are categorized into eight distinct classes to ensure the diversity and representativeness of the samples.
The conventional VGG-16 is replaced with the deeper ResNet-101, thereby enhancing the capability of the model to perceive and detect small-scale defect objects with improved accuracy.
The deformable convolution module is incorporated to strengthen the adaptability of the model to detect irregularly shaped defects during feature extraction, enabling more effective modeling of complex defect structures.
The CBAM is integrated into the backbone network, jointly exploiting channel and spatial attention mechanisms to achieve more fine-grained feature extraction, thereby improving the discrimination and recognition of visually similar defect categories.

The remainder of this paper is organized as follows. In Section 2, we introduce some related work. In Section 3, we introduce the detailed components of PCB-Faster-RCNN. In Section 4, we verify the performance of the proposed model through experiments. The conclusion is provided in Section 5.

2. Related Work

The traditional object detection algorithms mainly combine HOG with SVM [17,18], followed by the selective search and DPM [19,20]. Nowadays, object detection algorithms are mainly based on deep learning, such as the RCNN series [21,22,23,24,25,26], SSD [27], and YOLO series [28,29,30,31,32,33,34,35].

In the early stages of industrial defect detection, the detection of PCB surface defects mainly relied on manual visual inspection [36]. With the development of detection technology, specialized detection equipment has gradually been introduced to improve detection efficiency and accuracy [37]. More recently, the rapid advancement of information technology and deep learning-based object detection algorithms has provided new solutions for PCB defect detection, and several researchers have successfully applied these methods to practical tasks. Nguyen et al. [38] proposed a real-time automatic detection method that combines deep learning and computer vision to address common PCB manufacturing issues, such as missing components, reversed orientations, and microdefects, which combines ORB feature extraction, brute-force matching, and the RANSAC algorithm for precise image-to-template registration. In addition, it applies image differencing and thresholding to locate defect regions and employs an enhanced ResNet-50 network with focal loss to classify defect types and component orientations, thereby achieving high-accuracy real-time detection on an embedded platform. Guo et al. [39] improved YOLOv7 by first employing the K-means++ clustering algorithm to optimize anchor parameters and then introducing a receptive field enhancement (RFE) module into the network head and replacing the CIoU loss function with WIoUv2. At the same time, the Triplet attention mechanism was introduced in CBS and SPPCSPC modules. The improved model demonstrated superior detection accuracy and speed compared to the baseline model. Based on YOLOv8, Yuan et al. [40] proposed LW-YOLO to address the challenges of slow detection and limited accuracy in existing approaches. 
The model integrates a bidirectional feature pyramid network for multi-scale feature fusion, utilizes partial convolution modules to reduce redundant computations, and adopts a minimum point distance intersection-over-union loss function to simplify optimization while enhancing accuracy. The experimental results confirmed that LW-YOLO significantly improves both efficiency and accuracy in PCB defect detection. Chen et al. [41] introduced a defect re-inspection mechanism based on deep learning, which combines two independently trained models with basic recognition capabilities into a primary model to improve defect classification accuracy. J. Kim et al. [42] developed an advanced PCB inspection system based on a skip-connected convolutional autoencoder. The system trains a deep autoencoder to reconstruct defect-free images from defective ones and then compares the reconstructed images with the input images to localize defects. Additionally, data augmentation was employed to enhance training performance, and the results demonstrated high detection accuracy.

In summary, although significant progress has been made in recent years with the application of deep learning algorithms to PCB surface defect detection, several challenges remain. To address these issues, in this paper, we introduce a series of improvements to the Faster-RCNN object detection framework to improve both the accuracy and efficiency of PCB defect detection.

3. Methods

In this section, we first present PCB-Faster-RCNN. Then, we introduce the various improved modules of Faster-RCNN, including ResNet-101 as the backbone, the deformable convolution module employed to enhance the adaptability of the model to geometric variations, and the CBAM, which is integrated to strengthen feature representation and improve discriminative performance.

3.1. The Overview of PCB-Faster-RCNN

Figure 1 illustrates the overall architecture of the proposed PCB-Faster-RCNN. In this framework, ResNet-101 replaces the conventional VGG-16 as the backbone, thereby enhancing the capability of deep feature extraction. The backbone is composed of five residual blocks, each consisting of two convolutional layers, one batch normalization layer, and one nonlinear activation function, which effectively mitigates the vanishing gradient problem while improving feature expression ability. Deformable convolution is introduced to enhance the adaptability of the model to deformed objects and irregular defect areas during feature extraction. Meanwhile, the CBAM is integrated into the backbone network to achieve fine-grained feature selection and enhancement by jointly modeling channel attention and spatial attention. The features output by each residual block are then fed into the feature pyramid, which combines high-level semantic information with low-level spatial details through a top-down multi-scale fusion mechanism, thereby improving the model’s feature representation capability at different scales. The fused multi-scale feature maps are subsequently passed to the Region Proposal Network (RPN) to generate candidate regions. Finally, candidate regions are processed through RoI operations and fully connected layers, where bounding box regression is performed using the L1 loss function and object classification is achieved via a Softmax classifier. The final output includes both the category and precise location of each defect. By integrating residual structures, deformable convolutions, CBAM, and multi-scale feature fusion, the proposed architecture significantly improves accuracy and robustness in complex PCB defect detection scenarios.

3.2. Residual Neural Network

The backbone network is typically VGG-16 in the traditional Faster-RCNN framework. As a classical convolutional neural network architecture, VGG-16 achieved remarkable performance in early object detection and image recognition tasks, laying the foundation for subsequent research on deep convolutional networks. However, the structural design of VGG-16 primarily relies on stacking of deep 3 × 3 convolutional kernels and fully connected layers to enhance its detection accuracy, lacking cross-layer information transmission mechanisms, which can make the deep features prone to gradient vanishing or exploding during training, thereby constraining its scalability and hindering further network deepening. In addition, although VGG-16 demonstrates reasonable performance in extracting global semantic features, its ability to handle small-scale, morphologically complex, or boundary-ambiguous objects remains insufficient. This limitation is particularly evident in PCB defect detection, where the network struggles to accurately represent small defects or irregularly shaped objects, leading to a noticeable decline in detection accuracy. To address these issues, in this paper, we replace the original VGG-16 backbone in Faster-RCNN with ResNet-101 [43]. By introducing residual learning and shortcut connections, ResNet-101 effectively alleviates the degradation problem commonly encountered in deep neural network training while simultaneously improving the stability of feature propagation and gradient flow. Moreover, its deeper residual structure facilitates the extraction of multi-scale features and enhances adaptability to complex deformation patterns. As a result, the proposed backbone substitution significantly improves detection precision and robustness, particularly in handling diverse defect categories on PCB surfaces. The residual neural network consists of several residual blocks. The overall structure of the residual block is illustrated in Figure 2.

As shown in Figure 2, each residual block consists of two convolutional layers and a ReLU activation function. After entering the residual block, the input $X$ is processed along two paths. The first path passes through two convolutional layers and a ReLU activation function to produce the output $F(X, W)$, while the second path transmits the input $X$ directly to the output through a residual connection. The outputs of the two paths are then added together, and the final output of the residual block is obtained by passing $F(X, W) + X$ through the ReLU activation function, as expressed in Equation (1).

$$Y = F(X, W) + X. \tag{1}$$

In the residual block, $X$ denotes the input to the block, $W$ represents the weights of the learnable layers, $F(X, W)$ is the output obtained after processing through these layers and activation functions, and $Y$ is the final output of the residual block. The core idea of residual networks is the introduction of residual learning through the residual connection. A traditional neural network learns the mapping from input $X$ to output $Y$ directly, that is, $Y = f(X)$, while ResNet introduces a residual function $F(X, W)$, with which the network learns the difference between the output and the input. According to Equation (1), the network does not directly learn the mapping $Y = f(X)$ but instead learns the “residual” $F(X, W) = Y - X$ between the input $X$ and the output $Y$.
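The residual computation in Equation (1) can be illustrated with a minimal sketch. It assumes, purely for illustration, that the two convolutional layers are replaced by two small linear layers acting on a vector, and it applies the final ReLU described in the text; the weight values and the helper names `relu`, `matvec`, and `residual_block` are hypothetical and are not taken from the authors' implementation.

```python
def relu(v):
    """Element-wise ReLU on a list of floats."""
    return [max(0.0, a) for a in v]


def matvec(W, v):
    """Multiply a matrix W (list of rows) by a vector v."""
    return [sum(w_ij * v_j for w_ij, v_j in zip(row, v)) for row in W]


def residual_block(X, W1, W2):
    """Y = ReLU(F(X, W) + X), with F(X, W) = W2 * ReLU(W1 * X).

    Assumption: two linear layers stand in for the two convolutional layers.
    """
    F = matvec(W2, relu(matvec(W1, X)))          # residual branch F(X, W)
    return relu([f + x for f, x in zip(F, X)])   # shortcut adds X, then ReLU


X = [1.0, 2.0, 3.0]

# With all-zero weights the residual branch outputs 0, so the block
# passes its (non-negative) input through unchanged.
zeros = [[0.0] * 3 for _ in range(3)]
identity_out = residual_block(X, zeros, zeros)

# With non-zero (illustrative) weights the block outputs the input plus a learned correction.
W1 = [[1.0, 0.0, 0.0], [0.0, -1.0, 0.0], [0.0, 0.0, 0.5]]
W2 = [[0.5, 0.0, 0.0], [0.0, 1.0, 0.0], [0.0, 0.0, 2.0]]
Y = residual_block(X, W1, W2)
```

The zero-weight case shows why the shortcut eases the training of deep networks: a block that has learned nothing still behaves as an identity mapping instead of destroying its input.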

There are several variants of residual neural networks with different numbers of layers, including ResNet-50, ResNet-101, ResNet-152, and so on, where the number represents the depth of the network. Although these different versions of residual networks exhibit significant improvements in precision, accuracy, and generalization ability compared to traditional networks, there are still notable differences between them. Specifically, ResNet-50 offers advantages in computational efficiency but has relatively limited precision in object detection tasks due to its shallower architecture. On the other hand, ResNet-152 achieves higher precision with its deeper architecture but comes with increased computational cost and a higher risk of overfitting. Therefore, ResNet-101 was selected as the backbone for Faster-RCNN to balance both accuracy and computational efficiency after a comprehensive evaluation of the dataset used in this paper.

In this paper, we present a network architecture based on five residual modules, aiming at improving the ability of the model to recognize objects at multiple scales through feature maps of varying sizes. Each residual module extracts feature information from objects of different sizes, and the feature maps become more abstract and richer in semantic information as the network deepens. In the lower layers, feature maps capture more low-level details, such as edges and textures, while higher layers focus on capturing semantic information, such as object categories and structures. To enhance the expressiveness of the features, the feature maps from different scales are fused through a feature pyramid network. The feature pyramid network effectively integrates multi-level features, improving recognition accuracy across objects of various sizes. Through this multi-scale feature fusion, the model demonstrates improved robustness and accuracy in detecting objects of different sizes, thereby enhancing overall performance. Figure 3 illustrates the implementation of this feature fusion process.
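The top-down fusion described above can be sketched as follows. This is a simplified illustration under stated assumptions: each level is a single-channel map whose side length is half that of the previous level, upsampling is nearest-neighbour, and the 1 × 1 lateral and 3 × 3 smoothing convolutions of a standard feature pyramid network are omitted. The function names are hypothetical, and the paper does not specify these details.

```python
def upsample2x(fmap):
    """Nearest-neighbour 2x upsampling of a 2-D map given as a list of rows."""
    out = []
    for row in fmap:
        wide = [v for v in row for _ in range(2)]  # repeat each column twice
        out.append(wide)
        out.append(list(wide))                     # repeat each row twice
    return out


def add_maps(a, b):
    """Element-wise sum of two maps of identical size."""
    return [[x + y for x, y in zip(ra, rb)] for ra, rb in zip(a, b)]


def top_down_fuse(levels):
    """Fuse maps ordered from fine to coarse; returns fused maps in the same order."""
    fused = [levels[-1]]                           # the coarsest level starts the pathway
    for lateral in reversed(levels[:-1]):
        fused.append(add_maps(lateral, upsample2x(fused[-1])))
    return fused[::-1]


# Toy three-level pyramid: 4 x 4, 2 x 2, and 1 x 1 maps (illustrative values).
c3 = [[1.0] * 4 for _ in range(4)]
c4 = [[10.0, 20.0], [30.0, 40.0]]
c5 = [[100.0]]
p3, p4, p5 = top_down_fuse([c3, c4, c5])
```

In this toy example every position of the finest map `p3` carries a contribution from the coarsest map `c5`, which is how high-level semantic information reaches the high-resolution levels that matter for small defects.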

ResNet-101 enables efficient learning and propagation of features at deeper layers by introducing residual connections. This capability of deep feature extraction is particularly critical for small objects with limited pixel information as the network must capture fine-grained shape and texture details from low-level features. Residual blocks play a key role in cross-layer information transmission, not only enhancing the ability of the model to model detailed features but also improving its sensitivity to object details, thereby enabling reliable recognition of small objects in complex backgrounds. Complementarily, the feature pyramid network enhances multi-scale feature representation by extracting and integrating features across multiple scales, allowing the network to effectively leverage information from different levels and achieve improved accuracy and robustness in object detection tasks. In PCB defect detection, some types of defects are inherently small in size. By integrating ResNet-101 with a feature pyramid network, the model can efficiently extract discriminative features across multiple scales, thereby improving its sensitivity to objects of varying sizes and strengthening its capability to detect small-scale defects. This multi-scale feature representation and fusion mechanism enables the model to achieve more robust and accurate defect recognition in complex inspection scenes.

3.3. Deformable Convolution Network

Traditional convolution has a relatively limited receptive field due to the fixed sampling position of the convolution kernel. Moreover, pooling operations reduce the size of the feature map, inevitably resulting in the loss of some image information. This issue is particularly prominent in images with object deformation and may even result in severe model degradation in some extreme cases. To address the limitation, in this paper, we introduce deformable convolution [44] for feature extraction, which allows the convolution kernel to flexibly select sampling points from neighboring pixels by incorporating learnable offsets into the sampling locations, thereby enhancing its ability to model objects with geometric deformations and scale variations. The deformable convolution can capture richer structural and semantic information from the image compared to conventional convolution, thus improving the robustness and representational capacity of the model in complex visual tasks. Figure 4 illustrates examples of deformable convolution.

As illustrated in Figure 4, traditional convolution extracts features from fixed sampling locations, and its computational formulation is shown in Equation (2). In contrast, deformable convolution adds an offset to each sampling point, so that the sampling position is selected adaptively within the neighborhood of the original point, enabling more flexible feature extraction. The offsets are learned rather than predefined, thereby allowing the convolution to capture richer geometric structures and semantic information from the image. The corresponding computational process is expressed in Equation (3).

$$y(p_0) = \sum_{p_n \in R} w(p_n) \cdot x(p_0 + p_n). \tag{2}$$

$$y(p_0) = \sum_{p_n \in R} w(p_n) \cdot x(p_0 + p_n + \Delta p_n). \tag{3}$$

In Equations (2) and (3), $x$ denotes the input feature map, $w(p_n)$ represents the convolution kernel weight at position $p_n$, and $p_0$ is the central sampling point. The set of sampling locations of the convolution kernel is denoted as $R$, while $\Delta p_n$ refers to the offset, which is learned during training and can take non-integer values. In conventional convolution, the input features are aggregated through weighted summation at fixed sampling locations. In contrast, deformable convolution introduces the offset $\Delta p_n$ at each sampling position $p_n$, thereby shifting the sampling location to $p_0 + p_n + \Delta p_n$. As a result, the convolution kernel is no longer constrained to a fixed regular grid; it can adaptively select sampling locations, enabling more flexible and informative feature extraction.
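To make the difference between Equations (2) and (3) concrete, the following sketch evaluates both at a single location of a single-channel map. It rests on several assumptions: the kernel is 3 × 3, fractional sampling positions are read by bilinear interpolation (as in the original deformable convolution formulation [44]), positions outside the map count as zero, and the offsets are passed in by hand, whereas in a real network they would be predicted by an additional convolutional layer. The names `bilinear`, `GRID`, and `deform_conv_at` are hypothetical.

```python
import math


def bilinear(x, r, c):
    """Sample the 2-D map x at a fractional (row, col); points outside x count as 0."""
    r0, c0 = math.floor(r), math.floor(c)
    value = 0.0
    for rr in (r0, r0 + 1):
        for cc in (c0, c0 + 1):
            if 0 <= rr < len(x) and 0 <= cc < len(x[0]):
                weight = (1 - abs(r - rr)) * (1 - abs(c - cc))
                value += weight * x[rr][cc]
    return value


# Regular 3 x 3 grid R of kernel positions p_n, relative to the centre p_0.
GRID = [(dr, dc) for dr in (-1, 0, 1) for dc in (-1, 0, 1)]


def deform_conv_at(x, w, p0, offsets=None):
    """Equation (3) at one location p0; offsets=None reduces it to Equation (2)."""
    if offsets is None:
        offsets = [(0.0, 0.0)] * len(GRID)
    y = 0.0
    for (dr, dc), w_n, (odr, odc) in zip(GRID, w, offsets):
        y += w_n * bilinear(x, p0[0] + dr + odr, p0[1] + dc + odc)
    return y


# Toy input: a 5 x 5 horizontal ramp whose value equals the column index.
x = [[float(c) for c in range(5)] for _ in range(5)]
w = [1.0] * 9                                   # illustrative all-ones kernel

standard = deform_conv_at(x, w, (2, 2))         # fixed grid, Equation (2)
shifted = deform_conv_at(x, w, (2, 2), [(0.0, 0.5)] * 9)  # every point moved half a pixel right
```

On this ramp, shifting all nine sampling points half a pixel to the right raises the response from 18.0 to 22.5, which shows that the same kernel weights can read a different region of the input once offsets are applied.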

The PCB image data used in this paper were collected from real-world manufacturing scenarios, where image quality is inevitably affected by factors such as environmental conditions and camera angles. These factors often cause object deformation, insufficient illumination, background reflections, and blurred details, thereby increasing the difficulty of defect detection. To address these challenges, deformable convolutional networks (DCNs) are modularly embedded into the high-level feature extraction stages of ResNet to enhance the adaptability of the model to geometric deformations of targets and complex scenes. Specifically, we replace all traditional convolutions in Res4, which is responsible for encoding high-level semantic information, with the DCN. We also introduce the DCN into the first traditional convolutional layer of Res5, giving the model adaptive sampling capabilities in semantically rich but spatially low-resolution stages. DCNs adjust the convolution sampling positions through offsets, allowing the convolution kernels to align features more flexibly with objects that have changed in scale. This embedding method does not change the overall topology of the backbone but significantly enhances its spatial modeling capability, making it better suited to the complex appearance changes of objects in PCB manufacturing scenarios. To further validate its effectiveness, an ablation experiment based on conventional convolution was conducted, allowing a systematic comparison of the performance differences.

3.4. Convolutional Block Attention Module

Traditional attention mechanisms are generally restricted to a single dimension, either channel attention or spatial attention. Such designs inevitably suffer from inherent limitations in feature representation. On the one hand, channel attention can effectively emphasize informative feature channels but often neglects spatial positional information, making it difficult to accurately localize salient regions in complex visual scenes. On the other hand, spatial attention is capable of highlighting crucial locations, yet it fails to capture semantic interdependencies across channels, thereby resulting in incomplete and suboptimal feature descriptions. In contrast, the Convolutional Block Attention Module (CBAM) [45] introduces a sequential integration of channel- and spatial attention. By first refining channel-wise feature responses and subsequently enhancing spatially significant regions, CBAM establishes a complementary relationship between global semantic modeling and local spatial localization. This joint mechanism enables a more holistic and discriminative feature refinement process, effectively compensating for the deficiencies of conventional single-dimensional attention mechanisms. Furthermore, unlike computationally intensive self-attention methods, CBAM requires only minimal additional operations due to its lightweight and modular design, thereby maintaining low computational overhead and high scalability, which allows CBAM to be seamlessly embedded into a wide range of convolutional neural network architectures as a plug-and-play module. Based on this, in this paper, we incorporate CBAM into the backbone to enhance feature representation. Through the joint exploitation of channel dependencies and spatial importance, the model is able to extract more fine-grained and multidimensional representations from input images. 
Consequently, the proposed framework exhibits improved discriminability when dealing with visually similar targets under complex and cluttered conditions. The overall architecture of CBAM is illustrated in Figure 5.

As illustrated in Figure 5, CBAM is composed of two sequential sub-modules: the channel attention module (CAM) and the spatial attention module (SAM). The input feature map is first processed by CAM, where channel-wise attention is applied to selectively emphasize informative feature channels with higher semantic relevance. The resulting feature map is then fed into SAM, where spatial attention is imposed to highlight salient regions across spatial dimensions. Through this cascaded operation, the final output feature map integrates both channel and spatial attention, thereby yielding a more comprehensive and discriminative feature representation.

The CAM computes complementary channel descriptors via global average pooling and global max pooling, projects them through a shared bottleneck MLP, and fuses them with a sigmoid function; the feature map is then rescaled channel-wise to emphasize semantically informative channels. Specifically, for the input feature map $F \in \mathbb{R}^{C \times H \times W}$, the CAM first performs two types of pooling operations along the spatial dimensions, thereby generating complementary channel descriptors. The formulations of these pooling operations are presented in Equations (4) and (5).

$$z_{avg}(c) = \frac{1}{HW} \sum_{h, w} F(c, h, w). \tag{4}$$

$$z_{max}(c) = \max_{h, w} F(c, h, w). \tag{5}$$

Specifically, global average pooling

z_{a v g} (c)

computes the mean activation across all spatial locations within each channel to characterize the overall response strength, while the max pooling

z_{a v g} (c)

extracts the maximum activation value along the spatial dimension to capture the most salient local feature. The combination of these two descriptors provides a complementary global representation for each channel, encompassing both statistical characteristics and salient responses. Subsequently, the two channel descriptors are fed into a shared-weight two-layer multilayer perceptron (MLP) to further model inter-channel dependencies, shown in Equations (6) and (7).

s_{avg} = W_{2} δ (W_{1} z_{avg}) .

(6)

s_{max} = W_{2} δ (W_{1} z_{max}) .

(7)

Here, the first weight layer reduces the channel dimension from C to C/r, where

r

denotes the reduction ratio, which not only decreases the number of parameters and computational cost but also compresses the channel representation to extract more discriminative information. Subsequently, the second weight layer restores the dimension to

C

, thereby generating a weight distribution that matches the original channel number, and the nonlinear activation function

δ

, typically ReLU, is introduced to enhance the nonlinearity and expressiveness of the learned features during this process. Finally, two nonlinear mapping vectors are produced, corresponding to the average pooling and max pooling branches, respectively. These two outputs are then fed into a gating mechanism to integrate complementary information and adaptively assign channel-wise weights, as shown in Equation (8).

m_{c} = σ (s_{avg} + s_{max}) \in R^{C \times 1 \times 1} .

(8)

In the gating unit, the outputs from the average pooling and max pooling branches are first combined via element-wise addition. Then, the aggregated result is normalized by a sigmoid activation function

σ

, producing a channel attention vector

m_{c}

of length C, which characterizes the importance coefficients of individual channels with values constrained within the range of 0 to 1. Finally, the weights

m_{c} (c)

are applied to the input feature map F through channel-wise multiplication, obtaining the feature map

F^{'} (c, h, w)

with channel attention, as shown in Equation (9):

F^{'} (c, h, w) = m_{c} (c) \otimes F (c, h, w) .

(9)
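The channel attention computation in Equations (4)–(9) can be sketched in a few lines of plain Python. This is only an illustrative, dependency-free sketch and not the implementation used in the paper: the feature map is a nested list of shape C × H × W, the helper names (`matvec`, `shared_mlp`, `channel_attention`) are hypothetical, and the weights `w1` and `w2` are made-up values standing in for the learned parameters W_1 and W_2.

```python
import math


def sigmoid(x):
    return 1.0 / (1.0 + math.exp(-x))


def matvec(weights, vec):
    """Multiply a matrix (given as a list of rows) by a vector."""
    return [sum(w * v for w, v in zip(row, vec)) for row in weights]


def shared_mlp(z, w1, w2):
    """Eqs. (6)-(7): W2 * ReLU(W1 * z); the same weights serve both branches."""
    hidden = [max(0.0, h) for h in matvec(w1, z)]
    return matvec(w2, hidden)


def channel_attention(feat, w1, w2):
    """feat is a C x H x W nested list; returns (m_c, F')."""
    # Eq. (4): global average pooling over the spatial dimensions.
    z_avg = [sum(map(sum, ch)) / (len(ch) * len(ch[0])) for ch in feat]
    # Eq. (5): global max pooling over the spatial dimensions.
    z_max = [max(map(max, ch)) for ch in feat]
    # Eq. (8): add the two branch outputs and squash them with a sigmoid.
    s_avg = shared_mlp(z_avg, w1, w2)
    s_max = shared_mlp(z_max, w1, w2)
    m_c = [sigmoid(a + b) for a, b in zip(s_avg, s_max)]
    # Eq. (9): rescale every channel by its attention weight.
    refined = [[[m * v for v in row] for row in ch] for m, ch in zip(m_c, feat)]
    return m_c, refined


# Toy input: C = 2, H = W = 2, reduction ratio r = 2 (hidden width C / r = 1).
feat = [[[1.0, 2.0], [3.0, 4.0]],
        [[0.0, 0.0], [0.0, 2.0]]]
w1 = [[1.0, -1.0]]      # shape (C / r) x C, hypothetical values
w2 = [[1.0], [-1.0]]    # shape C x (C / r), hypothetical values
m_c, refined = channel_attention(feat, w1, w2)
```

In this toy case both pooled descriptors produce the same MLP output, so the first channel receives a weight close to 1 and the second a weight close to 0, which shows how the gate can emphasize one channel over another.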

After processing by CAM,

F^{'}

is fed into SAM, which derives complementary spatial descriptors via channel-wise average and max pooling, aggregates them using a single

7 \times 7

convolution followed by a sigmoid gate to form a spatial attention map, and re-weights the feature map pixel-wise to highlight salient regions while suppressing irrelevant backgrounds. The SAM learns the spatial attention map

M_{s} \in R^{1 \times H \times W}

, which assigns one weight to each spatial location. Specifically, the feature map

F^{'} (c, h, w)

is subjected to average pooling and max pooling along the channel dimension, resulting in two complementary two-dimensional response maps, as defined in Equations (10) and (11).

M_{avg} (h, w) = \frac{1}{C} \sum_{c = 1}^{C} F^{'} (c, h, w) .

(10)

M_{max} (h, w) = \max_{1 \leq c \leq C} F^{'} (c, h, w) .

(11)

where

M_{avg}

denotes the mean value across all channels at spatial location

(h, w)

, reflecting the overall response intensity at that position, while

M_{max}

extracts the maximum activation among all channels, highlighting the most salient local pattern. These two descriptors provide complementary spatial representations, with the former preserving global statistical information of the background and the latter emphasizing discriminative local regions. Subsequently, the two response maps are concatenated along the channel dimension to form the composite representation

[M_{avg}; M_{max}] \in R^{2 \times H \times W}

, which is then processed by a single convolutional layer for feature fusion. Finally, the output is normalized via a sigmoid function to generate the spatial attention map, as shown in Equation (12).

M_{s} = σ (f^{7 \times 7} ([M_{avg}; M_{max}])), M_{s} \in R^{1 \times H \times W} .

(12)

The convolution operation can fuse the two spatial descriptors, enabling the extraction of contextual features within local neighborhoods, which makes the attention map not only rely on pointwise statistical information but also capture spatial dependencies across adjacent regions. The output is subsequently normalized by a sigmoid activation function

σ

, which compresses the values into the range of 0 to 1, thereby producing the final spatial attention map

M_{s}

that reflects the importance of each spatial location. The attention map is then multiplied element-wise with the channel-refined feature map, which allows the network to enhance salient regions while suppressing background and noise interference, and yields the final refined feature map

F^{''} (c, h, w) \in R^{C \times H \times W}

, shown in Equation (13).

F^{''} (c, h, w) = M_{s} (h, w) \otimes F^{'} (c, h, w) .

(13)
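A matching sketch of the spatial attention step in Equations (10)–(13) is given below, again in plain Python and only as an illustration. The function name `spatial_attention` is hypothetical, the convolution is written as a zero-padded cross-correlation (the usual convention in deep learning frameworks, assumed here), and the 7 × 7 kernel values are made up rather than learned.

```python
import math


def sigmoid(x):
    return 1.0 / (1.0 + math.exp(-x))


def spatial_attention(feat, kernel, bias=0.0):
    """feat is the C x H x W output of CAM; kernel has shape 2 x k x k."""
    C, H, W = len(feat), len(feat[0]), len(feat[0][0])
    # Eq. (10): mean over channels at every spatial location.
    m_avg = [[sum(feat[c][h][w] for c in range(C)) / C for w in range(W)]
             for h in range(H)]
    # Eq. (11): max over channels at every spatial location.
    m_max = [[max(feat[c][h][w] for c in range(C)) for w in range(W)]
             for h in range(H)]
    stacked = [m_avg, m_max]          # the 2 x H x W descriptor [M_avg; M_max]
    k = len(kernel[0])
    pad = k // 2                      # "same" zero padding keeps H x W
    m_s = [[0.0] * W for _ in range(H)]
    for h in range(H):
        for w in range(W):
            acc = bias
            for ch in range(2):
                for i in range(k):
                    for j in range(k):
                        y, x = h + i - pad, w + j - pad
                        if 0 <= y < H and 0 <= x < W:
                            acc += kernel[ch][i][j] * stacked[ch][y][x]
            m_s[h][w] = sigmoid(acc)  # Eq. (12)
    # Eq. (13): re-weight every channel by the same spatial map.
    out = [[[m_s[h][w] * feat[c][h][w] for w in range(W)] for h in range(H)]
           for c in range(C)]
    return m_s, out


# Toy input with a single active location; the kernel only has a centre tap,
# so the pre-sigmoid response at (h, w) is M_avg(h, w) + M_max(h, w).
feat_prime = [[[1.0, 0.0], [0.0, 0.0]],
              [[3.0, 0.0], [0.0, 0.0]]]
kernel = [[[1.0 if (i == 3 and j == 3) else 0.0 for j in range(7)]
           for i in range(7)] for _ in range(2)]
m_s, refined = spatial_attention(feat_prime, kernel)
```

With a learned kernel the off-centre taps would also contribute, which is what lets the attention map capture dependencies across adjacent regions rather than only pointwise statistics.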

In real-world PCB manufacturing, different types of defects often exhibit visually similar characteristics due to environmental interference and the imaging limitations of the cameras, which increases recognition ambiguity and degrades model performance. To address this, CBAM is integrated into the multi-scale feature extraction process of the backbone network in a lightweight, embedded manner. Specifically, we insert CBAM modules at the output of each module in the ResNet backbone, ensuring that each level of features undergoes channel and spatial recalibration before entering the next stage of convolutional computation. The CAM models global features along the channel dimension and adaptively learns the relative importance of each channel, thereby emphasizing discriminative feature channels that are critical for distinguishing visually similar defects. Meanwhile, the SAM redistributes feature weights across the spatial dimension, highlighting salient locations while suppressing irrelevant regions, which enables the network to focus on key internal structures of the defects rather than relying solely on overall contours, thus mitigating confusion caused by morphological similarity. The synergy of CAM and SAM gives the network discriminative advantages in both feature selection and spatial localization, improving accuracy and robustness when distinguishing between similar defect categories. This embedding does not change the original structure of the backbone network, yet it enhances the salient regions of the target and suppresses redundant background information in the low and middle layers of feature extraction, thereby improving the feature representation ability of the overall detection network and its robustness in PCB manufacturing scenarios.

3.5. CIoU Loss Function

The CIoU loss incorporates the overlap, the center-point distance, and the aspect-ratio consistency between the ground-truth and predicted bounding boxes, enabling a more comprehensive evaluation of localization accuracy, as shown in Equations (14) and (15).

CIoU = IoU - \frac{d^{2}}{c^{2}} - α υ .

(14)

α = \frac{υ}{(1 - IoU) + υ} .

(15)

In Equations (14) and (15), d represents the distance between the center points of the predicted and ground-truth bounding boxes, and c represents the diagonal length of the smallest box that encloses both of them.

υ

measures the consistency of the aspect ratios of the two boxes, as shown in Equation (16). The final CIoU loss function is shown in Equation (17).

υ = \frac{4}{π^{2}} {(\arctan \frac{ω_{G}}{h_{G}} - \arctan \frac{ω_{p}}{h_{p}})}^{2} .

(16)

L_{CIoU} = 1 - IoU + \frac{d^{2}}{c^{2}} + α υ .

(17)

Compared with other loss functions, the CIoU loss introduces

υ

, an aspect-ratio term that enhances robustness in object detection tasks; it not only improves detection accuracy but also strengthens the generalization capability of the model under complex conditions. Given that PCB manufacturing environments are often affected by diverse and challenging factors, in this paper, we adopt the CIoU loss as the primary regression loss to ensure stability and reliability in practical applications.
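As a worked illustration of Equations (14)–(17), the following sketch evaluates the CIoU loss for a single pair of axis-aligned boxes. It is a minimal example under stated assumptions: boxes are given as (x1, y1, x2, y2) corner coordinates, the function name `ciou_loss` is hypothetical, and a small `eps` is added only to avoid division by zero.

```python
import math


def ciou_loss(pred, gt, eps=1e-9):
    """CIoU loss for two boxes in (x1, y1, x2, y2) format (assumed layout)."""
    px1, py1, px2, py2 = pred
    gx1, gy1, gx2, gy2 = gt

    # Plain IoU: intersection area over union area.
    iw = max(0.0, min(px2, gx2) - max(px1, gx1))
    ih = max(0.0, min(py2, gy2) - max(py1, gy1))
    inter = iw * ih
    union = (px2 - px1) * (py2 - py1) + (gx2 - gx1) * (gy2 - gy1) - inter
    iou = inter / (union + eps)

    # d^2: squared distance between the two box centres.
    d2 = ((px1 + px2) / 2 - (gx1 + gx2) / 2) ** 2 \
        + ((py1 + py2) / 2 - (gy1 + gy2) / 2) ** 2
    # c^2: squared diagonal of the smallest box enclosing both boxes.
    cw = max(px2, gx2) - min(px1, gx1)
    ch = max(py2, gy2) - min(py1, gy1)
    c2 = cw ** 2 + ch ** 2 + eps

    # Eq. (16): aspect-ratio consistency term.
    v = (4 / math.pi ** 2) * (
        math.atan((gx2 - gx1) / (gy2 - gy1 + eps))
        - math.atan((px2 - px1) / (py2 - py1 + eps))
    ) ** 2
    # Eq. (15): trade-off weight.
    alpha = v / ((1 - iou) + v + eps)
    # Eq. (17): final loss, i.e. 1 - CIoU.
    return 1 - iou + d2 / c2 + alpha * v


perfect = ciou_loss((0, 0, 2, 2), (0, 0, 2, 2))    # identical boxes
disjoint = ciou_loss((0, 0, 1, 1), (2, 2, 3, 3))   # no overlap, same shape
```

For the disjoint pair the IoU is zero and both boxes are square, so the loss reduces to 1 + d²/c² = 1 + 8/18. This shows that the distance term still provides a useful signal when the boxes do not overlap.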

In this paper, the introduction of ResNet-101, DCN, and CBAM represents a systematic architectural design tailored to the characteristics of PCB defect detection tasks. PCB defects are typically extremely small, texture-fragmented, geometrically irregular, and highly similar across categories, which makes conventional convolutional features insufficient for semantic abstraction, deformation modeling, and background suppression. ResNet-101 provides deeper semantic representations that strengthen the ability of the model to capture micro-scale defects. DCN adapts to the geometric variations in defect shapes through learnable sampling offsets, significantly reducing false negatives caused by edge distortion and structural deformation. CBAM effectively highlights key regions and suppresses complex PCB texture backgrounds through a joint channel- and spatial attention mechanism, reducing misclassification between similar-looking categories. These components complement each other functionally. ResNet-101 enhances global semantics, DCN captures local geometric details, and CBAM achieves discriminative region attention. These three correspond to three complementary dimensions: semantic enhancement, deformation modeling, and attention guidance, forming a collaborative optimization mechanism tailored to the characteristics of PCB defects.

4. Results

In this section, we will provide a comprehensive description of the experimental process and result analysis, including the construction and characteristics of the self-built dataset, the configuration of the experimental environment, the definition of performance evaluation metrics, as well as the results of comparative and ablation experiments.

4.1. Dataset

The dataset constructed in this paper was collected from the real-world PCB manufacturing scene. Specifically, high-speed industrial cameras were employed to capture images of PCB surfaces along the production line, thereby preserving high-resolution and detailed structural information. Then, a large number of collected images were carefully screened, extracting images containing defects to form a dataset encompassing multiple defect categories. Compared with publicly available synthetic datasets or small-scale collections obtained under laboratory conditions, the dataset can reflect the complexities of real manufacturing environments more accurately. As a result, the constructed dataset not only provides high-quality input for model training but also establishes a solid foundation for evaluating the effectiveness and robustness of defect detection methods in real-world applications. The specific parameters for image acquisition are shown in Table 1.

Due to the complexity of production environments and the variability of equipment operating conditions, these defects exhibit substantial heterogeneity in their morphological characteristics, which not only increases the difficulty of defect detection and classification but also imposes stricter requirements on the robustness and generalization ability of detection models. To ensure precise identification of these defects, in this paper, we have systematically organized and categorized a large number of defect images, and the defects were ultimately classified into eight categories: film anomaly, film oxidation, film peeling, hole anomaly, breakage, foreign matter on the film, sub-film foreign matter, and oil contamination. Examples of each defect are shown in Figure 6.

As we can see in Figure 6, different types of defects exhibit significant variations at the macroscopic level, while certain detailed features appear partially similar under visual inspection. The detailed appearance characteristics of each defect are described below.

The defect of film anomaly is primarily induced by variations in environmental temperature and pressure during the PCB manufacturing process, which result in the rupture of the surface film layer. The morphological characteristics of such defects exhibit considerable variability; in some cases, they appear as minor localized ruptures, while in others they extend over large areas, even covering the entire image. Their color distribution typically follows a gradient pattern, gradually darkening from the edges toward the center, forming a transition from light to deep black. Moreover, iridescent halo effects resembling rainbow-like fringes are frequently observed along the defect boundaries. These defects are uniformly denoted by the code “FB”.

The defect of film oxidation is primarily caused by the incorporation of oxygen in varying proportions during the soldering process of PCB conductors, leading to localized oxidation and corrosion on the film surface. These defects typically occur at the soldering joints between conductors on the panel and are often characterized by small dot-like structures, usually comparable in size to the width of the conductor. The defect predominantly appears black in color with faint halo effects frequently observed around the periphery. For consistency in subsequent analysis and classification, these defects are uniformly denoted by the code “FO”.

The defect of film peeling primarily arises from improper temperature control during the soldering stage of PCB manufacturing, leading to a temperature difference between the film and the copper substrate, which causes localized detachment of the film at structurally weaker positions, resulting in separation between the film and the copper layer. These defects are most commonly observed near the connecting conductors of the panel and typically exhibit concentric ring patterns, with the surrounding region often appearing as a translucent black band with distinct edges. The size and shape of the defect are generally visible to the naked eye. For the purposes of systematic classification and identification, these defects are uniformly denoted by the code “FP”.

The defect of hole anomaly primarily results from improper operations during the drilling process in PCB manufacturing. Specifically, excessive spindle deflection, improper handling of CNC drilling, insufficient drill bit speed, overly high feed rates, excessive spindle depth, and drill bit overuse can all contribute to abnormalities in conductor connection holes. These defects are typically located at the junction between the metal substrate and the conductors and often appear as relatively small regions in the images, making them difficult to detect with the naked eye. Compared with normal holes, hole anomalies are commonly manifested as missing holes, irregular shapes, or distorted contours. For consistency in classification and subsequent analysis, these defects are uniformly denoted by the code “HD”.

The defect of breakage is primarily caused by improper control of current during the electroplating process. Both excessive and insufficient current levels can induce wire fractures. Improper handling during production and minor errors in etching or exposure processes may also contribute to the occurrence of such defects. These anomalies typically appear along conductor traces in the images and are usually characterized by small and opaque black patches with irregular shapes. Due to their limited size and high similarity to background noise, they are difficult to identify through manual inspection. For consistency in classification and further analysis, these defects are uniformly denoted by the code “NP”.

The defect of foreign matter on the film primarily arises from sealing anomalies during the PCB manufacturing process, which allow dust or machining debris to settle on the panel surface. Its visual characteristics typically appear as transparent or semi-transparent black impurities, covering a relatively large area and being clearly discernible to the naked eye. In some cases, they may even extend across a significant portion of the image, thereby introducing substantial challenges for product quality and subsequent inspection. For consistency in classification and further analysis, these defects are uniformly denoted by the code “PN”.

The defect of sub-film foreign matter typically occurs during the process of film coating over the metallic substrate, where improper operation or inadequate sealing causes impurities or machining debris to become trapped between the film layer and the metal surface. Its visual manifestation is generally characterized by opaque black spots, which in some cases may be difficult to identify with the naked eye. These defects share certain visual similarities with breakage defects, but breakage defects are typically confined to wiring regions, whereas sub-film foreign matter defects may appear at arbitrary locations across the image, with the majority observed in the vicinity of the metallic substrate. For consistency in classification and further analysis, these defects are uniformly denoted by the code “PI”.

The defect of oil contamination primarily arises during the PCB manufacturing process as a result of oil leakage from machining equipment. It usually appears as small, semi-transparent black dots with low overall contrast that nevertheless remain clearly visible in the image. Their spatial distribution is not fixed, and they may occur in any region of the image, although they are most frequently observed in the vicinity of metallic connection panels. Due to their morphological similarity to certain types of foreign matter defects, oil contamination defects pose additional challenges for accurate detection and classification. For consistency in classification and further analysis, these defects are uniformly denoted by the code “XO”.

Overall, based on the systematic definitions outlined above, we constructed the training dataset according to eight distinct defect categories. The distribution and statistical details of each defect category are presented in Table 2.

Table 2 presents the defect categories along with the corresponding training sample quantities. Each defect category contains approximately 1200 images, which are sufficient to support the research objectives of this paper. Due to high-speed camera limitations and other environmental interferences, a small subset of images across different defect categories exhibits issues, such as blurriness or indistinct defect features, which must be carefully screened and processed to ensure data consistency and reliability. All images in the dataset were annotated using LabelImg, obtaining a finalized high-quality dataset.

4.2. The Implementation Details

The setup of the experimental environment is detailed in Table 3. We conducted all experiments on the Ubuntu 22.04 operating system, which provides a reliable and stable platform for deep learning frameworks. Python 3.11.9 was selected as the primary programming language due to its efficient data processing capabilities and extensive library ecosystem, which make it well suited to object detection tasks. Moreover, NumPy 1.26.3 and PyTorch 2.4.1 were employed to enable efficient matrix operations and deep model training. Hardware acceleration was supported by CUDA 12.4 in conjunction with an NVIDIA GeForce RTX 4090 GPU equipped with 24 GB of memory, which significantly enhances parallel computation efficiency for large-scale data and complex models. The CPU is an Intel Core i9-13900K at 3.00 GHz and the system memory is 64 GB, ensuring smooth and efficient execution of all experimental procedures.

Table 4 summarizes the key hyperparameter configurations employed for model training in this paper. Specifically, the initial momentum was set to 0.8, and a learning rate warm-up strategy was applied for three consecutive epochs to gradually increase the learning rate to its predefined value, thereby mitigating the adverse effects of abrupt changes on the optimization process. Moreover, a cosine annealing strategy was adopted to dynamically adjust the learning rate during the formal training phase, ensuring smooth decay throughout training and enhancing both convergence stability and overall model performance.
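The learning-rate schedule described above can be sketched as a plain function of the epoch index. This is only an illustration: a linear warm-up shape is assumed (the text states only that the rate is increased gradually over three epochs), and the function name `learning_rate` as well as the base rate, minimum rate, and epoch count in the example are hypothetical placeholders rather than the values in Table 4.

```python
import math


def learning_rate(epoch, total_epochs, base_lr, warmup_epochs=3, min_lr=0.0):
    """Warm-up followed by cosine annealing, evaluated once per epoch."""
    if epoch < warmup_epochs:
        # Assumed linear ramp: reaches base_lr at the end of the warm-up.
        return base_lr * (epoch + 1) / warmup_epochs
    # Fraction of the post-warm-up phase that has elapsed, in [0, 1].
    progress = (epoch - warmup_epochs) / max(1, total_epochs - warmup_epochs)
    # Cosine decay from base_lr down to min_lr.
    return min_lr + 0.5 * (base_lr - min_lr) * (1 + math.cos(math.pi * progress))


# Hypothetical settings for illustration only.
schedule = [learning_rate(e, total_epochs=100, base_lr=0.01) for e in range(100)]
```

The rate rises for three epochs, reaches its predefined value, and then decays smoothly, which avoids both an abrupt start and abrupt drops during training.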

4.3. The Evaluation Metrics

In this paper, we employed several widely used evaluation metrics in object detection to comprehensively evaluate the performance of PCB-Faster-RCNN, including precision, recall, AP, mAP, and GFLOPs.

Specifically, precision was adopted to measure the correctness of predictions, and recall was utilized to evaluate the ability of the model to identify all relevant objects. Their formulas are shown in Equations (18) and (19), respectively.

Precision = \frac{TP}{TP + FP} .

(18)

Recall = \frac{TP}{TP + FN} .

(19)

where

TP

represents the correctly detected positive objects,

FP

refers to negative instances that are incorrectly classified as positive, and

FN

represents positive objects that are missed by the detector. In object detection, a trade-off typically exists between precision and recall. When the model emphasizes high precision, it may fail to identify some true objects, thereby increasing the number of

FN

and reducing recall. Conversely, when the model prioritizes higher recall, the decision threshold tends to be relaxed, leading to more

FP

and consequently a reduction in precision.
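The trade-off can be made concrete with a small sketch of Equations (18) and (19). The detection list below is entirely hypothetical: each entry is a confidence score paired with a flag saying whether that detection matches a ground-truth object, and `num_gt` is an assumed number of ground-truth objects.

```python
def precision_recall(tp, fp, fn):
    """Eqs. (18)-(19), returning 0.0 when a denominator would be zero."""
    precision = tp / (tp + fp) if (tp + fp) else 0.0
    recall = tp / (tp + fn) if (tp + fn) else 0.0
    return precision, recall


def counts_at_threshold(detections, num_gt, threshold):
    """Count TP, FP and FN when only detections scoring >= threshold are kept."""
    kept = [is_tp for score, is_tp in detections if score >= threshold]
    tp = sum(kept)
    fp = len(kept) - tp
    fn = num_gt - tp          # ground-truth objects left undetected
    return tp, fp, fn


# Hypothetical detections: (confidence score, matches a ground-truth object?)
detections = [(0.95, True), (0.90, True), (0.80, False),
              (0.70, True), (0.60, False), (0.40, True)]
num_gt = 5

strict = precision_recall(*counts_at_threshold(detections, num_gt, 0.85))
relaxed = precision_recall(*counts_at_threshold(detections, num_gt, 0.30))
```

With the strict threshold only the two most confident detections survive, so precision is perfect but most objects are missed. Relaxing the threshold recovers more objects at the cost of admitting false positives.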

To achieve a balance between precision and recall while maximizing overall detection performance, researchers have introduced average precision (AP) as a fundamental evaluation metric in object detection. Specifically, the precision–recall (P–R) curve can be plotted by computing precision and recall under varying threshold conditions. The AP value is defined as the area under the curve, thereby providing a comprehensive measure of the detection capability of the model across thresholds. Furthermore, to assess the overall performance in multi-class detection tasks, the arithmetic mean of the AP values across all categories is calculated, obtaining the mean average precision (mAP), shown in Equation (20).

mAP = \frac{1}{N} \sum_{i = 1}^{N} AP_{i} = \frac{1}{N} \sum_{i = 1}^{N} \int_{0}^{1} P_{i} (R) d R .

(20)

where N denotes the total number of categories in the object detection task, and

AP_{i}

represents the average precision (AP) of the

i\text{-th}

category. The use of mAP enables a comprehensive evaluation of model performance by jointly considering precision and recall, thereby overcoming the limitations of relying on a single metric. During the experimental process, different Intersection over Union (IoU) thresholds directly affect the computed mAP values. Specifically, the IoU measures the degree of overlap between predicted and ground-truth bounding boxes. Its formulation is provided in Equation (21). To ensure a more objective and accurate assessment of detection performance, we calculate the mAP across multiple IoU thresholds, obtaining the final mAP by averaging these values, as expressed in Equation (22).

IoU = \frac{Area (B_{p} \cap B_{gt})}{Area (B_{p} \cup B_{gt})} .

(21)

mAP_{avg} = \frac{1}{10} \sum_{i \in \{0.50, 0.55, \ldots, 0.95\}} mAP_{i}, IoU = [0.5 : 0.05 : 0.95] .

(22)

where

mAP_{i}

represents the mAP value at an IoU threshold of i. Averaging over multiple thresholds evaluates the performance of the model more accurately than using a single IoU threshold.
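Equations (20)–(22) can be sketched as follows. This is a simplified illustration rather than the evaluation code used in the experiments: the area under the P–R curve is computed with the common monotone-envelope (all-point) interpolation, which is an assumption, boxes are taken to be in (x1, y1, x2, y2) format, and all function names are hypothetical.

```python
def box_iou(a, b):
    """Eq. (21) for two boxes in (x1, y1, x2, y2) format (assumed layout)."""
    iw = max(0.0, min(a[2], b[2]) - max(a[0], b[0]))
    ih = max(0.0, min(a[3], b[3]) - max(a[1], b[1]))
    inter = iw * ih
    union = (a[2] - a[0]) * (a[3] - a[1]) + (b[2] - b[0]) * (b[3] - b[1]) - inter
    return inter / union if union > 0 else 0.0


def average_precision(recalls, precisions):
    """Area under the P-R curve; recalls must be sorted in ascending order."""
    r = [0.0] + list(recalls) + [1.0]
    p = [0.0] + list(precisions) + [0.0]
    # Make precision monotonically non-increasing from right to left.
    for i in range(len(p) - 2, -1, -1):
        p[i] = max(p[i], p[i + 1])
    # Sum the rectangles between consecutive recall values.
    return sum((r[i + 1] - r[i]) * p[i + 1] for i in range(len(r) - 1))


def mean_ap(ap_per_class):
    """Eq. (20): arithmetic mean of the per-class AP values."""
    return sum(ap_per_class) / len(ap_per_class)


# Eq. (22): the ten IoU thresholds 0.50, 0.55, ..., 0.95.
IOU_THRESHOLDS = [round(0.50 + 0.05 * k, 2) for k in range(10)]


def map_avg(map_per_threshold):
    """Average the mAP values (keyed by IoU threshold) over all ten thresholds."""
    return sum(map_per_threshold[t] for t in IOU_THRESHOLDS) / len(IOU_THRESHOLDS)


iou_example = box_iou((0, 0, 2, 2), (1, 1, 3, 3))       # overlap 1, union 7
ap_example = average_precision([0.5, 1.0], [1.0, 0.5])  # two-point P-R curve
```

A prediction counts as a true positive at threshold i only if its IoU with a ground-truth box is at least i, so stricter thresholds yield lower mAP values and the average rewards tight localization.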

GFLOPs also serve as an important evaluation metric in object detection tasks. One GFLOP corresponds to

10^{9}

floating-point operations (FLOPs). The metric is commonly employed to assess the computational complexity of a model, reflecting the total number of floating-point operations required during a single forward pass. For convolutional layers, the computational complexity can be formulated as shown in Equation (23).

GFLOPs = \frac{2 \times H_{out} \times W_{out} \times C_{out} \times (K_{h} \times K_{w} \times C_{in})}{10^{9}} .

(23)

where

H_{out}

and

W_{out}

represent the height and width of the output feature map, respectively,

C_{in}

and

C_{out}

represent the number of input and output channels, respectively, and

K_{h}

and

K_{w}

represent the height and width of the convolution kernel, respectively. Because each multiply-accumulate step in a convolution involves one multiplication and one addition, the operation count is multiplied by 2. For the fully connected layers, the computational complexity in GFLOPs is shown in Equation (24).

GFLOPs_{FC} = \frac{2 \times N_{in} \times N_{out}}{10^{9}} .

(24)

In object detection tasks, the total GFLOPs are generally defined as the cumulative computational complexity of all network layers. Through these evaluation metrics, we can better evaluate the performance of the model proposed in this article.
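Equations (23) and (24) translate directly into code. The sketch below is illustrative only: the function names are hypothetical, and the layer sizes in the example are made-up values rather than layers taken from PCB-Faster-RCNN.

```python
def conv_gflops(h_out, w_out, c_in, c_out, k_h, k_w):
    """Eq. (23): GFLOPs of one convolutional layer (bias terms ignored)."""
    return 2 * h_out * w_out * c_out * (k_h * k_w * c_in) / 1e9


def fc_gflops(n_in, n_out):
    """Eq. (24): GFLOPs of one fully connected layer (bias terms ignored)."""
    return 2 * n_in * n_out / 1e9


# Hypothetical layers: a 3 x 3 convolution with 64 input and 64 output
# channels on a 56 x 56 output map, and a 2048 -> 8 classification layer.
layers = [
    conv_gflops(h_out=56, w_out=56, c_in=64, c_out=64, k_h=3, k_w=3),
    fc_gflops(n_in=2048, n_out=8),
]
total_gflops = sum(layers)   # total cost is the sum over all layers
```

Here the single convolution costs about 0.231 GFLOPs while the fully connected layer costs only about 0.00003 GFLOPs, which illustrates why convolutional layers on large feature maps dominate the total.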

4.4. The Result of the Experiment Compared with the Baseline Model

During the process of experiments, we first systematically compared PCB-Faster-RCNN with the baseline model on the self-constructed dataset. The evaluation metrics included a set of widely adopted indicators in object detection tasks, such as precision, recall, APs, APm, APl, and mAP, ensuring a comprehensive and objective assessment. To make the convergence more stable, Faster-RCNN with CIoU loss function was employed as the baseline for comparison. The detailed experimental results are presented in Table 5.

As we can see in Table 5, the proposed PCB-Faster-RCNN improves on the baseline Faster-RCNN across all evaluation metrics, validating its effectiveness. Specifically, the precision and recall of PCB-Faster-RCNN increased by 3.8 and 4.4, respectively, reaching 97.3 and 96.8, which indicates that the proposed model not only enhances recognition precision but also reduces the likelihood of missed detections, thus maintaining a well-balanced trade-off between precision and recall. The improvements are also evident across different object scales. For small objects, the AP increased by 6.6, reaching 88.7. Although this value remains the lowest among the three scales, it shows the largest gain. The lower absolute value is mainly attributed to the large number of small objects and the visual similarity in appearance between different categories, which increase the difficulty of detection. Furthermore, the AP of medium-sized objects improved by 2.1, reaching 94.9. Although this improvement is relatively limited, the overall value is the highest among the three scales. The modest gain reflects the fact that medium-sized objects are moderate in number and have more distinctive features, making them easier to detect even for the baseline. For large objects, the AP increased by 3.4, reaching 92.4, which lies between the results for small and medium objects. Overall, the results confirm that the proposed PCB-Faster-RCNN achieves notable improvements in detection performance across objects of varying scales, especially in small-object detection. The consistent enhancement across multiple metrics demonstrates that our model is well suited to the increasing diversity and complexity of PCB defect inspection in modern industrial scenes, highlighting its potential for practical application.

4.5. Comparison Experiments

To comprehensively evaluate the performance of PCB-Faster-RCNN, we selected several representative and advanced object detection algorithms for comparison experiments, including multiple versions of the YOLO series, SSD, and RetinaNet. The comparison experiments were conducted under identical experimental environments and dataset conditions to eliminate external variability and ensure the fairness and reliability of the results. Table 6 shows the AP values for different types of defects, while Table 7 presents the results of the various metrics for PCB-Faster-RCNN and the other object detection algorithms.

As illustrated in Table 6, the PCB-Faster-RCNN approach proposed in this paper outperforms current mainstream object detection models across most categories, demonstrating its effectiveness in PCB defect detection tasks. Specifically, the model performs best in identifying FB and FP defects, with AP values of 97.9 and 96.8, respectively, both exceeding 95. These two defect types have clear appearance characteristics, so they can be detected with high precision. In comparison, the AP values for NP and PN defects are 90.4 and 94.2, respectively. Although slightly lower than those of FB and FP, these results remain at a high level and still show clear advantages over other mainstream object detection models, which means that PCB-Faster-RCNN maintains robust detection performance even for defects with less distinctive appearances or deformations. For FO, HD, PI, and XO defects, the AP values are 86.4, 87.4, 87.5, and 89.3, respectively, which are relatively lower. The primary reason is that these defect types tend to be small in scale and share partially similar visual patterns, which increases detection difficulty. They also show complex geometries and lack obvious textures. Even under such challenging conditions, the proposed algorithm achieves higher precision than existing mainstream detection methods in most cases, reflecting its strong generalization capability. We should also note, however, that PCB-Faster-RCNN exhibits slightly lower performance than several mainstream detectors in certain categories. For instance, the AP for the FP class is 96.8, which is 0.9 and 0.5 lower than that of RetinaNet and YOLOv11, respectively. A possible reason for this phenomenon is that the structural features of FP objects are relatively prominent, while overly strong data augmentation strategies during training can introduce significant appearance perturbations, thereby weakening the stability of the feature patterns of this category and leading to a slight decrease in performance. Nevertheless, this localized performance drop does not materially affect the overall detection capability of the proposed model. Overall, the results presented in Table 6 confirm that PCB-Faster-RCNN not only attains near-optimal performance in detecting defects with clear visual characteristics but also maintains superior accuracy and stability in detecting small-scale and visually ambiguous defects, thereby highlighting its practical value in PCB defect detection tasks.

As we can see in Table 7, in terms of detection accuracy, the PCB-Faster-RCNN approach proposed in this paper outperforms the comparison models on the key indicators. Specifically, the proposed model leads on nearly all evaluation metrics, with improvements ranging from approximately 2 to 6 when compared with YOLOv11, the latest YOLO-series algorithm in the comparison, which indicates that PCB-Faster-RCNN exhibits enhanced robustness and generalization capability in complex defect detection tasks. In the case of small-object detection, the AP value of PCB-Faster-RCNN is slightly lower than its performance on larger objects, reaching 88.7. Nevertheless, this value is the highest among all of the object detection algorithms compared, suggesting that the proposed model retains a considerable advantage in identifying small-scale defects, particularly in scenes where object features are subtle, boundaries are indistinct, or background noise is substantial. For medium-scale and large-scale object detection, the improvement in AP values is relatively limited, within the range of 0 to 2. This can be attributed to the fact that medium and large objects generally present more distinct and easily recognizable features in images, enabling most algorithms to achieve stable and accurate detection. Despite the limited relative improvement, PCB-Faster-RCNN still achieves the highest AP values in these categories, underscoring its capability across multiple object scales.

However, in terms of computational performance, the proposed PCB-Faster-RCNN extends the two-stage detection framework and incorporates multiple additional modules designed to adapt to the complexity and variability of PCB manufacturing scenes, making its computational complexity inherently higher than that of single-stage detection algorithms. From the perspective of GFLOPs, the model carries a significantly greater computational load, with its value exceeding that of YOLOv11 by approximately 20. This indicates that the number of floating-point operations required for a single forward pass is substantially increased: the added modules enhance detection accuracy and robustness while inevitably increasing overall model complexity. Nevertheless, this increase in computational demand represents a deliberate trade-off, aimed at achieving more precise identification of small-scale and complex defects on PCB surfaces, which is an essential requirement in high-precision detection tasks. Furthermore, modern industrial environments are generally equipped with advanced hardware resources, such as high-performance GPUs and large-scale parallel computing infrastructures, which are capable of handling this additional computational burden. Therefore, although PCB-Faster-RCNN exhibits higher complexity than certain mainstream models, the gains in detection accuracy outweigh the computational costs. The increase in GFLOPs remains within an acceptable range, so the model can satisfy the stringent demands of real-world PCB production processes, where both precision and reliability are of critical importance.

Overall, the experimental results demonstrate that PCB-Faster-RCNN improves detection accuracy over both the baseline and other mainstream object detection models: its mAP is 4.5 higher than that of the baseline (Table 5) and 1.0 higher than that of the strongest competing detector, DINO (Table 7). Although the additional modules increase computational complexity, this increase is within an acceptable range in actual industrial production, and the model achieves a favorable trade-off between accuracy and complexity. To demonstrate the performance of the model more intuitively, we created histograms of the different object detection algorithms under the various evaluation metrics, shown in Figure 7 and Figure 8, respectively.

Figure 7 and Figure 8 visually illustrate the performance of PCB-Faster-RCNN across multiple evaluation metrics, as well as its effectiveness in practical detection tasks. It can be clearly observed that PCB-Faster-RCNN outperforms other conventional and mainstream object detection algorithms in key metrics, demonstrating its exceptional capability in detecting surface defects on PCBs. PCB-Faster-RCNN exhibits notable advantages in both detection precision and overall performance, confirming its feasibility and practical value in industrial production environments and providing critical technical support for efficient automated PCB quality inspection.

To increase statistical rigor, we performed five independent experiments per model and report mean, sample standard deviation, and 95% confidence intervals to assess whether the observed differences under identical random seeds are statistically significant. The results of each independent experiment are shown in Table 8, and the results of experiments regarding the mAP are shown in Table 9.

As shown in Table 8 and Table 9, PCB-Faster-RCNN achieves the highest mean mAP and the narrowest 95% confidence interval among all models, and its standard deviation (0.501) is among the smallest, with only SSD (0.429) and YOLOv6 (0.424) reporting lower values. This indicates robustness and stability across repeated trials. The improvement is attributed to the synergistic integration of enhanced architectural components, namely ResNet-101, deformable convolutions, and the CBAM attention mechanism, which collectively strengthen feature representation and improve generalization consistency. Therefore, the observed performance gains of PCB-Faster-RCNN are statistically reliable rather than incidental. Figure 9 presents the confusion matrix.
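The summary statistics described above can be computed along the following lines. This is a minimal sketch, assuming a Student's t interval with n - 1 = 4 degrees of freedom (two-sided 95% critical value 2.776); the paper does not state which interval formula was used, so the output need not coincide with Table 9. The run values are copied from Table 8, and `summarize` is a hypothetical helper name.

```python
import math
import statistics

# mAP of five independent runs, copied from Table 8.
RUNS = {
    "DINO": [91.4, 90.9, 90.5, 91.7, 92.1],
    "PCB-Faster-RCNN": [92.4, 92.6, 91.9, 91.5, 92.8],
}

# Two-sided 95% critical value of Student's t for 4 degrees of freedom.
# Assumption: the paper does not say which critical value it used.
T_CRIT_DF4 = 2.776


def summarize(values, t_crit=T_CRIT_DF4):
    """Return (mean, sample standard deviation, (ci_low, ci_high))."""
    mean = statistics.mean(values)
    std = statistics.stdev(values)  # sample standard deviation, divisor n - 1
    half_width = t_crit * std / math.sqrt(len(values))
    return mean, std, (mean - half_width, mean + half_width)


for model, values in RUNS.items():
    mean, std, (low, high) = summarize(values)
    print(f"{model}: mean={mean:.2f} std={std:.3f} CI=[{low:.2f}, {high:.2f}]")
```

With only five runs per model, the t-based interval is wider than a normal-approximation interval, which is one reason the choice of formula should be stated alongside the reported values.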

To further analyze our method, we compared our model with hybrid backbones and explainable modules outside the PCB literature [49,50]. Shravya et al. [49] proposed an approach that aims to enhance the interpretability of deep learning models used for lung cancer diagnosis. The key innovation is integrating explanation supervision directly into the training process, which ensures that attention maps are consistent across samples of the same class while promoting divergence between different classes. We compared this Contrastive Explainability Learning (CEL) method with the CBAM. The difference between them is that CEL improves the discriminability of the model by enforcing attention consistency within the same class and inconsistency between different classes, whereas the CBAM simply provides attention maps without these constraints. Therefore, CEL should produce more consistent and more discriminative attention maps than the CBAM. Anand et al. [50] proposed a skin disease classification framework, CHASHNIt (Combined Hybrid Architecture for Scalable High-Performance in Neural Iterations), which significantly improves the accuracy of skin disease classification by integrating three deep learning models: EfficientNetB7, DenseNet201, and InceptionResNetV2.

4.6. Ablation Experiments

To systematically and comprehensively assess the contribution of individual components to the overall performance of PCB-Faster-RCNN, a series of ablation experiments were conducted by adding or removing specific modules progressively. We employed AP50, AP75, mAP, and GFLOPs as primary evaluation metrics to rigorously quantify the trade-off between detection accuracy and computational cost. Specifically, six distinct improvement schemes were designed. Improvement 1 to improvement 3 involved adding a single module into the baseline model to evaluate the performance gains conferred by each additional module, including ResNet-101 backbone, deformable convolutions, and the CBAM. Conversely, improvement 4 to improvement 6 entailed the removal of corresponding modules to analyze their necessity and contribution to the overall model. All variants were thoroughly validated through controlled experiments, with the results summarized in Table 10.

As shown in Table 10, introducing a single module in improvement 1 to improvement 3 did not yield a significant improvement in the overall performance of the model. In one case, it even led to a degradation in detection accuracy. This phenomenon suggests that the isolated incorporation of a component may disrupt the balance of feature extraction, resulting in overfitting or the loss of critical information, thereby negatively affecting the overall performance. For instance, improvement 2 integrated the deformable convolution independently, and its AP50, AP75, and mAP decreased by 0.6, 0.4, and 0.8, respectively. We believe the underlying reasons for this performance degradation can be summarized in two aspects. Firstly, DCN is not universally effective across all scenarios. Its effective operation relies on deeper and more stable feature representations provided by other architectural modules, and DCN is prone to overfitting or degradation without sufficient high-quality feature support. Secondly, most defect instances in the dataset used in this paper exhibit relatively regular shapes, which limits DCN's ability to exploit its inherent advantage in modeling complex geometric deformations, thus further increasing the risk of overfitting. In contrast, in improvement 6 (DCN + ResNet), the residual connections in ResNet-101 effectively mitigate gradient degradation and suppress overfitting, enabling the model to stably train deeper structures and to learn more complex and abstract feature representations. This provides a more reliable feature foundation for the deformation modeling of DCN, allowing the two components to work synergistically and enhance overall model performance. The experimental results demonstrate that this combination achieves a 1.5 increase in mAP over the baseline. 
A similar pattern is observed in improvement 3 (CBAM) and improvement 5 (CBAM + ResNet): integrating the CBAM alone generates only a 0.8 improvement in mAP, whereas its combination with ResNet achieves a substantial 3.4 increase. This improvement can likewise be attributed to the ability of ResNet to learn deeper and more expressive feature representations, which further strengthens the effectiveness of the attention mechanism. Furthermore, the experimental results of improvement 4 indicate that incorporating DCN and CBAM without ResNet yielded only marginal improvements, with the AP50, AP75, and mAP increasing by 0.2, 0.3, and 0.2, respectively. This suggests that, although both modules possess strong capabilities in feature extraction and representation, their high degrees of freedom make them prone to overfitting, particularly on datasets with limited scale or relatively regular object patterns, thereby constraining their effectiveness without a stronger backbone. The overall findings indicate that performance enhancement cannot be achieved through the mere addition of individual modules; it requires a task-specific and data-aware configuration and optimization of multiple components. The integration of different modules can effectively leverage their complementary strengths: ResNet-101 provides deeper and more abstract feature representations, DCN enhances the modeling of geometric deformations, and CBAM refines the feature selection process. The synergistic interaction among the three components not only balances feature extraction and generalization but also significantly improves overall detection performance, underscoring the critical importance of module coupling.
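The per-variant gains quoted in this analysis follow directly from Table 10. The sketch below recomputes them and contrasts the gain of the full model with the sum of the three single-module gains, which is one simple way to make the coupling effect visible. The values are copied from Table 10; the helper name `gain` and the additive comparison are illustrative assumptions, not an analysis performed in the paper.

```python
# Values copied from Table 10: AP50, AP75, mAP.
METRICS = ("AP50", "AP75", "mAP")
TABLE10 = {
    "Baseline": (94.3, 73.6, 87.9),
    "Improvement 1": (95.8, 76.3, 89.2),    # ResNet
    "Improvement 2": (93.7, 73.2, 87.1),    # DCN
    "Improvement 3": (95.2, 75.9, 88.7),    # CBAM
    "Improvement 4": (94.5, 73.9, 88.1),    # DCN + CBAM
    "Improvement 5": (97.8, 79.1, 91.3),    # ResNet + CBAM
    "Improvement 6": (97.1, 78.5, 89.4),    # ResNet + DCN
    "PCB-Faster-RCNN": (98.7, 83.6, 92.4),  # ResNet + DCN + CBAM
}


def gain(variant):
    """Per-metric change of a variant relative to the baseline."""
    pairs = zip(METRICS, TABLE10[variant], TABLE10["Baseline"])
    return {name: round(value - base, 1) for name, value, base in pairs}


# Sum of the mAP gains obtained by adding each module on its own.
single_modules = ("Improvement 1", "Improvement 2", "Improvement 3")
additive_map_gain = round(sum(gain(v)["mAP"] for v in single_modules), 1)

# mAP gain of the full model, where all three modules are combined.
full_map_gain = gain("PCB-Faster-RCNN")["mAP"]

for variant in TABLE10:
    if variant != "Baseline":
        print(variant, gain(variant))
print("sum of single-module mAP gains:", additive_map_gain)
print("full-model mAP gain:", full_map_gain)
```

The full-model gain exceeds the sum of the single-module gains, which is consistent with the argument that the modules reinforce one another rather than contributing independently.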

In terms of computational performance, integrating the new components inevitably increases computational complexity: the full model requires 260.2 GFLOPs, compared with 240.7 for the baseline, an increase of about 8%. This growth remains within a manageable range and does not impose a substantial burden on inference efficiency. The proposed model therefore achieves a favorable balance between detection accuracy and computational cost, which supports its use under the real-time and stability requirements of practical PCB production environments. These findings further validate the effectiveness and applicability of PCB-Faster-RCNN in dynamic and complex industrial scenes.

4.7. The Effect of Detection

In this subsection, we present the detection performance of PCB-Faster-RCNN across different defect categories and spatial locations, with the aim of providing a comprehensive evaluation of its effectiveness and applicability in practical scenarios. The overall detection results are illustrated in Figure 10, which visualizes the capability of the model in recognizing and localizing multiple types of defects on PCB surfaces.

As illustrated in Figure 10, the proposed PCB-Faster-RCNN performs well across various types of defect detection tasks. Specifically, for small-scale defects such as HD, XO, and PN, the model can effectively identify and precisely localize the objects. Moreover, the model maintains strong discriminative capability when confronted with highly similar defects, such as NP and PI, thereby reducing the risks of false positives and missed detections caused by feature overlap. In addition, for defect categories with distinctive texture or structural patterns, including FB, FO, and FP, the detection outcomes are robust. These results collectively validate the applicability and stability of PCB-Faster-RCNN in complex industrial scenes. More importantly, the framework enables accurate detection of diverse defects in real PCB manufacturing processes, which reduces the likelihood of product performance degradation and reliability issues arising from unnoticed defects. This highlights not only the technical advantages of our approach but also its practical significance and potential for large-scale deployment in modern electronic manufacturing industries.

Although the proposed method achieves satisfactory overall detection performance, it is important to acknowledge that it is not entirely perfect. In practical detection scenarios, the model may still produce misclassifications or missed detections, as illustrated in Figure 11.

As illustrated in Figure 11, the model incorrectly detected FB as FP and failed to detect certain extremely small defects. These issues can be attributed to two primary factors: (1) some FP and FB defects share highly similar visual characteristics, making feature discrimination challenging for the model; and (2) the features of very small defects are often weakened in deeper feature maps, leading to missed detections. In future work, we aim to further refine the model architecture and enhance feature extraction mechanisms to improve its robustness and accuracy in fine-grained discrimination and small-object detection.
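The second factor can be made concrete with a back-of-the-envelope calculation. The sketch below assumes the stage strides commonly used for ResNet backbones (4, 8, 16, and 32 for stages C2 to C5); the paper does not list the strides it uses, and the 12-pixel defect width is a hypothetical value chosen only for illustration.

```python
# Downsampling factor of each backbone stage relative to the input image.
# Assumption: the usual ResNet strides; not confirmed by the paper.
STRIDES = {"C2": 4, "C3": 8, "C4": 16, "C5": 32}


def footprint(defect_px, stride):
    """Width of a defect, measured in feature-map cells, at a given stride."""
    return defect_px / stride


def stages_resolving(defect_px, min_cells=1.0):
    """Stages at which the defect still covers at least `min_cells` cells."""
    return [
        stage
        for stage, stride in STRIDES.items()
        if footprint(defect_px, stride) >= min_cells
    ]


defect_width = 12  # pixels; hypothetical small defect
for stage, stride in STRIDES.items():
    print(f"{stage}: {footprint(defect_width, stride):.3f} cells")
print("resolved at:", stages_resolving(defect_width))
```

Under these assumptions, a 12-pixel defect covers less than one cell from stage C4 onward, so its evidence must come from the shallower, higher-resolution levels, which is why multi-scale features such as those of the feature pyramid in Figure 3 matter for very small defects.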

5. Conclusions

In this paper, we address the critical task of surface defect detection in PCBs within real industrial manufacturing scenes and propose an improved object detection model based on Faster-RCNN, called PCB-Faster-RCNN, to enhance the accuracy and reliability of defect identification. Firstly, we replace the traditional VGG-16 backbone with a deeper residual network, ResNet-101, which not only provides stronger feature representation capacity but also alleviates the problems of vanishing and exploding gradients in deep networks. Then, deformable convolution is integrated into the network to enable convolutional kernels to adaptively adjust their sampling positions, thereby improving the ability of the model to capture irregular shapes and geometrically distorted defect regions. Finally, the CBAM is embedded in the backbone network to optimize the discriminative power of feature representations by jointly modeling channel attention and spatial attention. The results demonstrate that PCB-Faster-RCNN outperforms baseline models and mainstream detectors on various metrics, validating the effectiveness of the proposed improvements. Reporting the results as percentage-based metrics also makes the effect of the method easier to understand for readers who are not familiar with the underlying mathematics. However, PCB-Faster-RCNN still faces certain limitations. For instance, its performance in detecting blurred or low-contrast defect regions remains to be improved, and the current dataset lacks sufficient samples captured under extreme conditions, which constrains the generalization ability of the model in such scenes. Furthermore, due to the production confidentiality and equipment access restrictions of our partner companies, we are currently unable to validate our model across production lines, which further limits the evidence for its generalization ability. 
In future work, we will continue to strengthen our collaboration with partner companies, focusing on constructing more refined and diverse datasets while continuously improving our methods to enhance their generalization capabilities across different scenarios and production lines. In summary, PCB-Faster-RCNN provides a novel and effective solution for PCB surface defect detection while also opening new research directions for intelligent visual inspection in industrial applications. Its introduction lays a solid foundation for further technological innovations and practical advancements in the field.

Author Contributions

Conceptualization, Z.H. and Y.W.; methodology, Z.H.; software, Z.H.; validation, Z.H., Y.W. and Y.H.; formal analysis, Z.H. and Y.W.; investigation, Z.H. and Y.L.; resources, Y.W. and Y.H.; data curation, Z.H. and Y.W.; writing—original draft preparation, Z.H.; writing—review and editing, Z.H., Y.W. and Y.H.; visualization, Z.H.; supervision, Y.L.; project administration, Y.H.; funding acquisition, Y.H. All authors have read and agreed to the published version of the manuscript.

Funding

This research was funded in part by the Science and Technology Plan Project of Sichuan Province (program no. 2022YFG0027); in part by the Fundamental Research Funds for the Central Universities and the Funds for CAAC Key Laboratory of Flight Techniques and Flight Safety (program no. FZ2025ZX34); in part by the Civil Aviation Information Technology Research Center, Civil Aviation Flight University of China (program no. 25CAFUC09010); in part by the Henan Province Key R&D Special Fund (program no. 251111242100); and in part by the National Key R&D Program of China (program no. 2021YFF0603904).

Institutional Review Board Statement

Not applicable.

Informed Consent Statement

Not applicable.

Data Availability Statement

The datasets presented in this article are not readily available because the data involve confidential company information, and the original data cannot be provided at this time. Only a portion of the data is presented in the article.

Acknowledgments

We sincerely thank the authors of YOLO, ResNet, CBAM, DCN, and CIoU for providing their algorithms to facilitate the comparative and ablation experiments.

Conflicts of Interest

The authors declare no conflicts of interest.

Abbreviations

The following abbreviations are used in this manuscript:

PCB	Printed Circuit Board
YOLO	You Only Look Once
CNN	Convolutional Neural Network
RPN	Region Proposal Network
CBAM	Convolutional Block Attention Module
CAM	Channel Attention Module
SAM	Spatial Attention Module
DCN	Deformable Convolution Network
IoU	Intersection Over Union
Adam	Adaptive Moment Estimation
AP	Average Precision
mAP	Mean Average Precision
GFLOPs	Giga Floating-Point Operations

References

1. Ling, Q.; Isa, N.A.M. Printed Circuit Board Defect Detection Methods Based on Image Processing, Machine Learning and Deep Learning: A Survey. IEEE Access 2023, 11, 15921–15944.
2. Islam, M.R.; Zamil, M.Z.H.; Rayed, M.E.; Kabir, M.M.; Mridha, M.; Nishimura, S.; Shin, J. Deep Learning and Computer Vision Techniques for Enhanced Quality Control in Manufacturing Processes. IEEE Access 2024, 12, 121449–121479.
3. Park, J.H.; Kim, Y.S.; Seo, H.; Cho, Y.J. Analysis of Training Deep Learning Models for PCB Defect Detection. Sensors 2023, 23, 2766.
4. Chen, X.; Wu, Y.; He, X.; Ming, W. A Comprehensive Review of Deep Learning-Based PCB Defect Detection. IEEE Access 2023, 11, 139017–139038.
5. Aggarwal, N.; Deshwal, M.; Samant, P. A survey on automatic printed circuit board defect detection techniques. In Proceedings of the 2022 2nd International Conference on Advance Computing and Innovative Technologies in Engineering (ICACITE), Greater Noida, India, 28–29 April 2022; pp. 853–856.
6. Adibhatla, V.A.; Chih, H.-C.; Hsu, C.-C.; Cheng, J.; Abbod, M.F.; Shieh, J.-S. Applying deep learning to defect detection in printed circuit boards via a newest model of you-only-look-once. Math. Biosci. Eng. 2021, 18, 4411–4428.
7. Zhou, Y.; Yuan, M.; Zhang, J.; Ding, G.; Qin, S. Review of vision-based defect detection research and its perspectives for printed circuit board. J. Manuf. Syst. 2023, 70, 557–578.
8. Wang, J.; Xie, X.; Liu, G.; Wu, L. A Lightweight PCB Defect Detection Algorithm Based on Improved YOLOv8-PCB. Symmetry 2025, 17, 309.
9. Melnyk, R.A.; Tushnytskyy, R.B. Detection of defects in printed circuit boards by clustering the etalon and defected samples. In Proceedings of the 2020 IEEE 15th International Conference on Advanced Trends in Radioelectronics, Telecommunications and Computer Engineering (TCSET), Lviv-Slavske, Ukraine, 25–29 February 2020; IEEE: Piscataway, NJ, USA, 2020.
10. Chen, W.; Huang, Z.; Mu, Q.; Sun, Y. PCB Defect Detection Method Based on Transformer-YOLO. IEEE Access 2022, 10, 129480–129489.
11. Yang, Y.; Kang, H. An Enhanced Detection Method of PCB Defect Based on Improved YOLOv7. Electronics 2023, 12, 2120.
12. Xin, H.; Chen, Z.; Wang, B. PCB electronic component defect detection method based on improved YOLOv4 algorithm. J. Phys. Conf. Ser. 2021, 1827, 012167.
13. He, Z.; He, Y. AS-Faster-RCNN: An Improved Object Detection Algorithm for Airport Scene Based on Faster R-CNN. IEEE Access 2025, 13, 36050–36064.
14. He, Z.; He, Y.; Lv, Y. DT-YOLO: An Improved Object Detection Algorithm for Key Components of Aircraft and Staff in Airport Scenes Based on YOLOv5. Sensors 2025, 25, 1705.
15. Pham, T.T.A.; Thoi, D.K.T.; Choi, H.; Park, S. Defect detection in printed circuit boards using semi-supervised learning. Sensors 2023, 23, 3246.
16. Chen, W.; Meng, S.; Wang, X. Local and Global Context-Enhanced Lightweight CenterNet for PCB Surface Defect Detection. Sensors 2024, 24, 4729.
17. Viola, P.; Jones, M. Robust Real-Time Object Detection. Int. J. Comput. Vis. 2004, 57, 137–154.
18. Dalal, N.; Triggs, B. Histograms of Oriented Gradients for Human Detection. In Proceedings of the 2005 IEEE Computer Society Conference on Computer Vision and Pattern Recognition (CVPR’05), San Diego, CA, USA, 20–26 June 2005; IEEE: Piscataway, NJ, USA, 2005; Volume 1, pp. 886–893.
19. Uijlings, J.R.; Van De Sande, K.E.; Gevers, T.; Smeulders, A.W. Selective search for object recognition. Int. J. Comput. Vis. 2013, 104, 154–171.
20. Felzenszwalb, P.F.; Girshick, R.B.; McAllester, D.; Ramanan, D. Object Detection with Discriminatively Trained Part-Based Models. IEEE Trans. Pattern Anal. Mach. Intell. 2009, 32, 1627–1645.
21. Girshick, R.; Donahue, J.; Darrell, T. Rich feature hierarchies for accurate object detection and semantic segmentation. In Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition, Columbus, OH, USA, 23–28 June 2014; Volume 7, pp. 580–587.
22. Girshick, R. Fast R-CNN. In Proceedings of the IEEE International Conference on Computer Vision, Santiago, Chile, 7–13 December 2015; Volume 4, pp. 1440–1448.
23. Ren, S.; He, K.; Girshick, R.; Sun, J. Faster R-CNN: Towards real-time object detection with region proposal networks. IEEE Trans. Pattern Anal. Mach. Intell. 2016, 39, 1137–1149.
24. He, K.; Gkioxari, G.; Dollár, P.; Girshick, R. Mask R-CNN. In Proceedings of the IEEE International Conference on Computer Vision, Venice, Italy, 22–29 October 2017; Volume 3, pp. 1026–1034.
25. Pang, J.; Chen, K.; Shi, J. Libra R-CNN: Towards Balanced Learning for Object Detection. In Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition, Seattle, WA, USA, 14–19 June 2020; Volume 1, pp. 821–830.
26. Cai, Z.; Vasconcelos, N. Cascade R-CNN: Delving into high quality object detection. In Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition, Salt Lake City, UT, USA, 18–22 June 2018; Volume 6, pp. 6154–6162.
27. Liu, W.; Anguelov, D.; Erhan, D.; Szegedy, C.; Reed, S.; Fu, C.Y.; Berg, A.C. SSD: Single shot multibox detector. In Proceedings of the Computer Vision—ECCV 2016, Amsterdam, The Netherlands, 11–14 October 2016; pp. 21–37.
28. Redmon, J.; Divvala, S.; Girshick, R.; Farhadi, A. You only look once: Unified, real-time object detection. In Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition (CVPR), Las Vegas, NV, USA, 27–30 June 2016; pp. 779–788.
29. Redmon, J.; Farhadi, A. YOLO9000: Better, faster, stronger. In Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition (CVPR) 2017, Honolulu, HI, USA, 21–26 July 2017; pp. 7263–7271.
30. Farhadi, A.; Redmon, J. Yolov3: An incremental improvement. Comput. Vis. Pattern Recognit. 2018, 1804, 1–6.
31. Bochkovskiy, A.; Wang, C.-Y.; Liao, H.-Y.M. Yolov4: Optimal Speed and Accuracy of Object Detection. In Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition, Seattle, WA, USA, 14–19 June 2020.
32. Jocher, G. YOLOv5 by Ultralytics. 2020. Available online: https://github.com/ultralytics/yolov5 (accessed on 28 February 2023).
33. Li, C.; Li, L.; Jiang, H.; Weng, K.; Geng, Y.; Li, L.; Ke, Z.; Li, Q.; Cheng, M.; Nie, W.; et al. Yolov6: A single-stage object detection framework for industrial applications. arXiv 2022, arXiv:2209.02976.
34. Wang, C.Y.; Bochkovskiy, A.; Liao, H.Y.M. YOLOv7: Trainable bag-of-freebies sets new state-of-the-art for real-time object detectors. In Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition, Vancouver, BC, Canada, 18–22 June 2023; pp. 7464–7475.
35. Sohan, M.; Sai Ram, T.; Reddy, R.; Venkata, C. A Review on YOLOv8 and Its Advancements. In Proceedings of the International Conference on Data Intelligence and Cognitive Informatics, Tirunelveli, India, 27–28 June 2023; pp. 529–545.
36. Zhang, C.; Shi, W.; Li, X.; Zhang, H.; Liu, H. Improved bare PCB defect detection approach based on deep feature learning. J. Eng. 2018, 16, 1415–1420.
37. Yuan, M.; Zhou, Y.; Ren, X.; Zhi, H.; Zhang, J.; Chen, H. YOLO-HMC: An Improved Method for PCB Surface Defect Detection. IEEE Trans. Instrum. Meas. 2024, 73, 2001611.
38. Nguyen, V.T.; Bui, H.A. A real-time defect detection in printed circuit boards applying deep learning. EUREKA Phys. Eng. 2022, 2, 143–153.
39. Guo, H.; Zhao, H.; Zhao, Y.; Liu, W. PCB defect detection algorithm based on deep learning. Optik 2024, 315, 172036.
40. Yuan, Z.; Tang, X.; Ning, H.; Yang, Z. LW-YOLO: Lightweight Deep Learning Model for Fast and Precise Defect Detection in Printed Circuit Boards. Symmetry 2024, 16, 418.
41. Chen, I.-C.; Hwang, R.-C.; Huang, H.-C. PCB defect detection based on deep learning algorithm. Processes 2023, 11, 775.
42. Kim, J.; Ko, J.; Choi, H.; Kim, H. Printed circuit board defect detection using deep learning via a skip-connected convolutional autoencoder. Sensors 2021, 21, 4968.
43. He, K.; Zhang, X.; Ren, S.; Sun, J. Deep residual learning for image recognition. In Proceedings of the Conference on Computer Vision and Pattern Recognition, Las Vegas, NV, USA, 30 June 2016.
44. Zhu, X.; Hu, H.; Lin, S.; Dai, J. Deformable ConvNets V2: More deformable, better results. In Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition, Long Beach, CA, USA, 16–20 June 2019; pp. 9308–9316.
45. Woo, S.; Park, J.; Lee, J.Y.; Kweon, I.S. CBAM: Convolutional block attention module. In Proceedings of the European Conference on Computer Vision (ECCV), Munich, Germany, 8–14 September 2018; pp. 3–19.
46. Khanam, R.; Hussain, M. YOLOv11: An Overview of the Key Architectural Enhancements. arXiv 2024, arXiv:2410.17725.
47. Zhao, Y.; Lv, W.; Xu, S.; Wei, J.; Wang, G.; Dang, Q.; Liu, Y.; Chen, J. DETRs Beat YOLOs on Real-time Object Detection. In Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition (CVPR), Seattle, WA, USA, 17–21 June 2024; pp. 16965–16974. Available online: https://openaccess.thecvf.com/content/CVPR2024/html/Zhao_DETRs_Beat_YOLOs_on_Real-time_Object_Detection_CVPR_2024_paper.html (accessed on 25 September 2025).
48. Zhang, H.; Li, F.; Liu, S.; Zhang, L.; Su, H.; Zhu, J.; Ni, L.M.; Shum, H.Y. DINO: DETR with improved denoising anchor boxes for end-to-end object detection. arXiv 2022, arXiv:2203.03605.
49. Shravya, V.; Sunil, M.; Natarajan, B.; Elakkiya, R. Encouraging Discriminative Attention Through Contrastive Explainability Learning for Lung Cancer Diagnosis. IEEE Access 2025, 13, 176958–176976.
50. Anand, S.; Sharma, A.; Natarajan, B.; Slathia, A.S.; Rathi, A.; Behara, K.P.; Elakkiya, R. CHASHNIt for enhancing skin disease classification using GAN augmented hybrid model with LIME and SHAP based XAI heatmaps. Sci. Rep. 2025, 15, 31138.

Figure 1. The overall architecture of PCB-Faster-RCNN.

Figure 2. The overall structure of ResNet-101.

Figure 3. The processing of the image with the feature pyramid network.

Figure 4. An example of deformable convolution. Green dots represent sampling points of traditional convolution, blue dots represent sampling points of deformable convolution and arrows represent offsets and offset directions.

Figure 5. The overall structure of CBAM.

Figure 6. (a) The defect of film anomaly. (b) The defect of film oxidation. (c) The defect of film peeling. (d) The defect of hole anomaly. (e) The defect of breakage. (f) The defect of foreign matter on the film. (g) The defect of sub-film foreign matter. (h) The defect of oil contamination.

Figure 7. The histograms of various AP metrics for different models.

Figure 8. The histograms of the GFLOPs for different models.

Figure 9. The confusion matrix.

Figure 10. The overall detection performance of PCB-Faster-RCNN.

Figure 11. Examples of incorrect detections.

Table 1. The details of the industrial cameras.

Parameter	Value
Resolution	2456 × 2058
Illumination	LED Ring Light
Annotation Protocol	LabelImg
Exposure	1 ms
Distance	25 cm
Focal length	30 mm
Aperture	F5

Table 2. The details of each defect category.

Code Name	Defect Name	Numbers
FB	film anomaly	1227
FO	film oxidation	1108
FP	film peeling	1106
HD	hole anomaly	1106
NP	breakage	1206
PN	foreign matter on the film	1132
PI	sub-film foreign matter	1152
XO	oil contamination	1065

Table 3. The parameters of the experimental environment.

Parameter	Value
Operating System	Ubuntu 22.04
Programming Language	Python 3.11.9
Deep Learning Framework	PyTorch 2.4.1/CUDA 12.4
GPU	NVIDIA GeForce RTX 4090 (24 GB)
CPU	Intel(R) Core(TM) i9-13900K@3.00 GHz
Memory	64 GB

Table 4. The values of the hyperparameters.

Parameter	Value
Initial Learning Rate	0.01
Momentum Factor	0.937
Weight Decay Coefficient	0.0005
IoU Training Threshold	0.2
Batch Size	64
Decay Rate $β_{1}$	0.9
Decay Rate $β_{2}$	0.999
Numerical Stability Constant	$10^{- 8}$
Optimizer	Adam

Table 5. The comparison of experimental results between baseline and PCB-Faster-RCNN.

Model	P	R	APs	APm	APl	mAP
Baseline	93.5	92.4	82.1	92.8	88.9	87.9
PCB-Faster-RCNN	97.3	96.8	88.7	94.9	92.3	92.4

Table 6. The AP values for different types of defects.

Model	FB	FO	FP	HD	NP	PN	PI	XO
SSD	85.6	73.2	86.2	72.9	73.7	86.1	73.8	71.2
YOLOv5	87.5	78.9	90.2	79.8	78.6	88.4	76.1	75.4
YOLOv6	88.7	80.3	91.4	78.5	80.2	87.2	78.3	76.2
YOLOv7	89.5	82.4	92.3	79.7	82.1	89.3	79.6	77.5
YOLOv8	90.8	81.9	93.4	78.9	83.7	90.6	80.9	79.3
YOLOv9	91.6	82.7	92.7	80.6	84.3	92.4	82.6	81.1
YOLOv10	93.4	84.3	94.5	83.1	86.4	92.1	84.8	83.6
YOLOv11 [46]	95.8	86.9	94.9	84.2	85.6	91.9	86.4	86.2
RetinaNet	96.7	87.3	95.4	86.8	87.9	92.6	87.3	87.8
RT-DETR [47]	96.8	86.4	95.6	86.4	87.1	93.1	86.2	88.1
DINO [48]	97.2	85.3	95.9	87.1	89.2	93.6	86.1	87.6
PCB-Faster-RCNN	97.9	86.4	96.8	87.4	90.4	94.2	87.5	89.3

Table 7. The results of comparison experiments.

Model	AP50	AP75	APs	APm	APl	mAP	GFLOPs
SSD	85.9	68.1	73.6	86.7	86.3	82.2	190.6
YOLOv5	87.3	70.2	74.9	88.1	87.8	83.6	206.4
YOLOv6	88.2	72.3	74.8	89.3	89.2	84.4	210.1
YOLOv7	90.1	73.1	75.5	90.1	88.9	84.8	216.5
YOLOv8	91.4	75.4	76.9	89.8	90.4	85.7	225.7
YOLOv9	92.6	74.2	76.3	91.6	91.3	86.4	231.9
YOLOv10	94.7	77.5	80.8	93.5	90.8	88.4	233.6
YOLOv11 [46]	96.4	78.9	82.9	92.8	91.6	89.1	238.1
RetinaNet	97.8	80.2	85.3	93.4	91.9	90.2	250.3
RT-DETR [47]	98.4	82.1	86.7	93.6	91.6	90.6	257.6
DINO [48]	98.3	82.5	87.8	94.2	92.1	91.4	258.4
PCB-Faster-RCNN	98.7	83.6	88.7	94.9	92.3	92.4	260.2

Table 8. The mAP for each model in 5 independent experiments.

Model	Experiment1	Experiment2	Experiment3	Experiment4	Experiment5
SSD	82.2	81.6	82.7	82.4	81.8
YOLOv5	83.6	84.1	83.1	84.5	82.8
YOLOv6	84.4	84.6	85.1	83.9	84.2
YOLOv7	84.8	84.1	85.5	85.7	84.2
YOLOv8	85.7	86.4	85.1	85.2	86.4
YOLOv9	86.4	86.7	85.7	85.8	87.4
YOLOv10	88.4	89.2	87.4	88.1	89.3
YOLOv11	89.1	90.2	90.4	88.2	88.4
RetinaNet	90.2	91.1	90.9	89.3	89.6
RT-DETR [47]	90.6	91.2	90.1	90.5	91.6
DINO [48]	91.4	90.9	90.5	91.7	92.1
PCB-Faster-RCNN	92.4	92.6	91.9	91.5	92.8

Table 9. The mAP results with mean, sample standard deviation, and 95% confidence intervals.

Model	mAP	Std	95% CI
SSD	82.14	0.429	[81.81, 82.47]
YOLOv5	83.62	0.646	[83.19, 84.05]
YOLOv6	84.44	0.424	[84.11, 84.77]
YOLOv7	84.86	0.601	[84.46, 85.26]
YOLOv8	85.76	0.575	[85.38, 86.14]
YOLOv9	86.40	0.632	[85.98, 86.82]
YOLOv10	88.48	0.734	[87.99, 88.97]
YOLOv11	89.26	0.918	[88.63, 89.89]
RetinaNet	90.22	0.701	[89.75, 90.69]
RT-DETR [47]	90.8	0.596	[90.06, 91.54]
DINO [48]	91.3	0.634	[90.53, 92.11]
PCB-Faster-RCNN	92.2	0.501	[91.93, 92.55]

Table 10. The results of ablation experiments.

Model	ResNet	DCN	CBAM	AP50	AP75	mAP	GFLOPs
Baseline				94.3	73.6	87.9	240.7
Improvement 1	✔			95.8	76.3	89.2	253.9
Improvement 2		✔		93.7	73.2	87.1	248.3
Improvement 3			✔	95.2	75.9	88.7	250.4
Improvement 4		✔	✔	94.5	73.9	88.1	253.4
Improvement 5	✔		✔	97.8	79.1	91.3	257.8
Improvement 6	✔	✔		97.1	78.5	89.4	256.3
PCB-Faster-RCNN	✔	✔	✔	98.7	83.6	92.4	260.2
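
The contribution of each module in Table 10 is easiest to read as a difference from the baseline row. The short sketch below computes the change in mAP and in GFLOPs for every configuration. The dictionary keys are shorthand labels chosen here for the rows "Improvement 1" to "Improvement 6", and `ablation` and `deltas` are illustrative names, not identifiers from the article.

```python
# (mAP, GFLOPs) for each configuration, copied from Table 10.
# Keys are shorthand labels for the table rows (an assumption of this sketch).
ablation = {
    "Baseline": (87.9, 240.7),
    "ResNet": (89.2, 253.9),           # Improvement 1
    "DCN": (87.1, 248.3),              # Improvement 2
    "CBAM": (88.7, 250.4),             # Improvement 3
    "DCN+CBAM": (88.1, 253.4),         # Improvement 4
    "ResNet+CBAM": (91.3, 257.8),      # Improvement 5
    "ResNet+DCN": (89.4, 256.3),       # Improvement 6
    "PCB-Faster-RCNN": (92.4, 260.2),  # ResNet + DCN + CBAM
}


def deltas(table, reference="Baseline"):
    """Return {name: (mAP change, GFLOPs change)} relative to the reference row."""
    base_map, base_flops = table[reference]
    return {
        name: (round(m - base_map, 1), round(f - base_flops, 1))
        for name, (m, f) in table.items()
        if name != reference
    }


for name, (d_map, d_flops) in deltas(ablation).items():
    print(f"{name}: mAP {d_map:+.1f}, GFLOPs {d_flops:+.1f}")
```

For the full model this gives a gain of 4.5 mAP points at a cost of 19.5 additional GFLOPs, while DCN on its own lowers mAP by 0.8 points relative to the baseline.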

Disclaimer/Publisher’s Note: The statements, opinions and data contained in all publications are solely those of the individual author(s) and contributor(s) and not of MDPI and/or the editor(s). MDPI and/or the editor(s) disclaim responsibility for any injury to people or property resulting from any ideas, methods, instructions or products referred to in the content.

© 2025 by the authors. Licensee MDPI, Basel, Switzerland. This article is an open access article distributed under the terms and conditions of the Creative Commons Attribution (CC BY) license (https://creativecommons.org/licenses/by/4.0/).
