The PCB dataset from Kaggle [27] contains 1386 images of PCBs with several types of defects: Missing Holes (MHs, 115 images), Mouse Bites (MBs, 115 images), Open Circuits (OCs, 116 images), Shorts (116 images), Spurs (115 images), and Spurious Coppers (SCs, 116 images), as shown in Figure 3. These images form the basis of training and validation for the Faster R-CNN model with a ResNet-50 backbone for defect detection [28]. The main purpose of the PCB defect detection system is to perform complete and accurate defect detection on Printed Circuit Boards using an advanced deep learning methodology. In the proposed system, a Faster R-CNN detection network is trained to identify six different classes of defects. This involves selecting the right model and layer configuration for the proposed project, building a detection algorithm, training the model, and optimizing performance parameters for defect detection in real time or near real time.
3.1. Experimental Setup
Data Acquisition—The dataset is structured in such a way that every image can be linked to an XML file. These XML files include labels and bounding box coordinates, which specify the location and type of each defect, such as “Missing_hole” or “Mouse_bite”. This information is key to training a model to detect both the position and class of each defect [29]. Each image $I$ in the dataset is associated with a set of defects [30], each represented by a bounding box $B_i$, as shown in Equation (1):

$$B_i = (x_{\min}, y_{\min}, x_{\max}, y_{\max}, C) \quad (1)$$

where $x_{\min}, y_{\min}$ are the coordinates of the top-left corner of the bounding box; $x_{\max}, y_{\max}$ are the coordinates of the bottom-right corner; and $C$ represents the class label of the defect.
Given the XML annotation files, the image dimensions $W$ and $H$ are extracted, such that every bounding box satisfies Equation (2):

$$0 \le x_{\min} < x_{\max} \le W, \qquad 0 \le y_{\min} < y_{\max} \le H \quad (2)$$

For each detected defect, the normalized bounding box [31] coordinates can be computed as shown in Equation (3):

$$\tilde{x} = \frac{x}{W}, \qquad \tilde{y} = \frac{y}{H} \quad (3)$$

where $\tilde{x}, \tilde{y}$ represent the normalized coordinates, ensuring that all bounding boxes lie within the range [0, 1]. Normalization is essential for ensuring scale invariance and stable convergence during model training. Without it, bounding box coordinates vary significantly with image resolution, causing the model to learn absolute pixel values rather than relative positions, which can lead to unstable optimization due to larger gradient updates. By scaling bounding box values between 0 and 1, normalization ensures compatibility across different input resolutions, which is crucial for models like Faster R-CNN with a ResNet-50 backbone: the model learns spatial relationships independently of input size, improving robustness and performance across PCB images of varying resolutions.
The width and height of each bounding box can be derived as shown in Equation (4):

$$w = x_{\max} - x_{\min}, \qquad h = y_{\max} - y_{\min} \quad (4)$$

The dataset $D$ can then be expressed as a collection of labelled bounding boxes, as in Equation (5):

$$D = \{(I_k, \{B_1, \ldots, B_{n_k}\})\}_{k=1}^{N} \quad (5)$$

where $N$ is the total number of images, and each image $I_k$ contains a set of $n_k$ defects.
Parsing Annotations—Each of the XML files is parsed to extract the defect label, along with the bounding box coordinates. The XML structure contains the fields $x_{\min}$, $y_{\min}$, $x_{\max}$, and $y_{\max}$, which define the corners of each bounding box and are compatible with machine learning frameworks [32]. Parsing an annotation involves reading the annotation file and extracting the required information, such as the file path, coordinates, and class labels. Transformations are then used to convert the raw annotation data into the format required by the Faster R-CNN implementation in PyTorch 1.12.1: bounding boxes are converted to tensors, and class labels are mapped to integers. Robustness is ensured by handling edge cases, such as incomplete or inaccurate annotations, missing bounding boxes, or out-of-bound coordinates. For reading annotation files, each annotation file $A_k$ corresponds to an image $I_k$ and contains structured data, including bounding box coordinates and class labels [33]. The annotation file provides the following information (6):

$$A_k = \{(C_i,\; x_{\min,i},\; y_{\min,i},\; x_{\max,i},\; y_{\max,i})\}_{i=1}^{n_k} \quad (6)$$
where $C_i$ is the class label of the $i$-th defect in image $I_k$; $(x_{\min,i}, y_{\min,i})$ and $(x_{\max,i}, y_{\max,i})$ are the bounding box coordinates; and $n_k$ represents the number of defects in image $I_k$. The bounding box coordinates are converted into tensor format, as shown in Equation (7):

$$B_i = \mathrm{tensor}([x_{\min,i},\; y_{\min,i},\; x_{\max,i},\; y_{\max,i}]) \quad (7)$$

where $B_i$ is the bounding box tensor. The class labels are mapped to integer representations [34].
This format facilitates easy analysis, transformations, and splitting into training and testing datasets for effective model evaluation. The dataset is split in an 80-20 ratio, with 80% of the images allocated to the training set and 20% to the test set. This split ensures that the model is trained on a significant portion of the data, while being evaluated based on unseen data to assess its performance.
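To make the parsing and splitting steps concrete, the following is a minimal sketch in Python, assuming Pascal VOC-style XML annotations as described above; the file paths, class-name spellings, and the parse_annotation helper are illustrative rather than taken from the actual implementation:

```python
import glob
import xml.etree.ElementTree as ET

import torch
from sklearn.model_selection import train_test_split

# Illustrative class-to-integer mapping (0 is reserved for the background class).
CLASS_MAP = {"missing_hole": 1, "mouse_bite": 2, "open_circuit": 3,
             "short": 4, "spur": 5, "spurious_copper": 6}

def parse_annotation(xml_path):
    """Read one Pascal VOC-style XML file and return boxes/labels as tensors."""
    root = ET.parse(xml_path).getroot()
    boxes, labels = [], []
    for obj in root.iter("object"):
        name = obj.find("name").text.lower()
        bb = obj.find("bndbox")
        xmin, ymin = float(bb.find("xmin").text), float(bb.find("ymin").text)
        xmax, ymax = float(bb.find("xmax").text), float(bb.find("ymax").text)
        if xmax <= xmin or ymax <= ymin:
            continue  # skip malformed boxes (edge-case handling)
        boxes.append([xmin, ymin, xmax, ymax])
        labels.append(CLASS_MAP[name])
    return {"boxes": torch.tensor(boxes, dtype=torch.float32),
            "labels": torch.tensor(labels, dtype=torch.int64)}

# 80-20 train/test split over the annotation files.
files = sorted(glob.glob("annotations/*.xml"))
train_files, test_files = train_test_split(files, test_size=0.2, random_state=42)
```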
3.2. Data Augmentation and Transformation
The mathematical representation of data augmentation is as follows. Let $I$ be an input image with dimensions $(W, H)$ (8), where $W$ is the width and $H$ is the height [35]. A set of augmentation functions $T$ is applied to $I$, generating a transformed image $I'$, as in Equation (9):

$$I' = T(I) \quad (9)$$

where $T$ is a combination of one or more transformations from the set $\{T_1, T_2, \ldots, T_n\}$. Data augmentation artificially inflates the size of a dataset by applying random transformations to the images, which helps to avoid overfitting of the network [36].
Rotation—Rotating the image by an angle $\theta$ (in degrees) transforms the pixel coordinates $(x, y)$ to new coordinates $(x', y')$ [37]:

$$x' = x\cos\theta - y\sin\theta, \qquad y' = x\sin\theta + y\cos\theta$$

Scaling—According to Equation (10), resizing the image by a scaling factor $s$ maps the pixel coordinates $(x, y)$ to $(x', y')$ [38], such that

$$x' = sx, \qquad y' = sy \quad (10)$$

Translation—The image is shifted by $(\Delta x, \Delta y)$ [39]:

$$x' = x + \Delta x, \qquad y' = y + \Delta y$$

Flipping—Horizontal or vertical flipping of an image negates its respective coordinate [40]:

$$(x', y') = (-x, y)\ \text{(horizontal)}, \qquad (x', y') = (x, -y)\ \text{(vertical)}$$
Rotations, flips, and scaling are examples of the transformations applied to the images. These changes simulate real-world scenarios, such as cases where a PCB is viewed at different angles or under various lighting conditions. By diversifying the dataset's perspectives, the model learns to recognize defects from different points of view.
Tensor Conversion—The images and bounding box coordinates are converted into tensors, the format required by PyTorch for model input. Tensors are optimized for matrix operations on the GPU, which makes training deep learning models efficient. Image-to-Tensor Conversion: Each image $I$ is originally a matrix of pixel values with dimensions $(H, W, C)$, where $H$ is the height, $W$ is the width, and $C$ represents the colour channels. It is converted into a PyTorch tensor, as follows [41]:

$$T_I = \frac{I}{255} \in \mathbb{R}^{C \times H \times W}$$

where the pixel values are normalized between 0 and 1 for stability in training.
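As a brief illustration, torchvision's functional API performs exactly the $(H, W, C) \to (C, H, W)$ reordering and division by 255 described above; the file name below is a placeholder:

```python
import torchvision.transforms.functional as TF
from PIL import Image

img = Image.open("pcb_sample.jpg").convert("RGB")  # (H, W, C) pixel matrix
tensor = TF.to_tensor(img)                         # (C, H, W) float tensor in [0, 1]
```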
3.3. Model Definition (Faster R-CNN)
Faster R-CNN consists of two main components: the Region Proposal Network (RPN), shown in Figure 4, which suggests areas of the image where objects might exist and is used to locate regions, and a classifier that labels those regions. Understanding Faster R-CNN—Faster R-CNN is a two-stage object detector. The first stage uses the Region Proposal Network (RPN) to propose regions of the image that are likely to contain objects. The second stage processes these proposals, classifying each region and refining the bounding box coordinates for accurate localization. The RPN is trained using anchor classification.
For each anchor $a_i$, the model calculates the probability of each class $c$ (including the background class) using a cross-entropy objective:

$$L_{\mathrm{cls}} = -\sum_{i} \sum_{c} p^{*}_{i,c} \log p_{i,c}$$

where $p_{i,c}$ is the predicted probability of anchor $a_i$ being in class $c$, and $p^{*}_{i,c}$ is the ground truth label for anchor $a_i$ and class $c$ (either 0 or 1) [42].
Pre-trained Model Initialization—A pre-trained Faster R-CNN model is used for time efficiency and improved performance. The pre-trained model is shown in Figure 5. It provides a strong base, since it has already learned essential image features, such as edges, textures, and shapes, which are common across different tasks.
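A minimal initialization sketch follows, assuming the torchvision implementation of Faster R-CNN with a ResNet-50 FPN backbone; the choice of default pre-trained weights and the seven-class head (six defect classes plus background) are our assumptions based on the setup described in this section:

```python
import torchvision
from torchvision.models.detection.faster_rcnn import FastRCNNPredictor

NUM_CLASSES = 7  # six PCB defect classes + background (assumption)

# Load Faster R-CNN with a ResNet-50 FPN backbone and pre-trained weights.
model = torchvision.models.detection.fasterrcnn_resnet50_fpn(weights="DEFAULT")

# Swap the box-classification head so it predicts the PCB defect classes.
in_features = model.roi_heads.box_predictor.cls_score.in_features
model.roi_heads.box_predictor = FastRCNNPredictor(in_features, NUM_CLASSES)
```

Replacing only the predictor head preserves the pre-learned backbone features while adapting the output layer to the defect taxonomy.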
The following Algorithm 1 augmentation pipeline enhances the robustness of the model by applying various transformations to images and adjusting their bounding box coordinates accordingly.
Algorithm 1: Data Augmentation
Step 1: Random Rotation. Rotate the image by an angle θ. Adjust the bounding box coordinates using the rotation transformation.
Step 2: Flipping. Perform a horizontal or vertical flip. Modify the bounding box coordinates accordingly (see the sketch after this algorithm).
Step 3: Scaling and Resizing. Resize the image by a scaling factor. Adjust the bounding box positions proportionally.
Step 4: Visualization. Visualize the augmented samples.
Step 5: Label Encoding. Convert the defect class labels to numerical representations using a LabelEncoder: L = label_encoder.transform(C), where C is the categorical label set.
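Because every geometric transformation must also re-map the bounding boxes, a short sketch of step 2 (horizontal flipping) is given below; the function name and tensor layout are illustrative:

```python
import torch

def hflip_with_boxes(image, boxes):
    """Horizontally flip a (C, H, W) image tensor and re-map its boxes.

    Under a horizontal flip, an x-coordinate maps to W - x, so the roles
    of xmin and xmax are swapped.
    """
    _, _, width = image.shape
    flipped = torch.flip(image, dims=[2])   # flip along the width axis
    new_boxes = boxes.clone()
    new_boxes[:, 0] = width - boxes[:, 2]   # new xmin = W - old xmax
    new_boxes[:, 2] = width - boxes[:, 0]   # new xmax = W - old xmin
    return flipped, new_boxes
```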
The following training pipeline in Algorithm 2 outlines the key steps for training a deep learning model efficiently.
Algorithm 2: A Robust Training Framework for High-Accuracy PCB Defect Detection Using Faster R-CNN and ResNet-50
Input:
  Training dataset Dtrain (80% of PCB images)
  Validation dataset Dval (20% of PCB images)
  Pre-trained Faster R-CNN (ResNet-50 backbone)
  Hyperparameters: lr = 0.0001, batch_size = 8, epochs = N
Output:
  Trained model M* with optimized weights
  Training/validation metrics (loss, mAP, IoU)
1: Initialize M ← FasterRCNN_ResNet50()
2: Configure Adam optimizer O (lr = 0.0001, weight decay = 0.0005)
3: Set StepLR scheduler S (step size = 3, γ = 0.1)
4: Allocate device: GPU (Tesla T4) if available, else CPU
5: for epoch ∈ {1, …, N} do
6:   M.train() -> Switch to training mode
7:   for batch (Xb, yb) ∈ DataLoader(Dtrain) do
8:     Xb, yb ← Xb.to(device), yb.to(device)
9:     L ← M(Xb, yb) -> Forward pass
10:    O.zero_grad()
11:    L.backward() -> Backpropagate
12:    O.step() -> Update weights
13:  end for
14:  M.eval() -> Switch to evaluation mode
15:  for batch (Xb, yb) ∈ DataLoader(Dval) do
16:    IoU ← box_iou(M(Xb), yb) -> Localization accuracy
17:    mAP ← compute_mAP(M(Xb), yb)
18:  end for
19:  S.step() -> Adjust learning rate
20:  Save M* if val_loss improves
21: end for
22: return M*, metrics
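A condensed PyTorch rendering of the training loop in Algorithm 2 is sketched below, reusing the model object from the initialization sketch in Section 3.3 and assuming a torchvision-style detection model that returns a dictionary of loss terms in training mode; num_epochs and train_loader stand in for the data-loading code, which is omitted:

```python
import torch

device = torch.device("cuda" if torch.cuda.is_available() else "cpu")
model.to(device)
optimizer = torch.optim.Adam(model.parameters(), lr=1e-4, weight_decay=5e-4)
scheduler = torch.optim.lr_scheduler.StepLR(optimizer, step_size=3, gamma=0.1)

for epoch in range(num_epochs):
    model.train()                                   # training mode (line 6)
    for images, targets in train_loader:
        images = [img.to(device) for img in images]
        targets = [{k: v.to(device) for k, v in t.items()} for t in targets]
        loss_dict = model(images, targets)          # forward pass (line 9)
        loss = sum(loss_dict.values())              # total multi-task loss
        optimizer.zero_grad()
        loss.backward()                             # backpropagate (line 11)
        optimizer.step()                            # update weights (line 12)
    scheduler.step()                                # adjust learning rate (line 19)
```

The validation pass and checkpointing (lines 14-20 of Algorithm 2) follow the same pattern, using model.eval() and torch.no_grad().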
This structured training procedure ensures that the model converges effectively while maintaining robustness in defect detection. The combination of data augmentation and optimized training strategies improves the overall accuracy and reliability of the model in real-world PCB manufacturing conditions.
3.4. R-CNN ResNet Model Training Architecture
Optimized Training Configuration for Efficient Model Convergence.
Key components are thoroughly configured to ensure efficient and stable training, including the following:
Device Allocation—As deep learning is computationally intensive, training is performed in PyTorch on an NVIDIA Tesla T4 GPU for computational power. Optimizer Configuration—The Adam optimizer (lr = 0.0001, weight_decay = 0.0005) is used for adaptive parameter updates; its adaptive learning rate mechanism allows it to converge efficiently [43]. Learning Rate Scheduler—The scheduler (step_size = 3, gamma = 0.1) defines when the learning rate changes, decreasing it as training progresses and helping to fine-tune the model. Incorporating data augmentation into training is also very helpful for avoiding overfitting [44]. The forward pass in Faster R-CNN predicts bounding boxes and class labels for each proposed region, with the total loss being the sum of the classification and regression losses. The model leverages a ResNet-50 backbone for PCB defect detection, extracting hierarchical feature maps that emphasize critical regions. ResNet-50's 50-layer architecture captures multi-level features, from low-level edges and textures to high-level object shapes and patterns. As images pass through the convolutional layers, these refined feature maps enhance defect localization and classification, ensuring precise and robust PCB defect detection. The lower layers capture basic features (edges, textures, simple shapes), while the higher layers detect more complex patterns, helping to identify defect-related features such as cracks, missing components, or Short Circuits.
The formula for the output feature map is as follows:

$$F = f(I; \theta)$$

The input PCB image $I$ is processed through the ResNet-50 model, where $\theta$ represents the learned weights of the network. The convolutional operations, denoted by $f$, extract hierarchical features from the image, enabling a robust representation of defect patterns. These features are then utilized by the Region Proposal Network (RPN) to identify regions of interest (ROIs) that are likely to contain defects, forming the foundation for accurate detection and classification. ROI Alignment and Classification—The ROIs are resized and classified into defect categories. Each ROI's bounding box is further refined using bounding box regression.
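To make $F = f(I; \theta)$ concrete, the backbone of the torchvision model can be queried directly; the sketch below uses an arbitrary input size and prints the multi-scale feature maps that the RPN consumes:

```python
import torch
import torchvision

model = torchvision.models.detection.fasterrcnn_resnet50_fpn(weights="DEFAULT")
model.eval()

image = torch.rand(3, 600, 600)                    # stand-in PCB image tensor
with torch.no_grad():
    features = model.backbone(image.unsqueeze(0))  # F = f(I; theta)
for name, fmap in features.items():               # hierarchical FPN feature maps
    print(name, tuple(fmap.shape))
```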
Loss Calculation in the Faster R-CNN Model:

Objectness Loss—This determines whether an ROI contains a defect (binary classification) [5]:

$$L_{\mathrm{obj}} = -\frac{1}{N_{\mathrm{pos}}} \sum_{i} \big[\, p_i^{*} \log p_i + (1 - p_i^{*}) \log(1 - p_i) \,\big]$$

where $p_i^{*}$ is the ground truth label (1 for foreground, 0 for background), $p_i$ is the predicted probability of an object being a foreground object, and $N_{\mathrm{pos}}$ is the number of positive (foreground) samples.
Bounding Box Regression Loss—This measures the error between the predicted and ground truth bounding box coordinates using the Smooth L1 loss:

$$L_{\mathrm{reg}} = \frac{1}{N_{\mathrm{reg}}} \sum_{i} \mathrm{smooth}_{L_1}(t_i - t_i^{*})$$

where $t_i$ is the predicted bounding box coordinates, $t_i^{*}$ is the ground truth bounding box coordinates, and $N_{\mathrm{reg}}$ is the number of bounding box regression samples [45].
ROI Head Loss Components (Classification Loss)—A cross-entropy loss assigns defect classes to each ROI [46]. The ROI Head Classification Loss quantifies how effectively the model assigns defect classes to detected regions, underpinning the system's reliability in PCB quality control. Its optimization is essential to the high performance achieved in our model results.
The total loss combines all components, according to Equation (19):

$$L_{\mathrm{total}} = L_{\mathrm{obj}} + L_{\mathrm{reg}} + L_{\mathrm{cls}} \quad (19)$$

This unified loss enables end-to-end training, while handling class imbalance through normalization by the positive sample count ($N_{\mathrm{pos}}$). The implementation leverages PyTorch's built-in loss functions with default reduction strategies, ensuring numerical stability during optimization [15]. Table 2 summarizes the core computational stages of the Faster R-CNN pipeline adapted for PCB defect detection.
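For illustration, the three loss components can be reproduced with PyTorch's built-in functions on toy tensors; all shapes and sample counts below are placeholders, not values from our experiments:

```python
import torch
import torch.nn.functional as F

# Toy stand-ins for network outputs (shapes are illustrative only).
pred_obj = torch.randn(256)                   # RPN objectness logits
gt_obj = torch.randint(0, 2, (256,)).float()  # 1 = foreground, 0 = background
pred_box = torch.randn(64, 4)                 # predicted box offsets (positives)
gt_box = torch.randn(64, 4)                   # ground truth box offsets
roi_logits = torch.randn(128, 7)              # 6 defect classes + background
roi_labels = torch.randint(0, 7, (128,))      # ground truth ROI classes

l_obj = F.binary_cross_entropy_with_logits(pred_obj, gt_obj)      # objectness
l_reg = F.smooth_l1_loss(pred_box, gt_box, reduction="sum") / 64  # Smooth L1 / N_reg
l_cls = F.cross_entropy(roi_logits, roi_labels)                   # ROI classification
total_loss = l_obj + l_reg + l_cls                                # Equation (19)
```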
Validation and Evaluation: Validation measures the model's performance on the test set, providing insight into how well the model will perform on new data. Model Evaluation: In validation mode, the model's weights are not updated; instead, it generates predictions on the test images (20%) to see how well it has learned to detect defects.
Intersection over Union (IoU) Calculation—IoU is a metric that measures how well the predicted bounding box matches the true bounding box, as shown in Figure 6. An IoU of 1 indicates full overlap between the two boxes, while lower values indicate a poorer match. The IoU is calculated per defect and, at the end, the localization accuracy is presented as a value averaged over the whole test dataset [47].
The performance of the model was checked on the validation set to find out how well the model would generalize. As the quantitative metric, IoU was used to measure the overlap between the predicted and ground truth bounding boxes [48]. IoU assesses the localization accuracy of PCB defect detection, and the Faster R-CNN model uses it to classify region proposals [49]. While effective, IoU has limitations in handling nested/partial boxes, which may explain some low scores in our validation [50]. It is defined as follows:

$$\mathrm{IoU} = \frac{|B_p \cap B_{gt}|}{|B_p \cup B_{gt}|}$$

where $B_p$ is the predicted bounding box and $B_{gt}$ is the ground truth bounding box.
From the validation phase of the Faster R-CNN model, IoU values are computed at each epoch. The mean IoU across all epochs is 0.722, which indicates that, on average, the predicted bounding boxes overlap 72.2% with the ground truth bounding boxes. A higher IoU value (closer to 1) suggests better localization accuracy, whereas lower values indicate discrepancies in bounding box predictions.
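The per-defect IoU described above can be computed with torchvision's box_iou; the boxes below are illustrative, and in practice each prediction is matched to its closest ground truth box before averaging:

```python
import torch
from torchvision.ops import box_iou

pred = torch.tensor([[48.0, 60.0, 120.0, 140.0]])  # predicted (xmin, ymin, xmax, ymax)
gt = torch.tensor([[50.0, 62.0, 118.0, 142.0]])    # ground truth box

iou = box_iou(pred, gt)   # pairwise IoU matrix of shape (num_pred, num_gt)
print(iou.item())         # values near 1.0 indicate tight localization
```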
3.5. Inference and Deployment/Software Setup
For real-time or batch processing, the application needs to handle incoming PCB images consistently. Each image is loaded and prepared by applying the same preprocessing steps used during the model training phase, to ensure a consistent input format. Preprocessing includes the following steps:
- Resizing: Scaling images to the same dimensions used during training, so the model can interpret them accurately.
- Normalization: Adjusting pixel values to the same range as in the training data, to avoid inconsistencies that could lead to inaccurate predictions.
- Format Conversion: Ensuring images are in the appropriate colour format (e.g., RGB), so they are compatible with the model.
The application, built with PyQt5, includes an image and video upload feature that allows users to select or drag and drop an image into the interface. Once an image is loaded, it undergoes these preprocessing steps (sketched below) before being supplied to the model for prediction.
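A sketch of such a preprocessing routine is shown below; the target size is a placeholder, since torchvision's Faster R-CNN also applies its own internal resizing and normalization:

```python
import torchvision.transforms.functional as TF
from PIL import Image

def preprocess(path, size=(600, 600)):
    """Load a PCB image and mirror the training-time preprocessing."""
    img = Image.open(path).convert("RGB")  # format conversion to RGB
    img = img.resize(size)                 # resize to the training resolution
    return TF.to_tensor(img)               # scale pixel values to [0, 1]
```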
After preprocessing, the image is passed through the trained Faster R-CNN model, which identifies areas that are likely to contain defects. The model produces the following outputs. Bounding Boxes: rectangles that surround each detected defect, specifying its location within the image. Confidence Scores: each bounding box is assigned a confidence score that represents the model's confidence in the defect classification; higher confidence scores indicate more reliable predictions. The model also provides a predicted class for each bounding box, shown in Figure 7, indicating the specific type of defect (e.g., Missing Hole, Mouse Bite). PyQt5's integration with the model allows this entire prediction process to happen effortlessly in the background, with the output prepared for visualization in real time. (The implemented code has been uploaded to GitHub.)
The visualization of defects in the image is a crucial part of the deployment process, as it allows users to interpret the model's predictions directly. The bounding boxes are colour-coded based on defect type, with each defect assigned a unique colour to make it easy to distinguish between different issues. For example, Missing Holes might appear in red, while Open Circuits are highlighted in blue, and so forth. Using PyQt5, we have designed a canvas, shown in Figure 8, which displays the original image with these overlays, providing clear and interactive feedback for the user. This approach involves the following processes.
- Drawing Bounding Boxes: Bounding boxes are drawn over each detected defect. PyQt5's QPainter class allows these boxes to be rendered with different colours and thicknesses, ensuring that each defect type is easily identifiable (see the sketch after this list).
- Displaying Confidence Scores: Text labels can be added alongside each bounding box to display the confidence score. This helps users to assess the reliability of each prediction.
- Interactive Controls: The PyQt5 interface includes controls to zoom in on specific defects, toggle bounding boxes on or off, and adjust visualization settings. These features allow users to explore the model's predictions in greater detail and take screenshots for record-keeping or further analysis.
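A condensed sketch of the QPainter overlay logic is given below; the colour map, detection tuple format, and function name are illustrative:

```python
from PyQt5.QtCore import QRectF
from PyQt5.QtGui import QColor, QPainter, QPen, QPixmap

DEFECT_COLOURS = {"Missing Hole": "red", "Open Circuit": "blue"}  # example mapping

def draw_detections(pixmap: QPixmap, detections):
    """Draw colour-coded boxes and confidence scores on a copy of the image."""
    canvas = pixmap.copy()
    painter = QPainter(canvas)
    for label, score, (x1, y1, x2, y2) in detections:
        colour = QColor(DEFECT_COLOURS.get(label, "yellow"))
        painter.setPen(QPen(colour, 3))                     # box outline style
        painter.drawRect(QRectF(x1, y1, x2 - x1, y2 - y1))  # bounding box
        painter.drawText(int(x1), int(y1) - 5, f"{label} {score:.2f}")
    painter.end()
    return canvas
```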
By leveraging Python 3.11 and PyQt5, we have built an efficient GUI application that provides a complete workflow for PCB defect detection. Here, we take a closer look at how PyQt5 supports each part of the application, as illustrated in Figure 9.
File Upload and Image Loading—PyQt5's QFileDialog allows users to easily upload PCB images. When a user selects an image, the interface loads it and performs preprocessing before passing it on to the model.
Table 3 provides a comprehensive breakdown of the PCB defect detection system’s technical components, detailing current implementations, key technical specifications, and planned developments.
Real-Time Inference and Display—Once the image is preprocessed, it is sent to the Faster R-CNN model, which performs inference and returns bounding box predictions, as shown in Figure 10. The application's main window displays the original image with colour-coded bounding boxes for easy interpretation. Zoom and Pan Controls—With PyQt5's widgets, users can zoom in on specific parts of the image to closely examine detected defects. Exporting Results—PyQt5's QImage class can be used to save the image with bounding box overlays, providing users with a visual record of detected defects for quality control. Additionally, when the proposed model is deployed on a Raspberry Pi device, it can be hosted on an online server, because the limited processing capacity of the Raspberry Pi makes it difficult to run the model locally; once placed on the server, it responds quickly. Starting with data collection and preparation, data transformations are applied, a deep learning model is initialized and customized, and the process then proceeds through training, validation, and deployment. Each step contributes to building a robust model that can detect and classify PCB defects accurately. The final deployment phase allows the model to be used in a real-world environment, where it provides visual insights and classifications of PCB defects to support quality control in manufacturing.