1. Introduction
Printed circuit boards (PCBs) form the backbone of electronic systems by mechanically supporting and electrically connecting components. Even minor PCB surface defects—such as scratches, open circuits, or solder shorts—can cause malfunctions and degrade overall system performance [1]. Ensuring product quality and reliability therefore requires effective PCB defect detection. Traditionally, manufacturers have relied on manual visual inspection and automated optical inspection (AOI) for PCB quality control [2]. Manual inspection is labor-intensive and prone to inconsistency, while conventional machine-vision methods struggle with the complexity of PCB patterns and the diverse appearances of defects [2]. AOI techniques (e.g., template comparison or design-rule checking) can detect many defects but often require strict image alignment and controlled lighting; they also have difficulty generalizing to new defect types [3]. In practice, as new defect patterns continually emerge with changes in manufacturing processes, rule-based AOI systems must be frequently recalibrated to handle unseen anomalies [3]. These limitations, combined with the subjective and error-prone nature of human inspection, have driven a shift towards deep learning-based approaches for PCB defect detection [4].
Deep learning, especially convolutional neural networks (CNNs), can automatically learn discriminative visual features and has achieved superior accuracy in general image recognition tasks—in some cases even approaching or exceeding human-level performance [5]. By leveraging CNN models, PCB inspection systems become more adaptable to diverse or subtle defects without requiring explicit modeling of each defect type. In recent years, numerous studies have applied deep CNNs to PCB defect detection and reported significant improvements in detection accuracy over traditional methods [4]. For example, a 2021 study by Kim et al. developed a skip-connected convolutional autoencoder to identify PCB defects and achieved a detection rate up to 98% with false alarm rate below 2% on a challenging dataset [1]. This demonstrates the potential of deep learning to provide both high sensitivity and reliability in detecting tiny flaws on PCB surfaces, which is critical for preventing failures in downstream electronics.
Object detection models based on deep learning now dominate state-of-the-art PCB inspection research [6]. In particular, one-stage detectors such as the You Only Look Once (YOLO) family have gained popularity for industrial defect detection due to their real-time speed and high accuracy [7,8]. Unlike two-stage detectors (e.g., Faster R-CNN) that first generate region proposals and then classify them, one-stage YOLO models directly predict bounding boxes and classes in a single forward pass—making them highly efficient [6,8]. Early works demonstrated the promise of YOLO for PCB defect detection. For example, Adibhatla et al. applied a 24-layer YOLO-based CNN to PCB images and achieved over 98% defect detection accuracy, outperforming earlier vision algorithms [8]. Subsequent studies have confirmed YOLO’s advantages in this domain, showing that modern YOLO variants can even rival or surpass two-stage methods in both detection precision and speed [6,8]. The YOLO series has evolved rapidly—from v1 through v8 and, most recently, up to v11—with progressive architectural and training refinements (e.g., stronger backbones, decoupled/anchor-free heads, improved multi-scale fusion, advanced data augmentation, and Intersection-over-Union (IoU) aware losses) that collectively enhance accuracy–latency trade-offs across application domains [9]. For instance, the latest YOLO models employ features like cross-stage partial networks, mosaic data augmentation, and CIoU/DIoU losses to better detect small objects and improve localization [10,11]. YOLOv5, in particular, has become a widely adopted baseline in PCB defect inspection, valued for its strong balance of accuracy and efficiency in finding tiny, low-contrast flaws in high-resolution PCB images [12,13]. Open-source implementations of YOLOv5 provide multiple model sizes (e.g., YOLOv5s, m, l, x) that can be chosen to trade off speed and accuracy, facilitating deployment in real-world production settings [12]. However, standard YOLO models still encounter difficulties with certain PCB inspection challenges, such as extremely small defect targets, complex background noise, and limited training data. This has motivated researchers to embed additional modules into the YOLO framework and to explore semi-supervised training strategies tailored to PCB defect detection.
Beyond CNN-based detectors, Transformer-based architectures have recently emerged as another powerful paradigm for object detection [14,15]. Detection Transformer (DETR) and its successors formulate detection as a set prediction problem with a Transformer encoder–decoder, removing hand-crafted components such as anchors and non-maximum suppression while achieving competitive accuracy on Common Objects in Context (COCO) [16]. Vision Transformers such as Swin Transformer have also been adopted as general-purpose backbones for detection and segmentation, providing strong multi-scale features via shifted window self-attention [17]. Motivated by these advances, several works have begun to explore Transformer-based models for PCB defect inspection, including Transformer–YOLO hybrids [18,19] and real-time detection Transformers tailored to bare PCB inspection (e.g., Lite-DETR, Hierarchical Scale-Aware Real-Time Detection Transformer (HSA-RTDETR), and Multi-Residual Coupled Detection Transformer (MRC-DETR)) [20,21,22]. These methods demonstrate that global self-attention and set-based decoding can further improve defect detection, but they typically rely on large-scale pre-training, longer training schedules, and heavier computation, which may complicate deployment in resource-constrained AOI systems [18,20].
One major challenge in PCB defect inspection is the very small size and subtle appearance of many defect types (e.g., pinhole voids, hairline copper breaks). These tiny defects may occupy only a few pixels and can be easily missed against intricate PCB background patterns [4]. To address this, recent works have integrated attention mechanisms into YOLO detectors to help the network focus on important features. In particular, channel attention modules such as the Squeeze-and-Excitation (SE) and Convolutional Block Attention Module (CBAM) have been added to emphasize defect-relevant feature channels and suppress irrelevant background information [23]. For example, Xu et al. reported that inserting a CBAM module into a YOLOv5-based model improved recognition of intricate, small PCB defects under complex backgrounds by enhancing the model’s attention to critical regions [24]. A lightweight variant, Efficient Channel Attention (ECA), has proved effective in detection settings; by applying a short 1-D convolution to model local cross-channel dependencies—without the dimensionality reduction used in SE/CBAM—ECA enhances feature saliency with negligible computational overhead [25]. Kim et al. demonstrated that adding an ECA module into a YOLOv5 backbone boosted the detection of small objects in aerial images, as the channel attention helped highlight faint targets against cluttered backgrounds [25]. Similarly, an enhanced YOLOv5 model for surface inspection found that integrating ECA improved the identification of fine defects (especially tiny or low-contrast features) compared to using SE attention alone [26]. These findings underscore that incorporating efficient attention mechanisms enables YOLO models to better capture subtle defect cues that might otherwise be overlooked. A streamlined ECA module is embedded in the YOLOv5 backbone to adaptively accentuate faint PCB defect patterns, enabling clearer separation of true defect signals from background circuitry.
Another limitation of vanilla YOLO detectors lies in the fixed sampling grid of standard convolutions, which restricts the receptive field from conforming to irregular defect geometries on PCBs. Deformable Convolutional Networks (DCN) alleviate this constraint by learning location-dependent offsets so that kernels adaptively sample informative positions, effectively “bending” to follow fine discontinuities, burrs, and spurious copper patterns. By aligning the sampling lattice with true object boundaries, deformable convolutions help prevent the mixing of faint defect signals with background textures and thereby preserve small-object detail during feature extraction [11]. Recent journal studies show that inserting a deformable layer into YOLO backbones or necks yields measurable gains on small-object benchmarks by retaining object cues and reducing background interference [27,28]. In the PCB context, improved YOLO variants that integrate DCN (or DCNv2) into high-resolution feature paths report enhanced localization of tiny, irregular defects and higher mean Average Precision, attributable to better spatial alignment around hairline breaks and micro-holes [29]. Beyond PCB imagery, complementary evidence from aerial and industrial surface datasets confirms that lightweight DCN blocks can be deployed with modest computational overhead to sharpen feature selectivity on thin, elongated structures—an effect particularly valuable for defect edges and gaps [28,30]. Following these insights, a DCN-lite layer is placed in the YOLOv5 neck to introduce spatial flexibility where fine spatial detail is most critical, aiming to increase sensitivity to minute or oddly shaped PCB anomalies while preserving throughput [29,31].
Effective multi-scale feature fusion is essential in PCB inspection, where target sizes span from large solder bridges to sub-pixel pinholes. While the original YOLOv5 neck adopts a Path Aggregation Network (PANet), recent work shows that bi-directional pyramid designs with learnable fusion weights strengthen small-object representations and improve robustness to scale variation. In particular, Bi-Directional Feature Pyramid Networks (BiFPN) iteratively propagate information top-down and bottom-up, balancing low-level spatial detail with high-level semantics and yielding consistent gains over PANet-style necks in one-stage detectors [32]. Journal studies report that replacing or augmenting PANet with BiFPN leads to higher precision and recall on small targets by avoiding attenuation of fine details during fusion [33]. In PCB-focused research, lightweight YOLO variants that integrate BiFPN in the neck achieve superior accuracy on micro-defects, indicating that normalized, weighted cross-scale aggregation is particularly beneficial for tiny, low-contrast structures [34]. Beyond PCB imagery, enhanced (augmented/weighted) BiFPN formulations further validate these trends in diverse vision tasks, demonstrating that learnable cross-scale weights can reduce information loss and emphasize discriminative cues at small scales [35]. Guided by this evidence, the proposed model adopts a BiFPN neck to more effectively blend high-resolution detail and contextual semantics before detection, improving sensitivity to both macro-level faults and minute solder splashes [36,37].
In addition to stronger feature fusion, refining the upsampling operator in the neck materially benefits small-defect detection. Fixed schemes (e.g., nearest-neighbor) can blur fine edges and attenuate weak responses, causing misses on hairline cracks or pinholes [38]. A learnable alternative is Content-Aware ReAssembly of Features (CARAFE), which predicts position-specific reassembly kernels from local content and reconstructs high-resolution features with a larger effective receptive field [39]. Unlike fixed interpolation, CARAFE preserves boundary and texture cues during upscaling and has been shown to improve one-stage detectors on cluttered scenes with numerous tiny targets [40]. Recent journal studies report that inserting CARAFE into YOLO-style necks yields higher precision/recall on small objects while maintaining real-time feasibility due to the module’s lightweight design [41]. Further evidence from remote-sensing benchmarks indicates that CARAFE reduces information loss and better aligns multi-scale features compared with naïve interpolation, boosting mAP for dense small targets [42]. Guided by these results, the present YOLOv5-based architecture replaces nearest-neighbor upsampling with CARAFE at top-down pathways to retain minute PCB defect details during feature magnification and to strengthen the downstream detector’s sensitivity to thin, low-contrast flaws [43].
While architectural enhancements increase capacity, data scarcity and class imbalance remain practical bottlenecks in PCB defect inspection. In early production or when new defect modes emerge, only a handful of labeled samples may exist, making fully supervised training prone to overfitting and poor generalization. Semi-supervised object detection (SSOD) addresses this by exploiting large pools of unlabeled imagery together with few labels, commonly through pseudo-labeling and consistency regularization in teacher–student schemes [44,45]. This setting aligns well with PCB lines, where acquiring images at scale is easy but fine-grained annotation is costly; leveraging unlabeled frames expands the distribution of backgrounds, lighting, and rare defects seen during training [44,46]. Recent journal studies demonstrate that filtering uncurated unlabeled sets and enforcing consistency across augmentations markedly improves pseudo-label quality and downstream detection, boosting mAP in low-label regimes [44]. Practical SSOD variants also integrate adaptive thresholds or active selection to suppress noisy pseudo-boxes while retaining diverse positives, further stabilizing one-stage detectors [47,48]. Guided by these findings, a single-cycle self-training pipeline is adopted: a detector trained on labeled PCB images generates high-confidence pseudo-labels on unlabeled data; the labeled and pseudo-labeled samples are then mixed without ratio heuristics for retraining, improving recall of subtle anomalies while keeping computational overhead modest [45,49]. In effect, training on both labeled and pseudo-labeled data broadens coverage of rare, small, and low-contrast defects, reducing false negatives and improving robustness in deployment [45,49].
However, existing PCB-oriented defect detectors still leave several practical gaps. Many YOLO-based variants assume fully annotated training sets and do not exploit the abundant unlabeled PCB images available on production lines. Other works focus solely on architectural modifications but do not systematically address the simultaneous requirements of (i) high sensitivity to tiny, low-contrast defects, (ii) operation under label-scarce regimes, and (iii) constrained computational budgets for deployment. As a result, there is still a lack of a deployment-ready, label-efficient one-stage PCB detector that jointly enhances small-defect representation and leverages unlabeled data.
Therefore, the objective of this study is to develop and evaluate a task-aligned, label-efficient PCB defect detector. We augment YOLOv5 with ECA, DCN-lite, BiFPN and CARAFE to strengthen multi-scale feature representation at modest computational cost. In the remainder of this paper, we refer to this architecture as the ECA–DCN-lite–BiFPN–CARAFE-enhanced YOLOv5 (the proposed model). In addition, we design a simple single-cycle semi-supervised training scheme that uses confidence-thresholded pseudo-labels on unlabeled PCB images to expand the effective training set. The effectiveness of the proposed detector is then validated on the PKU-PCB dataset under different label-scarce regimes, with ablation studies and comparisons to recent YOLO variants. Rather than introducing entirely new backbone blocks, this work focuses on a task-aligned integration of existing attention, deformable, and feature-fusion modules and on a simple yet effective semi-supervised training scheme, with an emphasis on label efficiency and deployability for PCB AOI. Relevance to the Sustainable Development Goals. In the context of smart manufacturing, accurate and timely inspection is a cornerstone of resilient industrial infrastructure (SDG 9). By improving defect detection under label-scarce regimes, the proposed approach lowers the dependence on extensive manual annotation and supports scalable deployment of AI-enabled automated optical inspection.
2. Methods
Our approach comprises two main components: (1) a modified YOLOv5-based architecture with an enhanced backbone and neck (incorporating ECA, DCN-lite, BiFPN, and CARAFE modules), and (2) a one-stage semi-supervised training pipeline that leverages unlabeled data via pseudo-labeling. Readers who are mainly interested in the overall idea and results can focus on Figure 1 and Figure 6 together with the short summary in Section 2.3, and then proceed directly to Section 3. The following subsections provide more detailed descriptions of each module and the training procedure for readers who wish to reproduce or extend the method.
2.1. Network Architecture: ECA–DCN-Lite–BiFPN–CARAFE-Enhanced YOLOv5
An overview of the modified YOLOv5 architecture is shown in Figure 1. The network is built on a YOLOv5 backbone with four key module enhancements aimed at improving feature extraction and detection of small defects:
ECA: Inserted by replacing the standard C3 blocks with C3ECA at the P2, P3 and P4 stages (strides 4, 8 and 16) of the YOLOv5 backbone to adaptively re-weight feature channels.
DCN-lite: A single lightweight deformable block (C3_DCNLite) is placed on the high-resolution P3 branch (stride 8) after the top-down BiFPN fusion at P3 and before the stride-8 detection head, so that deformable sampling focuses on the smallest defects.
BiFPN with WA_SC: The original PANet neck is replaced by a bidirectional BiFPN operating on P5, P4 and P3. Before fusion, P5 and P4 features are projected to 640 channels and P3 to 320 channels by 1 × 1 lateral convolutions, and each fusion node applies a WA_SC block (WeightedAdd + Separable Convolution).
CARAFE upsampling: In the top-down path (P5→P4 and P4→P3), CARAFE is used as the upsampling operator with factor 2, replacing fixed interpolation and preserving fine defect structures.
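To make the placement of these modules concrete, the list above can be summarized as a small configuration sketch. This is a hypothetical, simplified view for illustration only; the authoritative layer-by-layer specification is the YOLOv5 YAML file in Appendix A.

```python
# Hypothetical, simplified placement map of the four enhancements.
# Stage names follow the paper; this is NOT the actual YOLOv5 YAML syntax.
ENHANCEMENTS = {
    "backbone": {
        "P2 (stride 4)": "C3ECA",   # C3 blocks with ECA channel attention
        "P3 (stride 8)": "C3ECA",
        "P4 (stride 16)": "C3ECA",
    },
    "neck": {
        "lateral_1x1_channels": {"P5": 640, "P4": 640, "P3": 320},
        "top_down_upsampling": ["CARAFE x2 (P5->P4)", "CARAFE x2 (P4->P3)"],
        "bottom_up_downsampling": "strided depthwise-separable conv",
        "fusion_node": "WA_SC (WeightedAdd + separable conv)",
        "P3_extra": "C3_DCNLite before the stride-8 detection head",
    },
    "heads": {"strides": [8, 16, 32]},
}
```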
2.1.1. Notation and Topology
Let the input image be $X \in \mathbb{R}^{3 \times S \times S}$ (default $S = 640$). The backbone outputs feature maps $\{P_3, P_4, P_5\}$ at strides (8, 16, 32). Lateral convolutions align channels before fusion: P5 and P4 are projected to 640 channels and P3 to 320 channels by 1 × 1 convolutions, as reflected in the final YOLOv5 configuration file (Appendix A). This ensures that all inputs to a given fusion node have the same channel dimension before applying WA_SC. The neck performs a top-down pass (P5→P4→P3) using CARAFE upsampling and a bottom-up pass (P3→P4→P5) using strided depthwise separable convolutions. Each fusion node applies a WeightedAdd operation to its inputs, followed by a depthwise separable refinement. The detector head predicts at strides 8, 16 and 32.
2.1.2. ECA Inside C3
Intuitively, the ECA module lets each channel look at a small neighborhood of channels and decide how important they are, using a tiny 1D convolution. Channels that are more informative for PCB defects receive higher weights, while less useful channels are suppressed, and this is done without adding a large number of extra parameters. As illustrated in Figure 2, the ECA block first applies global average pooling (GAP) to obtain a per-channel descriptor, then passes it through a small 1D convolution and a sigmoid to produce channel-wise weights, which are finally multiplied back to the original feature map. For a feature tensor $X \in \mathbb{R}^{C \times H \times W}$, global average pooling produces a per-channel descriptor:

$$z_c = \frac{1}{HW} \sum_{i=1}^{H} \sum_{j=1}^{W} X_{c,i,j}, \quad c = 1, \dots, C.$$

A small 1D convolution and a sigmoid activation then generate channel-wise weights:

$$\mathbf{w} = \sigma\big(\mathrm{Conv1D}_{k}(\mathbf{z})\big),$$

and the output feature map is

$$\tilde{X}_{c} = w_{c} \, X_{c}.$$

The kernel size k of the 1D convolution is a small odd number determined by the channel count C, as in the original ECA design. ECA thus re-weights channels without dimensionality reduction, preserving efficiency. The internal computation of the ECA block inserted into C3 is illustrated in Figure 2.
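As a concrete illustration, the ECA computation can be sketched in plain Python. This is a minimal, single-image sketch for clarity, not the actual PyTorch module; the kernel-size heuristic assumes the γ = 2, b = 1 setting from the original ECA paper.

```python
import math

def eca_kernel_size(channels, gamma=2, b=1):
    """Adaptive 1-D kernel size from the channel count (forced odd, per ECA)."""
    t = int(abs(math.log2(channels) / gamma + b / gamma))
    return t if t % 2 == 1 else t + 1

def sigmoid(x):
    return 1.0 / (1.0 + math.exp(-x))

def eca(feature, kernel):
    """feature: list of C channels, each an HxW list of lists.
    kernel: 1-D conv weights of odd length k, slid across the channel axis."""
    C = len(feature)
    # Global average pooling -> one scalar descriptor per channel
    z = [sum(sum(row) for row in ch) / (len(ch) * len(ch[0])) for ch in feature]
    k, pad = len(kernel), len(kernel) // 2
    zp = [0.0] * pad + z + [0.0] * pad  # zero padding, as in Conv1d
    # 1-D convolution over channels, then sigmoid -> per-channel weight
    w = [sigmoid(sum(kernel[j] * zp[c + j] for j in range(k))) for c in range(C)]
    # Re-weight each channel by its attention weight
    return [[[w[c] * v for v in row] for row in feature[c]] for c in range(C)]
```

With C = 256 or 512 channels the heuristic yields k = 5, matching the small odd kernels reported for ECA.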
2.1.3. CARAFE Content-Aware Upsampling
Following the CARAFE design, shown schematically in Figure 3, the module consists of a content encoder, a kernel prediction module and a reassembly operator. The encoder aggregates local content into a lower-resolution feature, the kernel predictor generates position-specific upsampling kernels, and the reassembly operator uses these kernels to reconstruct a higher-resolution feature map. In the neck, CARAFE replaces fixed interpolation with a content-aware upsampling operator. Instead of using the same weights everywhere, CARAFE looks at the local neighborhood around each position, predicts a small reassembly kernel conditioned on the local content, and then uses this kernel to reconstruct the upsampled feature map. This helps preserve fine edges and tiny defect patterns when going from low-resolution to high-resolution features. Formally, given a low-resolution feature map $X \in \mathbb{R}^{C \times H \times W}$ and upsampling factor $\sigma$, CARAFE first encodes local content:

$$E = \phi_{\mathrm{enc}}(X),$$

and predicts spatially varying reassembly kernels by

$$\mathcal{W} = \mathrm{softmax}\big(\psi_{\mathrm{pred}}(E)\big), \quad \mathcal{W} \in \mathbb{R}^{\sigma H \times \sigma W \times k_{\mathrm{up}}^{2}},$$

where $\mathcal{W}$ stores a reassembly kernel for each spatial location. Let unfold(X) denote the unfolded local patches of X. For each low-resolution location $p$ and subpixel offset $\delta$ (corresponding to a position in the $\sigma \times \sigma$ upsampled neighborhood), the high-resolution output Y is reconstructed as:

$$Y_{\sigma p + \delta} = \sum_{n \in \mathcal{N}(p,\, k_{\mathrm{up}})} \mathcal{W}_{\sigma p + \delta}(n) \, X_{n},$$

where $\mathcal{N}(p, k_{\mathrm{up}})$ is the local neighborhood around $p$. In the top-down path (P5→P4, P4→P3), CARAFE replaces fixed interpolation, preserving fine edges and small patterns with minimal overhead.
As shown in Figure 3, CARAFE upsamples a feature map by first encoding local content and then predicting position-adaptive reassembly kernels.
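The reassembly step can be illustrated with a minimal plain-Python sketch. This is a single-channel illustration under simplifying assumptions: the per-position kernels are taken as given and already softmax-normalized, whereas the real module predicts them from local content with a convolutional kernel predictor.

```python
def carafe_reassemble(x, kernels, sigma=2, k_up=3):
    """x: H x W single-channel feature map (list of lists).
    kernels: dict mapping each high-res position (i', j') to a flat list of
    k_up*k_up weights (assumed pre-normalised by the kernel predictor).
    Returns the (sigma*H) x (sigma*W) reassembled map."""
    H, W = len(x), len(x[0])
    r = k_up // 2
    y = [[0.0] * (sigma * W) for _ in range(sigma * H)]
    for ip in range(sigma * H):
        for jp in range(sigma * W):
            i, j = ip // sigma, jp // sigma   # source location for this subpixel
            w = kernels[(ip, jp)]
            s, idx = 0.0, 0
            for di in range(-r, r + 1):       # weighted sum over the k_up x k_up
                for dj in range(-r, r + 1):   # neighbourhood (zero outside border)
                    ii, jj = i + di, j + dj
                    if 0 <= ii < H and 0 <= jj < W:
                        s += w[idx] * x[ii][jj]
                    idx += 1
            y[ip][jp] = s
    return y
```

Because each high-resolution position gets its own kernel, edges and thin structures can be reconstructed sharply instead of being averaged away as in fixed interpolation.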
2.1.4. BiFPN-Style Fusion with WA_SC (WeightedAdd + Separable Convolution)
Each BiFPN fusion node, depicted in Figure 4, can be described by three steps: (i) assign a non-negative learnable weight to each input feature; (ii) normalize these weights so that they sum to 1; and (iii) form a weighted sum of the inputs followed by a depthwise separable convolution. Let $\{F_i\}_{i=1}^{n}$ be the inputs and $\{\lambda_i\}_{i=1}^{n}$ the raw fusion weights. For inputs $F_i$ entering a fusion node at a fixed scale, we learn a scalar weight $\lambda_i$ for each input and normalize them as:

$$w_i = \frac{\mathrm{ReLU}(\lambda_i)}{\sum_{j=1}^{n} \mathrm{ReLU}(\lambda_j) + \epsilon},$$

where $\epsilon$ is a small constant for numerical stability. The fused tensor is then computed as:

$$F_{\mathrm{fused}} = \sum_{i=1}^{n} w_i \, F_i,$$

and refined by a depthwise-separable convolution block consisting of a depthwise 3 × 3 convolution (stride 1, padding 1, groups = C) followed by a pointwise 1 × 1 convolution (stride 1), both with Batch Normalization and a SiLU activation.

We refer to this pair—the normalized WeightedAdd followed by the depthwise-separable refinement—as the WA_SC block. An identical WA_SC block is used at every BiFPN fusion node in both the top-down (P5→P4→P3) and bottom-up (P3→P4→P5) paths. Before fusion, P5 and P4 features are projected to 640 channels and P3 to 320 channels by 1 × 1 lateral convolutions so that all inputs to a WA_SC node have matching channel dimensions (see Appendix A for the exact configuration).
The detailed internal structure of the BiFPN neck and its WA_SC fusion nodes is illustrated in Figure 4, where each fusion node explicitly shows the successive WeightedAdd and depthwise-separable convolution operations.
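The WeightedAdd normalization can be sketched as follows. This is a plain-Python sketch with features flattened to 1-D lists for brevity; the ReLU clamping of the raw weights follows the fast normalized fusion of the original BiFPN.

```python
def weighted_add(features, raw_weights, eps=1e-4):
    """Fast normalised fusion: w_i = relu(lambda_i) / (sum_j relu(lambda_j) + eps).
    features: list of equal-length maps (flattened for brevity)."""
    relu = [max(0.0, lam) for lam in raw_weights]
    denom = sum(relu) + eps
    w = [r / denom for r in relu]                  # non-negative, sum ~= 1
    fused = [sum(w[i] * f[k] for i, f in enumerate(features))
             for k in range(len(features[0]))]
    return fused, w
```

Unlike softmax-based fusion, this scheme needs no exponentials, and the learned weights remain directly interpretable as each input's contribution to the node.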
2.1.5. DCN-Lite on the High-Resolution Path (P3)
The DCN-lite block on P3, illustrated in Figure 5, predicts a small offset for each position of a 3 × 3 sampling grid and then uses the shifted grid to perform convolution. This gives the highest-resolution feature map mild geometric flexibility to better fit irregular defect shapes. Formally, a single DCN-lite is applied to the P3 branch (stride 8) to introduce localized geometric flexibility with minimal latency overhead. For an input feature map X and learned offsets $\{\Delta p_n\}$ on the sampling grid $\mathcal{R}$ (e.g., a 3 × 3 grid), the deformable convolution output at location $p_0$ is

$$y(p_0) = \sum_{p_n \in \mathcal{R}} w(p_n) \, x(p_0 + p_n + \Delta p_n),$$

where $w(p_n)$ are the convolution weights and bilinear interpolation is used for non-integer sampling coordinates. Concentrating the deformable operation at the finest resolution (P3) enhances sensitivity to small and irregular defects while keeping the model lightweight. The placement and structure of the DCN-lite module on the P3 feature map are depicted in Figure 5.
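A minimal plain-Python sketch of deformable sampling at one output location follows (single channel; function names are illustrative, not from the actual implementation). Bilinear interpolation resolves the fractional coordinates produced by the learned offsets.

```python
import math

def bilinear(x, i, j):
    """Bilinearly sample map x (H x W) at real-valued coords (i, j); zero outside."""
    H, W = len(x), len(x[0])
    i0, j0 = int(math.floor(i)), int(math.floor(j))
    out = 0.0
    for di in (0, 1):
        for dj in (0, 1):
            ii, jj = i0 + di, j0 + dj
            if 0 <= ii < H and 0 <= jj < W:
                out += (1 - abs(i - ii)) * (1 - abs(j - jj)) * x[ii][jj]
    return out

def deform_conv_at(x, p0, weights, offsets):
    """Deformable 3x3 convolution at location p0 = (i, j).
    weights: 9 kernel weights; offsets: 9 learned (di, dj) fractional shifts."""
    grid = [(di, dj) for di in (-1, 0, 1) for dj in (-1, 0, 1)]
    return sum(w * bilinear(x, p0[0] + gi + oi, p0[1] + gj + oj)
               for w, (gi, gj), (oi, oj) in zip(weights, grid, offsets))
```

With all offsets zero the operation reduces to an ordinary 3 × 3 convolution; non-zero offsets let the kernel follow hairline breaks instead of sampling a rigid grid.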
2.2. One-Stage Semi-Supervised Training
The training scheme exploits unlabeled images via a single-cycle pseudo-label self-training pipeline with three stages. We first train the detector on the small labeled set, then run it on the unlabeled images to obtain high-confidence predictions as pseudo-labels, and finally retrain the model on the union of labeled and pseudo-labeled data. This design intentionally keeps the semi-supervised part simple and implementation-friendly. No EMA teacher, test-time augmentation (TTA), or labeled/unlabeled sampling-ratio heuristics are used. Compared with more advanced teacher–student SSOD frameworks such as Unbiased Teacher, Soft Teacher, or DenseTeacher, which rely on EMA teachers, strong/weak augmentation pairs, and extra consistency or dense pseudo-labeling losses, our design is intentionally minimalist and aims to provide a lightweight, implementation-friendly baseline that can be readily plugged into existing YOLOv5-based AOI pipelines on PCB images.
2.2.1. Notation
Let the labeled and unlabeled sets be

$$\mathcal{D}_L = \{(x_i, y_i)\}_{i=1}^{N_L}, \qquad \mathcal{D}_U = \{x_j\}_{j=1}^{N_U}.$$

For an image $x$, the detector outputs a set of predicted bounding boxes $\{b_i\}$, class posteriors $\{p_i(c)\}$, and objectness scores $\{o_i\}$. The detection confidence for prediction $i$ is defined as

$$s_i = o_i \cdot \max_{c} p_i(c).$$
2.2.2. Stage A—Supervised Pre-Training
Stage A is standard supervised training on the labeled set $\mathcal{D}_L$. We use the Full model as the detector and optimize the usual YOLOv5 detection loss, which combines bounding-box regression, objectness and classification terms over the three detection layers. The loss for a mini-batch $\mathcal{B} \subset \mathcal{D}_L$ can be written as:

$$\mathcal{L}(\theta; \mathcal{B}) = \lambda_{\mathrm{box}} \mathcal{L}_{\mathrm{box}} + \lambda_{\mathrm{obj}} \mathcal{L}_{\mathrm{obj}} + \lambda_{\mathrm{cls}} \mathcal{L}_{\mathrm{cls}},$$

where θ denotes the model parameters. We use the YOLOv5 default gains $\lambda_{\mathrm{box}} = 0.05$, $\lambda_{\mathrm{obj}} = 1.0$ and $\lambda_{\mathrm{cls}} = 0.5$. $\mathcal{L}_{\mathrm{box}}$ is the bounding-box regression loss (e.g., CIoU) computed on positive anchors, $\mathcal{L}_{\mathrm{obj}}$ is the BCE objectness loss with IoU-based soft targets for positives and 0 for negatives, and $\mathcal{L}_{\mathrm{cls}}$ is the BCE classification loss with optional label smoothing. Layer-balance coefficients for the three detection layers also follow the YOLOv5 defaults.
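For clarity, the weighted combination of the three loss terms can be written as a one-liner. This is a sketch only; the gains shown are the standard YOLOv5 hyperparameter defaults, and the per-term losses themselves are computed by the YOLOv5 codebase.

```python
def yolov5_total_loss(l_box, l_obj, l_cls,
                      gain_box=0.05, gain_obj=1.0, gain_cls=0.5):
    """Weighted sum of the three detection loss terms (YOLOv5 default gains)."""
    return gain_box * l_box + gain_obj * l_obj + gain_cls * l_cls
```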
2.2.3. Stage B—Pseudo-Label Generation
In Stage B, we run the Stage-A model on every unlabeled image and convert confident detections into pseudo-labels. Using the trained Full model, inference is run over all $x_j \in \mathcal{D}_U$. After non-maximum suppression, only predictions whose confidence exceeds a fixed threshold $\tau$ are kept as pseudo-annotations:

$$\hat{y}_j = \Big\{ \big(b_i, \arg\max_{c} p_i(c)\big) \;:\; s_i > \tau \Big\}.$$

The threshold $\tau$ is fixed throughout all experiments.
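Stage B can be sketched in a few lines of plain Python. The prediction tuples and function name are illustrative, with τ = 0.60 as used in our experiments; in practice the predictions come from the detector's post-NMS output.

```python
def generate_pseudo_labels(predictions, tau=0.60):
    """Keep post-NMS predictions whose confidence s_i = o_i * max_c p_i(c)
    exceeds the fixed threshold tau. Each prediction is a tuple
    (box, class_posteriors, objectness); returns (box, class_id, score)."""
    kept = []
    for box, class_posteriors, objectness in predictions:
        s = objectness * max(class_posteriors)
        if s > tau:
            cls = max(range(len(class_posteriors)),
                      key=lambda c: class_posteriors[c])
            kept.append((box, cls, s))
    return kept
```

Low-confidence boxes are simply discarded rather than down-weighted, which keeps the pipeline free of extra loss terms or soft-label bookkeeping.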
2.2.4. Stage C—Retraining on the Mixed Pool (Uniform Sampling)
The mixed training set is

$$\mathcal{D}_{\mathrm{mix}} = \mathcal{D}_L \cup \hat{\mathcal{D}}_U, \qquad \hat{\mathcal{D}}_U = \{(x_j, \hat{y}_j)\}_{x_j \in \mathcal{D}_U}.$$

In Stage C, mini-batches are sampled uniformly at random from $\mathcal{D}_{\mathrm{mix}}$ without any labeled/unlabeled ratio control. The objective is to minimize the same detection loss over $\mathcal{D}_{\mathrm{mix}}$, treating pseudo-labels and ground-truth labels identically:

$$\min_{\theta} \; \mathbb{E}_{(x, y) \sim \mathcal{D}_{\mathrm{mix}}} \big[ \mathcal{L}(\theta; x, y) \big].$$

Here (x, y) may correspond to either a labeled example from $\mathcal{D}_L$ or a pseudo-labeled sample from $\hat{\mathcal{D}}_U$.
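The uniform sampling in Stage C is deliberately trivial, as the following sketch shows (names are illustrative; a real loader would shuffle image/annotation records rather than tuples):

```python
import random

def make_mixed_pool(labeled, pseudo_labeled):
    """Concatenate ground-truth and pseudo-labeled samples; Stage C treats
    both label sources identically and applies no ratio control."""
    return list(labeled) + list(pseudo_labeled)

def sample_minibatch(pool, batch_size, rng):
    """Draw a mini-batch uniformly at random (with replacement for brevity)."""
    return [pool[rng.randrange(len(pool))] for _ in range(batch_size)]
```

Because the pool is a plain concatenation, the labeled/pseudo-labeled mix in each batch simply follows the sizes of the two sets, with no heuristic re-balancing.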
2.2.5. Flowchart and Implementation Remarks
Figure 6 summarizes the single-cycle semi-supervised training pipeline described in Section 2.2.1, Section 2.2.2, Section 2.2.3 and Section 2.2.4. Stage A trains the Full model on the labeled set to obtain an initial detector; Stage B runs inference on the unlabeled pool and retains high-confidence predictions as pseudo-labels; Stage C retrains the detector on the mixed labeled and pseudo-labeled pool using the same detection loss.
Figure 6. Flowchart of the single-cycle semi-supervised training pipeline.
Figure 6 visualizes the single-cycle procedure. Stage A trains on labeled data to obtain the initial detector; Stage B generates pseudo-labels on the unlabeled pool by confidence thresholding; Stage C retrains on the mixed pool with uniform sampling and the same detection loss for both label sources.
Implementation details.
Sampling. Mini-batches are drawn uniformly at random from the mixed labeled and pseudo-labeled set; no labeled/unlabeled ratio is enforced.
Confidence thresholding & NMS. A single, class-agnostic confidence threshold τ is used for pseudo-label generation (Stage B). The NMS configuration (e.g., IoU threshold) matches the one used in validation for consistency.
Loss & assignment. The detection loss and anchor/assignment strategy follow the standard YOLOv5 implementation. Accepted pseudo-labels are treated as hard targets in Stage C, identical to ground-truth annotations.
2.3. Summary of Design Choices
Rather than proposing entirely new backbone or neck primitives, our design adopts a minimalist strategy: it selects a small set of proven modules and places them where they are most beneficial for tiny PCB defects and label-scarce training. The key design choices are:
Channel re-weighting: ECA inside C3 improves feature selectivity with negligible overhead (see Section 2.1.2).
Content-aware upsampling: CARAFE preserves fine structures while injecting context into high-res maps (Section 2.1.3).
Cross-scale fusion: BiFPN-style WeightedAdd provides normalized, non-negative fusion that balances semantics and detail (Section 2.1.4).
Geometric flexibility: A single P3 DCN-lite targets the finest scale, improving small-defect recall at low latency (Section 2.1.5).
Semi-supervision: A one-cycle pseudo-labeling scheme with uniform sampling serves as a lightweight SSOD baseline that avoids EMA teachers, strong/weak augmentation pairs, or labeled/unlabeled ratio heuristics, yet still yields measurable gains under label-scarce PCB conditions (Section 2.2).
For exact reproducibility, the full YOLOv5-ECA-DCN-BiFPN-CARAFE configuration file used in all experiments is provided in Appendix A, specifying every backbone and neck layer, including kernel sizes, strides and channel dimensions.
2.4. Data and Training Setup
Experiments are conducted on the PKU-PCB [50] dataset comprising six defect categories (open circuit, short circuit, mouse bite, spur, spurious copper, and pin-hole). To reflect realistic production constraints—few labeled images and many unlabeled images—the training data are split into a small labeled subset and a larger unlabeled pool, with independent validation and test sets held out.
The default semi-supervised configuration, which matches the dataset-split schematic, is summarized in Table 1.
All subsets are disjoint (no image overlap). Unlabeled images contribute to training only via pseudo-labels; the test set remains completely unseen until the final evaluation. The validation (600) and test (2134) splits remain fixed and non-overlapping across all experiments. This design enables controlled comparisons across label-scarce regimes while preserving a consistent, independent benchmark for model selection and final reporting. In the 100 labeled training images used for supervised and semi-supervised learning, there are 215 annotated defects in total: 36 open, 36 short, 34 mouse bite, 38 spur, 40 spurious copper, and 31 pin-hole instances. The held-out test set of 2134 images contains 4349 annotated defects, with 660, 706, 803, 765, 676 and 739 instances for the six classes, respectively. Thus, each class accounts for roughly 14–19% of all annotations. The dataset is therefore not extremely long-tailed, but the labeled subset still provides somewhat fewer examples for certain categories (e.g., mouse bite and pin-hole) than for others, which can make these defects harder to learn reliably from labeled data alone.
In this work, all experiments are conducted on the PKU-PCB dataset, which we adopt as a representative public benchmark for multi-class PCB defect detection. We acknowledge that restricting evaluation to a single dataset limits the assessment of cross-dataset generalizability, and we therefore interpret our findings as evidence of effectiveness on PKU-PCB rather than as universal conclusions for all PCB settings.
Training Details. All experiments were conducted on a workstation equipped with an NVIDIA RTX 4090 (24 GB) GPU, an Intel® Xeon® Gold 6258R CPU, and 128 GB of RAM. The software environment comprised Python 3.10 and PyTorch 2.1, running the official YOLOv5 codebase. Models were trained with a batch size of 16 using SGD (momentum 0.937, weight decay 5 × 10⁻⁴). The initial learning rate was 0.01 and followed a cosine decay schedule over the course of training. Each phase was run for up to 200 epochs, with early stopping triggered by a plateau in validation mAP to mitigate overfitting. Standard YOLOv5 data augmentations were enabled—random image scaling, horizontal flipping, color jitter, and Mosaic composition—to increase appearance diversity and improve generalization under limited labels. During inference, the confidence threshold was 0.25 (the YOLOv5 default). For pseudo-label generation in the semi-supervised stage, a stricter threshold τ = 0.60 was applied to retain only high-confidence detections for retraining.
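For reference, the cosine decay schedule takes the standard annealing form sketched below. This is an illustrative, self-contained version with an assumed final learning rate of zero; the exact schedule used in the experiments is the one implemented in the official YOLOv5 code, whose final learning-rate factor may differ.

```python
import math

def cosine_lr(epoch, total_epochs=200, lr0=0.01, lr_min=0.0):
    """Standard cosine annealing from lr0 down to lr_min over total_epochs."""
    return lr_min + 0.5 * (lr0 - lr_min) * (1 + math.cos(math.pi * epoch / total_epochs))

# The schedule starts at the initial rate (0.01), passes through the midpoint
# (0.005 at epoch 100), and decays smoothly toward lr_min by the final epoch.
assert cosine_lr(0) == 0.01
assert abs(cosine_lr(100) - 0.005) < 1e-9
assert abs(cosine_lr(200) - 0.0) < 1e-12
```

With early stopping, training may terminate before epoch 200, in which case only the initial portion of this curve is traversed.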
4. Conclusions
This paper presented a label-efficient PCB defect detector that augments YOLOv5 with ECA, DCN-lite, a BiFPN neck with WeightedAdd and separable convolutions, and CARAFE upsampling, together with a single-cycle semi-supervised training scheme. On the PKU-PCB benchmark, the proposed ECA–DCN-lite–BiFPN–CARAFE-enhanced YOLOv5 improves supervised mAP@0.5 from 0.870 (baseline YOLOv5) to 0.910 while reducing parameters and GFLOPs. When combined with semi-supervised training on 100 labeled and 1000 unlabeled images, the Full + SSL model further reaches 0.943 mAP@0.5 with 94.4% precision and 91.2% recall, outperforming several recent YOLO-based detectors and delivering consistent gains across all defect classes.
The main advantages of the proposed design are: (i) higher accuracy for tiny, low-contrast PCB defects due to targeted architectural modules; (ii) strong label-efficiency enabled by the one-stage pseudo-labeling pipeline; and (iii) moderate model size and computation, which facilitate deployment in AOI systems. These strengths make the approach attractive for practical PCB inspection scenarios where annotation budgets are limited. A small variance analysis over multiple random seeds further confirms that the observed mAP gains are stable and not due to a single favorable run.
Nonetheless, this work has several limitations. First, all experiments are conducted on a single public PCB dataset (PKU-PCB), so the generalizability of the gains to other PCB designs, imaging setups, and industrial lines remains to be verified. Second, the study focuses on a single detector family (YOLO-based one-stage detectors, instantiated here as YOLOv5) and explores only a basic one-cycle pseudo-labeling scheme, without numerical comparison to more advanced teacher–student SSOD frameworks such as Soft Teacher or Unbiased Teacher, or to Transformer-based detectors such as DETR or RT-DETR variants. Third, although the proposed detector substantially improves detection performance for all six defect types under a moderately imbalanced label distribution, residual gaps between classes remain; future work will explore explicit imbalance-aware strategies, such as class rebalancing or cost-sensitive losses, to further enhance recognition of relatively less frequent or safety-critical defects.
Future research will extend the evaluation to additional PCB and industrial defect datasets, explore other backbone/neck variants and more advanced semi-supervised strategies, and assess the method in end-to-end AOI systems on real production lines. It will also benchmark the proposed architecture against representative Transformer-based models under the same label-efficient AOI constraints and investigate combining Transformer-style global context with the proposed YOLOv5 enhancements.
Overall, the contribution of this work is primarily system- and application-oriented: it shows that a carefully designed combination of existing attention, deformable and feature-fusion modules, together with a simple single-cycle semi-supervised scheme, can deliver substantial gains for tiny, low-contrast PCB defects under realistic label constraints. Future research will explore extending these design principles to other detector families and more advanced semi-supervised strategies.
By enabling accurate, efficient, and scalable AI-driven PCB inspection for modern production lines, this work directly advances SDG 9: Industry, Innovation and Infrastructure, supporting smart manufacturing and resilient industrial automation.