SM-FSOD: A Second-Order Meta-Learning Algorithm for Few-Shot PCB Defect Object Detection

Xinnan Shao; Zhoufeng Liu; Qihang He; Miao Yu; Chunlei Li

doi:10.3390/electronics14193863


Department of Information and Communication Engineering, Zhongyuan University of Technology, Zhengzhou 450007, China

* Author to whom correspondence should be addressed.

† These authors contributed equally to this work.

Electronics 2025, 14(19), 3863; https://doi.org/10.3390/electronics14193863

This article belongs to the Special Issue Advances in Computer Vision and Deep Learning and Its Applications, 2nd Edition


Abstract

Few-shot object detection methods perform well in natural scenes, where meta-learners can effectively extract target features from limited support data. However, PCB defect detection faces unique challenges: scarce defect samples, low-resolution targets, and severe overfitting risks. Additionally, PCBs’ dense circuitry creates low-contrast defects that blend into background noise, while traditional meta-learning struggles to generate realistic synthetic defects under actual manufacturing constraints. To overcome these limitations, we propose SM-FSOD, a second-order meta-learning model featuring a defect-aware prototype network. Unlike conventional approaches, it dynamically emphasizes critical defect features when constructing class prototypes from few-shot samples. Extensive few-shot experiments on the DeepPCB and DsPCBSD+ datasets demonstrate the performance of the SM-FSOD model. Comparative tests on the DeepPCB dataset show that, compared with the strong Meta-RCNN baseline, our model achieves a maximum performance improvement of 14.0% under the challenging five-shot setting, while still attaining a 7.6% accuracy gain in the more relaxed 50-shot setting. Similarly, evaluation on the DsPCBSD+ dataset reveals that our proposed method maintains an average accuracy improvement of 2.3% to 9.6% compared to the competitive DeFRCN model in complex scenarios, indicating the strong adaptability of SM-FSOD across various application environments. Ablation studies further demonstrate that incorporating the improved MOMP and DPGT modules individually yields average accuracy gains of 3.6% and 4.5%, respectively, under the five-shot setting compared to the baseline, confirming that the two enhancements contribute independently to detection precision in few-shot PCB scenarios.

Keywords: few-shot object detection; PCB defect detection; meta-learning; prototype fusion

1. Introduction

In recent years, few-shot object detection (FSOD) has emerged as a research hotspot in the field of computer vision, aiming to address object detection challenges in scenarios with scarce annotated data. Traditional object detection algorithms primarily rely on large amounts of annotated data for supervised learning [1]. However, in specialized domains, acquiring extensive labeled data is often costly and impractical. To tackle this issue, researchers have proposed various few-shot object detection algorithms based on transfer learning [2] or meta-learning [3,4] strategies. In natural scene object detection tasks, few-shot object detection technology has achieved remarkable progress. Through meta-learning-based or metric learning-based approaches, effective knowledge transfer from base classes to novel classes can be achieved, enabling few-shot object detection models to recognize new category targets with only minimal annotations. These few-shot object detection methods typically depend on rich feature diversity and uniformly distributed samples to facilitate knowledge transfer. As a result, they are particularly well suited for natural image scenes where objects exhibit significant variations yet maintain structural consistency. These approaches have demonstrated strong generalization capabilities on natural scene datasets such as COCO [5] and Pascal-VOC [6].

However, in the context of practical industrial applications, a significant research gap persists between the current few-shot object detection techniques and PCB defect detection requirements. Conventional few-shot methods prove inadequate in addressing the characteristic challenges of PCB defects—including low contrast, microscopic scales, and complex background noise—while traditional PCB surface inspection methodologies remain heavily dependent on large annotated datasets, fundamentally failing to overcome the scarcity of defect samples in real-world production environments. Existing research has yet to effectively resolve critical challenges such as domain adaptation needs, real-time processing constraints, and sensitivity to microscopic defects in manufacturing settings. Consequently, our research aims to bridge this gap by advancing innovative applications of few-shot learning theory in industrial vision systems, thereby enhancing quality control standards for electronic products.

However, the current few-shot object detection methods face two critical challenges when applied to PCB surface defect detection tasks. The first issue is the insufficient diversity of PCB defect features. Different types of defects on PCB surfaces may exhibit only subtle morphological and textural differences, while defects of the same category demonstrate high morphological consistency across different locations or scales [7]. This results in weak inter-class variations and limited intra-class diversity. In few-shot scenarios, the limited annotated samples cannot adequately cover these fine-grained variations, making it difficult for models to learn discriminative feature representations, which significantly reduces the defect classification performance. The second challenge arises from the high-density circuit patterns and repetitive metal textures on PCB surfaces, which introduce substantial background noise [8]. This noise not only shares confusing similarities with genuine defects in local features but also interferes with the region proposal network’s (RPN) candidate box generation, leading to numerous false positives or missed detections and reducing the model’s generalization capabilities. Under few-shot conditions, models struggle to effectively learn background noise suppression from limited samples, further exacerbating false detection rates and causing the detection accuracy to fail to meet the stringent requirements of industrial quality inspection. It is evident that existing general-purpose few-shot object detection algorithms exhibit significant limitations when applied to PCB defect detection. The primary manifestation is that natural image-oriented models lack prior knowledge modeling of PCB boards’ regular texture patterns, resulting in the substantially reduced discriminative power of few-shot features under complex background interference.

To address the aforementioned two limitations, researchers have proposed several relevant approaches. Hsiao et al. [9] proposed a ResNet-SE-CBAM Siamese network that integrates residual modules, squeeze-and-excitation modules, and convolutional block attention modules. This architecture enhances feature extraction capabilities through attention mechanisms and employs metric learning to enable category expansion without retraining. Combined with SSIM-based sample selection and high-defect-rate training strategies, their method focuses on optimizing feature representation through attention mechanisms, thereby complementing the multi-scale defect-aware prototype network proposed in the following section. We propose a second-order meta-learning-based few-shot object detection algorithm for PCB surface defect inspection, which achieves query–support feature fusion and cross-domain knowledge transfer through a dual-path Transformer architecture. Specifically, the current meta-learning approaches predominantly adopt the Faster R-CNN framework, whose detection performance heavily relies on the accuracy of region proposal generation. However, region proposals tend to overemphasize local texture features, making it difficult for the model to learn comprehensive foreground representations of PCB surface defects when novel classes are few and intra-class variations are limited. Therefore, this paper introduces a deformable Transformer structure as the fundamental framework for object detection, thereby mitigating the issues of excessive local feature dependency and error propagation in target localization that are inherent to region proposal-based approaches. Extending this approach, to address the challenge of insufficient intra-class sample diversity for novel categories in few-shot PCB defect detection tasks, we propose a multi-order meta-enhanced prototype (MOMP) Network. 
This framework employs meta-learning strategies for shallow feature enhancement, dynamically optimizing support features using globally filtered query features. The method effectively resolves the limitation of conventional single-feature-constructed support prototypes in comprehensively capturing target diversity and complexity, which often restricts their guidance capabilities for query information. Furthermore, to tackle the few-shot defect foreground generalization problem under high-density PCB background noise interference, we present the dual-prototype guided Transformer (DPGT) method. By designing a parallel architecture within the Transformer encoder that generates noise-invariant features through support set prototypes, our approach achieves robust cross-sample knowledge transfer from support to query information while resisting interference. Unlike existing multi-level methods that rely on the simple concatenation of hierarchical features, the second-order meta-learning proposed in this paper achieves deep semantic alignment between support and query sets through gradient-level optimization, overcoming the limitations of traditional methods in representing fine-grained defects in few-shot scenarios. Compared to attention-based approaches that focus solely on feature re-weighting in the spatial or channel dimensions, our method explicitly models cross-sample semantic relationships through task-level meta-optimization strategies.

To validate the effectiveness of the proposed algorithm, we employed two PCB surface defect datasets—DeepPCB [10] and DsPCBSD+ [11]—as experimental datasets for training and accuracy evaluation. During the experimental setup phase, we performed few-shot grouping on these two PCB defect datasets. For the DeepPCB dataset, we selected four categories as base classes and another two categories as novel few-shot classes. For the DsPCBSD+ dataset, we chose five categories as base classes and four additional categories as novel few-shot classes. After completing the few-shot dataset partitioning, we conducted model training and evaluation based on this few-shot dataset, comparing the proposed algorithm with meta-learning-based algorithms such as Meta-FRCNN. The results show that our SM-FSOD algorithm achieved a 9.8–14.0% improvement in the AP50 metric on the DeepPCB dataset. Similarly, on the DsPCBSD+ dataset, the AP50 benchmark of SM-FSOD still surpassed that of Meta-FRCNN by 1.7–5.1%. Experimental results demonstrate that, compared with the transfer learning-based TFA [12] and meta-learning-based algorithms like Meta-FRCNN [13] and FSOR-SR [14], SM-FSOD exhibits superior detection accuracy for few-shot categories. Moreover, even when the number of few-shot categories increases, SM-FSOD maintains stable accuracy performance.

The remainder of this paper is organized as follows. Section 2 summarizes prevailing general frameworks for PCB surface defect detection and analyzes the limitations of two mainstream few-shot object detection methods in PCB defect identification tasks. Section 3 proposes a few-shot object detection algorithm based on a second-order meta-learning strategy. Section 4 validates the proposed method through comparative experiments and ablation studies on multiple PCB surface defect datasets. Finally, the main findings are summarized and future research directions are outlined. The core contributions of this research are as follows:

(1) We conducted few-shot data partitioning on two PCB defect datasets and developed a second-order meta-learning framework specifically tailored to the characteristics of PCB surface defects.
(2) We proposed the MOMP network to enhance shallow support features via a meta-learning strategy, significantly improving foreground diversity for novel defect categories.
(3) We developed the DPGT network, which uses a parallel Transformer to generate noise-resistant features, effectively suppressing background interference and enabling stable few-shot PCB surface defect detection.

2. Related Works

In this section, we will review recent research on object detection algorithms and few-shot object detection (FSOD) algorithms, analyzing their advantages and limitations in object detection tasks. By comparing these methods with our proposed approach, we highlight the significance of our research.

2.1. Object Detection Algorithm on PCB Defects

As a core component of modern electronic products, printed circuit boards (PCBs) play a vital role in interconnecting and integrating various electronic components. With the trend toward miniaturization and high integration in electronic devices, the design density and structural complexity of PCBs have grown exponentially, while manufacturing process requirements have become increasingly stringent. During industrial production, PCB surfaces may develop typical process defects such as open circuits, short circuits, and missing holes. If these microscopic defects are not detected and addressed promptly, they can not only disrupt the normal operation of subsequent modules but may also cause failures in entire digital systems [15]. For this reason, PCB surface defect detection algorithms have become an important research direction in the field of computer vision, as their detection accuracy and efficiency directly determine the performance stability and production quality of electronic products.

Du et al. [16] proposed YOLO-MBBi, a YOLOv5-based PCB defect detection model featuring an MBConv backbone and CBAM-enhanced BiFPN for improved multi-scale feature fusion, effectively addressing small defect detection and background interference challenges. Xu et al. [17] proposed DSASPP-YOLOv5 for PCB defect detection, featuring a novel DSASPP module that combines multi-dilation atrous convolution with global pooling to enhance feature extraction. The method employs depthwise separable convolution and K-means++ anchor optimization to improve small defect detection in high-density PCBs while maintaining real-time performance. Xie et al. [18] proposed a multi-frequency aggregation–diffusion feature flow composite paradigm (MFAD-RTDETR) based on an improved RT-DETR. By integrating the MFAD feature composite paradigm composed of ADF and RFC modules, their approach enhances fine-grained feature representation for defect targets while preserving global attention, significantly improving the detection accuracy for tiny and irregular PCB defects. Yan et al. [19] developed a defect detection algorithm based on the YOLOv8 framework. By refining the CGFPN network structure, they optimized multi-scale feature fusion, notably improving the small target detection performance. Additionally, they introduced the WaveletUnPool module, leveraging a wavelet transform to enhance the upsampling process and accurately restore defect detail features. Ai et al. [20] designed a dual-layer routing attention mechanism based on the YOLOv8 framework. Through spatial–channel dual-path feature interaction, their method fuses shallow texture features with deep semantic features. They further optimized the loss function by proposing Shape-IoU, which adapts to the geometric characteristics of PCB defects, significantly enhancing the detection performance for weak and small defect targets. The Attention-GhostUNet++ model proposed by Hayat et al. 
[21] significantly enhances the medical image segmentation accuracy through the integration of channel, spatial, and depth attention mechanisms. Its core ghost module utilizes lightweight convolution operations to generate ghost feature maps, effectively reducing computational redundancy while maintaining performance. This attention optimization approach demonstrates strong alignment with the design concept of “lightweight attention for tiny, low-contrast defects”, as both employ dynamic feature weighting to achieve the precise capture of critical regions. The architecture’s achievement of a 0.96+ Dice coefficient in CT image visceral adipose tissue segmentation validates the effectiveness of lightweight attention in weak signal detection tasks, thereby providing a transferable technical framework for PCB micro-defect detection.

2.2. Few-Shot Object Detection (FSOD) Algorithm

The goal of few-shot object detection is to train a model using a small number of annotated samples, enabling it to effectively classify and localize the target objects corresponding to these limited examples. To understand the few-shot object detection problem, it is necessary to formally define it [12]. As shown in Figure 1, the dataset D is composed of D_Base and D_Novel. D_Base denotes the base class dataset, which contains abundant labeled samples for each category, whereas D_Novel represents the novel class dataset, in which each novel class contains only a few samples. There is no sample overlap between the base class and novel class datasets. Given a test image, the model is required to localize and classify foreground objects of target categories. Consequently, the few-shot object detector must extract transferable prior knowledge from data-rich base classes to facilitate novel-class generalization under extreme data scarcity.
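The base/novel partition described above can be illustrated with a small sketch. It is only a toy illustration under stated assumptions: the class names `c0` to `c5`, the `(image_id, label)` instance format, and the helper names `split_base_novel` and `sample_k_shot` are hypothetical and are not taken from the paper or from the DeepPCB annotation files. Each instance is treated as a single labeled object, which ignores the fact that real detection images may contain several defects.

```python
import random
from collections import defaultdict


def split_base_novel(instances, novel_classes):
    """Partition labeled instances into D_Base and D_Novel by class label."""
    base, novel = [], []
    for image_id, label in instances:
        (novel if label in novel_classes else base).append((image_id, label))
    return base, novel


def sample_k_shot(novel, k, seed=0):
    """Draw K instances per novel class (raises ValueError if a class has fewer than K)."""
    rng = random.Random(seed)
    by_class = defaultdict(list)
    for image_id, label in novel:
        by_class[label].append((image_id, label))
    return {label: rng.sample(items, k) for label, items in by_class.items()}


# Toy data: six hypothetical defect classes with 20 instances each.
classes = ["c0", "c1", "c2", "c3", "c4", "c5"]
instances = [(f"{c}_img{i}", c) for c in classes for i in range(20)]

# Four base classes and two novel classes (an assumed split for illustration).
novel_classes = {"c4", "c5"}
d_base, d_novel = split_base_novel(instances, novel_classes)
support = sample_k_shot(d_novel, k=5)
```

Fixing the random seed keeps the K-shot support set reproducible across runs, which matters when results for different shot counts are compared.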

Figure 1. Two few-shot detection methods: (A) meta-learning-based few-shot object detection algorithm and (B) transfer learning-based few-shot object detection algorithm.

The transfer learning paradigm [22,23,24,25] represents a relatively simple few-shot learning strategy. Its core principle involves pre-training models with large-scale datasets, followed by fine-tuning partial model parameters using limited few-shot samples, as illustrated in Figure 1B. Wang et al. [22] proposed an orthogonal progressive network that achieves task decoupling through task orthogonality and class orthogonality constraints, while incorporating a progressive fine-tuning strategy to mitigate overfitting risks. Wang et al. [23] proposed the SNIDA method, which enhances sample diversity through nonlinear semantic decoupling and data augmentation. Zhu et al. [24] proposed the FSNA method, which exploits data information through neighborhood information adaptation and full-attention mechanisms. In comparison, our SM-FSOD model enhances the representational capacity for few-shot category information by directly screening query features from global information to augment support features. Guo et al. [25] proposed the DP-DDCL method, which constructs discriminative prototypes through dual-decoupled contrastive learning. Inspired by this approach, our method achieves query information guidance at a deeper information level via a second-order meta-learning strategy, thereby reducing the reliance on additional knowledge. While transfer learning frameworks are straightforward to train, they have inherent limitations: first, as a typical fine-tuning approach, they require retraining with novel class samples, compromising real-time performance; second, fine-tuning with extremely limited samples fails to capture complete feature distributions, making them prone to overfitting and degraded generalization capabilities.

Current research indicates that meta-learning methods have been widely applied to the construction of object detection models [26,27,28,29] to develop few-shot object detectors with stronger generalization capabilities. These approaches are predominantly built upon the two-stage object detection framework Faster R-CNN. The core idea of meta-learning is to construct few-shot tasks on base classes, where the model is trained with tasks consisting of query data and support data. This enables the model to learn the ability of utilizing semantically rich support information to guide the processing of few-shot query instances. This approach enables rapid model parameter updates with limited samples, achieving fast generalization to new tasks without requiring full model retraining, as illustrated in Figure 1A. Fan et al. [26] proposed FSODv2, which enhances the local representation capacity of support information through multi-scale support inputs and an improved FPN for feature matching calibration. In contrast, our approach innovatively incorporates the Transformer architecture and second-order meta-learning strategies, effectively mitigating the over-reliance on local features while strengthening the global semantic guidance capabilities of support prototypes towards query information. Han et al. [27] proposed a variational feature aggregation approach that reduces confusion between base and novel classes through class-agnostic aggregation and a variational Autoencoder-constructed meta-learner. Wang et al. [28] proposed a fine-grained prototype distillation method that enhances the few-shot detection performance through a meta-learner constructed via fine-grained feature aggregation. In contrast, our approach employs a second-order meta-learning strategy to strengthen the global semantic representation capabilities of support prototypes at the feature level, thereby providing more reliable supporting information for feature aggregation. Gao et al. 
[29] addressed data imbalance in meta-learning frameworks through an asymmetric adaptation paradigm, whereas our method mitigates inter-class confusion caused by data imbalance by directly enhancing the global semantic representation capacity of support prototypes at the feature level via global information screening and parallel architectures.

3. SM-FSOD: A Meta-Learning Approach for Few-Shot Object Detection

In this section, we present the theoretical foundations of second-order meta-learning algorithms. Building upon this meta-learning framework, we introduce the overall architecture of our proposed Second-Order Meta-Learning for Few-Shot Object Detection (SM-FSOD) algorithm. Subsequently, we provide detailed descriptions of the two novel meta-learners incorporated into our approach, explaining their mechanisms in enabling few-shot category knowledge transfer. This systematic presentation comprehensively elucidates the algorithmic logic underlying our proposed solution.

3.1. Second-Order Meta-Learning Algorithms

Meta-learning algorithms applied to few-shot object detection tasks in natural scenarios are typically constructed only at the shallow feature level extracted by the backbone network, guiding the learning of few-shot query information under simple support feature prototypes. As shown in Part A of Figure 2, this generic first-order meta-learning algorithm exhibits several structural limitations when applied to PCB surface defect detection tasks.

Figure 2. Comparison of two meta-learning strategies: (A) general single-stage meta-learning strategy; (B) second-order meta-learning strategy for PCB surface defect detection.

The fundamental challenge in PCB defect detection lies in the characteristic manifestation of defects where inter-category variations appear as only subtle morphological–textural differences, while intra-category defects exhibit remarkable consistency across spatial locations and scales. This intrinsic property creates significant limitations in few-shot learning scenarios: prototype features generated through the simplistic random sampling of support instances inherently lack query-specific adaptability, leading to insufficient correlation between support prototypes and query features. Consequently, the conventional approach of single-level matching confined to shallow convolutional feature spaces proves inadequate in capturing the hierarchical semantic representations of few-shot PCB defects, ultimately failing to model the complex interdependencies between query and support samples in few-shot PCB defect detection tasks.

Furthermore, the high-density PCB circuit background introduces compounded interference issues. At the few-shot feature metric level, shallow feature-based matching mechanisms exhibit excessive sensitivity to background noise, resulting in low confidence scores for certain critical defect features. Meanwhile, conventional region proposal networks generate numerous low-quality candidate boxes, which amplify the class confidence of background noise. This dual interference mechanism makes it particularly challenging to capture fine-grained information about novel few-shot PCB defect foregrounds, ultimately leading to classification confusion between few-shot defect categories and background negative samples.

Figure 2 presents a comparative analysis of failure and success cases in meta-learning using class activation mapping (CAM) heatmaps. Part A displays failure cases of first-order meta-learning along with their corresponding support prototypes. The CAM heatmaps demonstrate that these prototypes fail to effectively capture distinctive features of the few-shot sample categories, leading to inaccurate feature representations and poor generalization to novel classes. In contrast, Part B illustrates successful cases of second-order meta-learning with their respective support prototypes. The accompanying CAM heatmaps reveal that second-order methods can effectively identify and leverage critical features to construct more discriminative prototypes, thereby enabling successful adaptation to new tasks. The side-by-side CAM visualization clearly demonstrates the significant advantage of second-order methods over first-order approaches in generating more effective and meaningful prototype representations for few-shot learning scenarios.

To alleviate the aforementioned issues, this paper designs a second-order meta-learning strategy, as illustrated in Part B of Figure 2. This strategy mitigates the bias caused by imbalanced sample distributions through its learning mechanism, as shown in Algorithm 1 below:

Algorithm 1 Second-Order Meta-Learning Algorithm

1: Phase 1: First-Order Meta-Learning (Adaptive Prototype Generation)
2: $F_{s} = F(S)$ {Extract support set features}
3: $F_{q} = F(Q)$ {Extract query set features}
4: for each query feature $f_{q} \in F_{q}$ do
5:     ${\tilde{F}}_{s} = M_{1}(F_{s}, f_{q})$ {Feature modulation}
6:     $p = \Phi({\tilde{F}}_{s})$ {Prototype generation function}
7: end for
8: Phase 2: Second-Order Meta-Learning (Semantic Alignment)
9: for each prototype–query pair $(p, f_{q})$ do
10:     $z = M_{2}(p, f_{q})$ {Semantic alignment}
11:     $\hat{y} = C(z)$ {Classification prediction}
12: end for

In the first-order meta-learning phase, the features of query samples dynamically modulate the feature distribution of support samples via a meta-learner, generating more representative support prototypes. This process not only integrates the internal structural relationships of the support samples but also incorporates similarity metrics between query and support samples, enabling prototype generation to adaptively focus on the critical features of minority-class defects. In the second-order meta-learning phase, the refined support prototypes guide the deep semantic mining of query samples. At this stage, another meta-learner aligns the support prototypes with the deep features of query samples, enhancing the distinction between defects and the background in a higher-dimensional semantic space and thereby significantly improving the recognition capabilities for rare defect samples. Furthermore, the strategy employs data augmentation methods such as cropping and random rotation on input samples to increase the diversity of defect samples.
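The two phases can be sketched on toy feature vectors as follows. This is a minimal illustration of the data flow only, and every concrete choice in it is an assumption: the modulation $M_{1}$ is stood in for by softmax similarity weighting, $\Phi$ by a weighted sum, $M_{2}$ by cosine similarity, and $C$ by picking the highest-scoring class. In the paper these roles are played by the MOMP and DPGT networks, which are learned modules and are not reproduced here. The class names `open` and `short` and the two-dimensional features are hypothetical.

```python
import math


def dot(u, v):
    return sum(a * b for a, b in zip(u, v))


def cosine(u, v):
    return dot(u, v) / (math.sqrt(dot(u, u)) * math.sqrt(dot(v, v)) + 1e-12)


def modulate(support_feats, f_q):
    """Stand-in for M1: reweight each support feature by its softmax similarity to the query."""
    scores = [dot(f_s, f_q) for f_s in support_feats]
    top = max(scores)
    exps = [math.exp(s - top) for s in scores]
    total = sum(exps)
    return [[(e / total) * x for x in f_s] for e, f_s in zip(exps, support_feats)]


def prototype(modulated_feats):
    """Stand-in for Phi: sum the reweighted features (a query-adaptive weighted mean)."""
    return [sum(column) for column in zip(*modulated_feats)]


def align(p, f_q):
    """Stand-in for M2: cosine similarity between prototype and query feature."""
    return cosine(p, f_q)


def classify(support_by_class, f_q):
    """Stand-in for C: run both phases per class and return the best-scoring label."""
    scores = {}
    for label, feats in support_by_class.items():
        p = prototype(modulate(feats, f_q))  # Phase 1: adaptive prototype
        scores[label] = align(p, f_q)        # Phase 2: semantic alignment
    return max(scores, key=scores.get), scores


# Hypothetical 2-D support features for two defect classes and one query feature.
support_by_class = {
    "open": [[1.0, 0.0], [0.9, 0.1]],
    "short": [[0.0, 1.0], [0.2, 0.8]],
}
label, scores = classify(support_by_class, [0.8, 0.2])
```

Because the prototype is recomputed for every query, the same support set yields different prototypes for different queries, which is the property the first phase is meant to provide.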

The corresponding experimental results indicate that this two-stage meta-learning algorithm significantly improves the detector’s ability to perceive few-shot PCB defect features, thereby enhancing the detection accuracy of the meta-learning algorithm on PCB defect queries.

3.2. Framework Overview of the SM-FSOD Algorithm

Based on the aforementioned second-order meta-learning idea, this paper constructs a few-shot object detection network based on the Detection Transformer algorithm, as illustrated in Figure 3. By incorporating the second-order meta-learning strategy, it fully establishes deep-level correlations between support samples and query samples, thereby significantly enhancing the algorithm’s performance in few-shot PCB surface defect detection tasks.

Figure 3. Overall architecture of the SM-FSOD few-shot object detection algorithm.

Within the overall framework, we employ ResNet-50 as the backbone network. This selection is primarily based on the following reasons. First, the widespread adoption of ResNet in the computer vision field ensures comparability with existing state-of-the-art (SOTA) methods, enabling a fair performance evaluation of our framework. Second, its residual structure provides a powerful representational capacity, allowing the effective extraction and generalization of discriminative features essential for few-shot learning tasks. Third, ResNet has been demonstrated to exhibit excellent compatibility with Transformer encoders, supplying high-semantic-level features suitable for encoder processing.

First, this paper designs a multi-order meta-enhanced prototype (MOMP) network, which serves as the first-order meta-learner and dynamically generates defect support prototypes. Its core idea is to apply multi-dimensional attention filtering to extract the holistic semantic information of few-shot query images and then use this information for the dynamic semantic aggregation of support prototypes. In other words, the informative components of the query guide the generation of support prototypes. Such generation not only focuses on the feature relationships among support samples but also compensates for information gaps caused by insufficient novel class samples through the guidance of global query information. As a result, the generated support prototypes exhibit greater representativeness and adaptability.

To further enhance the guidance of support prototypes for query features, this paper proposes a dual-prototype guided Transformer (DPGT) network based on the Transformer architecture, which serves as a second-order meta-learner. The key innovation lies in the self-attention module, which further aggregates multi-level information from the internal features of support samples to enhance the representation capabilities of support categories. Subsequently, at each encoding layer, support prototypes and query features interact in parallel, with bidirectional feature updating achieved through cross-attention mechanisms, thereby better capturing complex semantic relationships between support and query samples. The DPGT structure not only effectively utilizes support prototypes to guide the representation of query features but also models the similarities and differences between support and query samples in deep feature spaces, improving the algorithm’s precision in target localization and classification.
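The parallel, bidirectional interaction between support prototypes and query features can be sketched as a single simplified layer. This is a rough illustration under strong assumptions rather than the DPGT implementation: it uses single-head dot-product attention with no learned projections, no normalization layers, and no feed-forward blocks, and the function names `cross_attend` and `dual_path_layer` are hypothetical.

```python
import math


def softmax(row):
    top = max(row)
    exps = [math.exp(x - top) for x in row]
    total = sum(exps)
    return [e / total for e in exps]


def cross_attend(targets, sources):
    """Update each target vector with an attention-weighted sum of the sources (residual form)."""
    dim = len(targets[0])
    updated = []
    for t in targets:
        scores = [sum(a * b for a, b in zip(t, s)) / math.sqrt(dim) for s in sources]
        weights = softmax(scores)
        attended = [sum(w * s[j] for w, s in zip(weights, sources)) for j in range(dim)]
        updated.append([a + b for a, b in zip(t, attended)])
    return updated


def dual_path_layer(query_feats, prototypes):
    """Both paths read the layer inputs, so the two updates are computed in parallel."""
    new_queries = cross_attend(query_feats, prototypes)     # prototypes guide queries
    new_prototypes = cross_attend(prototypes, query_feats)  # queries refine prototypes
    return new_queries, new_prototypes


# Two hypothetical query tokens and one support prototype, each 2-D.
queries = [[1.0, 0.0], [0.0, 1.0]]
prototypes = [[0.5, 0.5]]
new_queries, new_prototypes = dual_path_layer(queries, prototypes)
```

Computing both updates from the same layer inputs, instead of feeding one update into the other, is what makes the two paths symmetric in this sketch.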

The second-order meta-learning-based object detection network introduces global information and deep-level feature interactions, making the generation of PCB surface defect prototypes more dynamic and flexible while significantly enhancing the feature representation capabilities for query samples of novel defect categories. Experimental results on two PCB surface defect datasets demonstrate that the network achieves substantial performance improvements across multiple few-shot object detection benchmarks, fully validating its effectiveness and generalizability.

Our SM-FSOD model employs the Hungarian matching loss function, which constructs a matching cost matrix and then determines the optimal assignment between the predicted and ground truth bounding boxes using the Hungarian algorithm. This approach not only considers class matching but also optimizes the localization accuracy through the L1 loss and the generalized intersection over union (GIoU) loss. The overall loss function is defined as follows.

First, let $y = \{y_i\}_{i=1}^{N}$ denote the set of ground truth objects, where each $y_i = (c_i, b_i)$ consists of a class label $c_i$ and a bounding box $b_i$. Since the number of actual targets is usually much smaller than $N$, the set $y$ is padded with “no-object” placeholders ($\varnothing$) to match the size of the predicted set $\hat{y} = \{\hat{y}_i\}_{i=1}^{N}$. The goal is to find an optimal permutation $\hat{\sigma}$ over the predicted set that minimizes the total matching cost with the ground truth set. This is formally defined as

$$\hat{\sigma} = \underset{\sigma \in \wp_N}{\arg\min} \sum_{i=1}^{N} L_{\mathrm{match}}(y_i, \hat{y}_{\sigma(i)})$$

(1)

The matching cost $L_{\mathrm{match}}$ for each pair is defined as

$$L_{\mathrm{match}}(y_i, \hat{y}_{\sigma(i)}) = -\mathbb{1}_{\{c_i \neq \varnothing\}} \hat{p}_{\sigma(i)}(c_i) + \mathbb{1}_{\{c_i \neq \varnothing\}} L_{\mathrm{Box}}(b_i, \hat{b}_{\sigma(i)})$$

(2)

This cost function consists of two parts: a classification confidence term $-\hat{p}_{\sigma(i)}(c_i)$ and a bounding box regression loss $L_{\mathrm{Box}}$. Both terms are applied only when the ground truth target $y_i$ is valid (i.e., $c_i \neq \varnothing$). When $c_i = \varnothing$, the total matching cost becomes zero due to the indicator function, thereby avoiding penalization on background regions and reducing the influence of false positives. Once the optimal assignment $\hat{\sigma}$ is determined, the overall loss function is computed as

$$L_{\mathrm{Hungarian}}(y, \hat{y}) = \sum_{i=1}^{N} \left[ -\log \hat{p}_{\hat{\sigma}(i)}(c_i) + \mathbb{1}_{\{c_i \neq \varnothing\}} L_{\mathrm{Box}}(b_i, \hat{b}_{\hat{\sigma}(i)}) \right]$$

(3)

The bounding box loss $L_{\mathrm{Box}}$ combines the intersection over union (IoU) loss and the L1 distance as follows:

$$L_{\mathrm{Box}}(b_i, \hat{b}_{\hat{\sigma}(i)}) = \lambda_{\mathrm{iou}} L_{\mathrm{iou}}(b_i, \hat{b}_{\hat{\sigma}(i)}) + \lambda_{L1} \lVert b_i - \hat{b}_{\hat{\sigma}(i)} \rVert_1$$

(4)

Here, $\lambda_{\mathrm{iou}}$ and $\lambda_{L1}$ are hyperparameters controlling the trade-off between box overlap and coordinate accuracy. The IoU loss encourages better spatial alignment between the predicted and ground truth boxes, while the L1 loss ensures precise localization. To handle class imbalance, especially the abundance of background predictions, we assign a reduced loss weight to samples with $c_i = \varnothing$ when computing the final loss. Additionally, we empirically find that using raw predicted probabilities (instead of log-probabilities) in the matching cost function leads to better convergence and performance.
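The matching step in Equations (1), (2), and (4) can be illustrated with a minimal sketch. It is not the authors' implementation: boxes are assumed to be `(x1, y1, x2, y2)` tuples, the IoU loss is assumed to be `1 - GIoU`, the weights `lam_iou=2.0` and `lam_l1=5.0` are illustrative values, and the optimal permutation is found by exhaustive search over all permutations rather than by the Hungarian algorithm, which is only practical for very small N.

```python
from itertools import permutations

def giou(a, b):
    """Generalized IoU of two boxes given as (x1, y1, x2, y2)."""
    inter_w = max(0.0, min(a[2], b[2]) - max(a[0], b[0]))
    inter_h = max(0.0, min(a[3], b[3]) - max(a[1], b[1]))
    inter = inter_w * inter_h
    union = (a[2] - a[0]) * (a[3] - a[1]) + (b[2] - b[0]) * (b[3] - b[1]) - inter
    # Area of the smallest box enclosing both inputs.
    hull = (max(a[2], b[2]) - min(a[0], b[0])) * (max(a[3], b[3]) - min(a[1], b[1]))
    return inter / union - (hull - union) / hull

def box_loss(b, b_hat, lam_iou=2.0, lam_l1=5.0):
    """Eq. (4): weighted IoU-style loss plus L1 distance (weights are assumed)."""
    l1 = sum(abs(u - v) for u, v in zip(b, b_hat))
    return lam_iou * (1.0 - giou(b, b_hat)) + lam_l1 * l1

def match_cost(target, pred):
    """Eq. (2): the cost is zero for padded no-object targets (None)."""
    if target is None:
        return 0.0
    cls, box = target
    probs, box_hat = pred
    return -probs[cls] + box_loss(box, box_hat)

def optimal_assignment(targets, preds):
    """Eq. (1) by exhaustive search; sigma[i] is the prediction matched to target i."""
    n = len(preds)
    padded = list(targets) + [None] * (n - len(targets))
    return min(
        permutations(range(n)),
        key=lambda sigma: sum(match_cost(padded[i], preds[sigma[i]]) for i in range(n)),
    )

# Two ground truth objects (class index, box) and three predictions (class probs, box).
targets = [(0, (0.1, 0.1, 0.3, 0.3)), (1, (0.6, 0.6, 0.9, 0.9))]
preds = [
    ([0.1, 0.8], (0.58, 0.61, 0.9, 0.88)),  # overlaps the second target
    ([0.3, 0.3], (0.4, 0.0, 0.5, 0.1)),     # overlaps nothing
    ([0.9, 0.05], (0.1, 0.12, 0.31, 0.3)),  # overlaps the first target
]
sigma = optimal_assignment(targets, preds)
print(sigma)  # (2, 0, 1): the unmatched prediction is paired with the padding slot
```

In practice, a polynomial-time solver for the assignment problem would replace the exhaustive search; the cost matrix it receives is built exactly as in `match_cost`.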

The specific architectures of the proposed MOMP network and DPGT network within the SM-FSOD framework will be introduced in the following two sections.

3.3. MOMP Block: Multi-Order Meta-Enhanced Prototype Network

Existing meta-learning strategies rely on single feature-level support prototype construction, which proves inadequate for PCB surface defect detection tasks where the intra-class similarity of target regions is high. This limitation hinders stable guidance for few-shot defect query image learning. To enhance the support prototype’s perception capabilities for few-shot defect queries, we propose a multi-order meta-enhanced prototype network (MOMP) as a meta-learner. Through base class data training, the MOMP learns to utilize global coupling information from query images to guide support prototype generation. This enables query-aware prototypes that maintain alignment with targets in query images, as illustrated in Figure 4.

Figure 4. The architecture of the multi-order meta-enhanced prototype network.

The MOMP consists of two key components: the query global coupling information generation module and the prototype-aware enhancement module. Specifically, the query features $X_q \in \mathbb{R}^{B \times C \times H_q \times W_q}$ (where $B$ denotes the batch size, $C$ represents the number of channels, and $H_q \times W_q$ indicates the spatial dimensions) are first fused with support features $X_s \in \mathbb{R}^{B \times C \times H_s \times W_s}$ through a 2D convolutional coupling structure to generate a global attention mask $\Psi$, which is subsequently utilized for prototype-aware enhancement processing.

The 2D convolutional layers separately extract deep semantic features from both the query and support representations. A tensor dot product is then computed to establish spatial correlations between their high-level semantic features. These correlation scores are normalized via Softmax to enhance regions with strong query–support correspondences in the spatial domain. The resulting tensor, termed the global attention mask, is learnable and enables the generalization of prototype attention during meta-learning training. Subsequently, to capture global contextual information from query features, an average pooling operation is applied. This compresses the spatial dimensions of each channel in the query feature tensor into scalar values, producing a channel-wise global representation vector of the query $\gamma$, formulated as

$$\gamma = \frac{1}{H_q \cdot W_q} \sum_{i=1}^{H_q} \sum_{j=1}^{W_q} X_q(i, j)$$

(5)

Since average pooling preserves smooth spatial contextual information, it is better suited for defect regions with blurry boundaries. Moreover, when handling sparse defect patterns, average pooling avoids the potential gradient sparsity issues associated with max pooling. Thus, in our MOMP module, we adopt the strategy of adaptive average pooling combined with spatial-wise concatenation.

The support spatial mask construction employs the cosine distance metric (CDM) as the similarity measure between vectors. At the channel level, it performs pixel-wise comparison between the spatial support features and the query’s global information to identify regions of strong spatial correspondence between support features and query information. This support spatial mask (denoted as Condition) serves as the basis for subsequent prototype-aware enhancement. In the final prototype-aware enhancement stage, the derived global information mask and support spatial mask undergo pixel-wise multiplication to generate an attention-enhanced mask. This mask acts as a spatial increment, effectively strengthening the support prototypes through the integration of query global information.

$$\mathrm{Condition}(i, j) = \mathrm{CDM}[\gamma, X_s(i, j)] = \frac{\lVert \gamma \cdot X_s(i, j) \rVert}{\lVert X_s(i, j) \rVert \cdot \lVert \gamma \rVert}$$

(6)

$$\hat{X}_s = X_s + \Psi \circ \mathrm{Condition}(i, j)$$

(7)

We adopt cosine similarity primarily due to its amplitude invariance. In few-shot learning scenarios, the feature magnitudes across different episodes may vary significantly due to the non-stationary distribution of meta-testing tasks. By focusing on the angular direction of feature vectors rather than their absolute magnitudes, cosine similarity more robustly handles intra-class variation and feature scale imbalance.

Through this approach, the meta-learning framework achieves guidance for support prototype generation via extracted query information, enhancing the support features’ perception capabilities for few-shot PCB defect categories.
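To make Equations (5) to (7) concrete, the following sketch computes the pooled query vector, the support spatial mask, and the enhanced support features for a single sample using plain Python lists. It is an illustration under stated assumptions rather than the authors' code: feature maps are `C x H x W` nested lists, the global attention mask is passed in as a fixed `H x W` array instead of being produced by the convolutional coupling structure, and the mask product is broadcast over all channels.

```python
import math

def global_avg_pool(x_q):
    """Eq. (5): average every channel of a C x H x W query feature map."""
    return [sum(sum(row) for row in ch) / (len(ch) * len(ch[0])) for ch in x_q]

def condition_mask(gamma, x_s, eps=1e-12):
    """Eq. (6): absolute cosine similarity between gamma and each support pixel."""
    h, w = len(x_s[0]), len(x_s[0][0])
    g_norm = math.sqrt(sum(g * g for g in gamma))
    mask = [[0.0] * w for _ in range(h)]
    for i in range(h):
        for j in range(w):
            v = [ch[i][j] for ch in x_s]  # channel vector at pixel (i, j)
            v_norm = math.sqrt(sum(c * c for c in v))
            dot = sum(g * c for g, c in zip(gamma, v))
            mask[i][j] = abs(dot) / (g_norm * v_norm + eps)
    return mask

def enhance(x_s, psi, cond):
    """Eq. (7): add the element-wise mask product to every channel (assumed broadcast)."""
    return [
        [[ch[i][j] + psi[i][j] * cond[i][j] for j in range(len(ch[0]))]
         for i in range(len(ch))]
        for ch in x_s
    ]

# Toy inputs: 2 channels, a 2 x 2 query map and a 1 x 2 support map.
x_q = [[[1.0, 3.0], [5.0, 7.0]], [[2.0, 2.0], [2.0, 2.0]]]
x_s = [[[2.0, -1.0]], [[1.0, 2.0]]]
psi = [[0.5, 0.5]]  # placeholder for the learned global attention mask

gamma = global_avg_pool(x_q)       # [4.0, 2.0]
cond = condition_mask(gamma, x_s)  # first pixel is parallel to gamma, second is orthogonal
x_s_hat = enhance(x_s, psi, cond)
```

The first support pixel points in the same direction as the pooled query vector and receives the full increment, while the orthogonal second pixel is left unchanged, which is the behavior the amplitude-invariance argument relies on.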

3.4. DPGT Block: Second-Order Meta-Learner with Dual-Prototype Guided Transformer

Conventional few-shot object detection algorithms typically construct support prototypes only from shallow feature semantics in their meta-learners to guide few-shot query learning. While such approaches can handle few-shot problems to some extent, the limited representational capacity of shallow features often captures only intra-class local consistency, failing to effectively model higher-order inter-class relationships. For PCB surface defect detection in few-shot settings, subtle distinctions between positive samples and complex background classes may not be distinguishable through shallow features alone. To address these limitations, we propose a second-order meta-learner with a dual-prototype guided Transformer (DPGT). The architecture is illustrated in Figure 5.

Figure 5. The architecture of the dual-prototype guided Transformer.

This algorithm is implemented through support category prototype computation and meta-learning training. Specifically, in the support branch, a reconstructed Transformer encoder integrates input features with bounding box information to generate support prototypes for each category. Let the support features be denoted as $X_s \in \mathbb{R}^{B \times C \times H_s \times W_s}$, where $B$ represents the number of support images. The deep semantic representation of support information is computed through a residual-connected self-attention mechanism, formulated as

$$\hat{F} = \mathrm{Softmax}\left(\frac{Q_s K_s^{\top}}{\sqrt{d}}\right) V_s + \hat{F}_s$$

(8)

where

$$Q_s = W_Q \cdot F_s$$

(9)

$$K_s = W_K \cdot F_s$$

(10)

$$V_s = W_V \cdot F_s$$

(11)

Here, $\hat{F}_s$ denotes the input support features; $W_Q$, $W_K$, and $W_V$ are the learnable projection matrices for queries, keys, and values, respectively; and $d$ is the feature dimension scaling factor.
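The residual self-attention of Equations (8) to (11) can be sketched in a few lines of plain Python. The sketch assumes that the support features are flattened into a list of token vectors (one row per token), so the projections are applied as right-multiplications, and it treats the residual term as the unprojected input features; the helper names are hypothetical.

```python
import math

def matmul(a, b):
    """Multiply two matrices stored as lists of rows."""
    return [[sum(x * y for x, y in zip(row, col)) for col in zip(*b)] for row in a]

def softmax(row):
    """Numerically stable softmax over one row of scores."""
    m = max(row)
    exps = [math.exp(v - m) for v in row]
    total = sum(exps)
    return [e / total for e in exps]

def self_attention_residual(f_s, w_q, w_k, w_v):
    """Eq. (8): Softmax(Q K^T / sqrt(d)) V plus the input features."""
    q, k, v = matmul(f_s, w_q), matmul(f_s, w_k), matmul(f_s, w_v)  # Eqs. (9)-(11)
    d = len(q[0])
    k_t = [list(col) for col in zip(*k)]
    scores = matmul(q, k_t)
    attn = [softmax([s / math.sqrt(d) for s in row]) for row in scores]
    out = matmul(attn, v)
    return [[o + f for o, f in zip(o_row, f_row)] for o_row, f_row in zip(out, f_s)]

identity = [[1.0, 0.0], [0.0, 1.0]]
zeros = [[0.0, 0.0], [0.0, 0.0]]
f_s = [[1.0, 0.0], [0.0, 1.0]]  # two support tokens with two channels each

# Zero query/key projections give uniform attention, so each token receives
# the mean of all value vectors on top of its own features.
f_hat = self_attention_residual(f_s, zeros, zeros, identity)
print(f_hat)  # [[1.5, 0.5], [0.5, 1.5]]
```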

Subsequently, the foreground regions of the support features are extracted using the annotated positional information and encoded into corresponding category prototypes $C_s$. This operation, performed during the support feature processing stage, extracts the class-specific features from each support bounding box and generates the encoded prototypes of the support categories via a Sigmoid function. This effectively captures fine-grained details in the support features and produces discriminative category prototypes (represented by the vector set $C_s$), which are then used in the meta-learner to guide the learning of the query branch.

The query branch follows the original encoder structure, directly encoding the query features into deep semantic information vectors. Finally, a meta-learner is constructed between the query information and the support category prototype vectors. This enables task learning to guide the relevant parameters of few-shot query information learning through the category prototype vectors. To ensure the global significance of the features, the meta-learner retains the structure of attention computation. A learnable tensor W is introduced to construct a learnable reinforcement metric between the encoded support categories and queries, as described by the following formula:

$$V_Q = W_V^{Q} \cdot Q + \mathrm{Softmax}\left(\frac{Q_{W_Q} (K_{W_K})^{\top}}{\sqrt{d}}\right) \cdot (V_{W_V}^{S} S)$$

(12)

The final step involves reasoning with the coupled query vectors through a feed-forward neural network (FFN) layer:

$$P = \mathrm{FFN}[V_Q + \hat{F}]$$

(13)

By modeling the meta-learner on deep semantic features, the Transformer query encoding can capture more complex contextual information and higher-order relationships between novel class foregrounds and complex backgrounds and adaptively learn support prototypes. This effectively enhances the accuracy of few-shot learning for PCB surface defects and the model’s category generalization capabilities. Additionally, it mitigates the confusion problem between few-shot category foregrounds and complex PCB backgrounds caused by the limited representation capacity of shallow features and the difficulty in capturing deep semantic information of categories.

4. Experiments and Analysis

In this section, we present experiments on few-shot subsets of two PCB defect datasets to train and evaluate the proposed few-shot object detection (FSOD) algorithm. The performance of SM-FSOD is benchmarked against that of several state-of-the-art FSOD methods to assess its detection capabilities for few-shot PCB defects. Additionally, we evaluate the model’s real-time performance and training stability, with visualizations demonstrating its practical application potential.

4.1. Description of the Dataset

We utilized two PCB defect datasets to train the proposed object detection algorithm and evaluate the detection accuracy of the presented SM-FSOD algorithm. These two datasets are the DsPCBSD+ dataset and the DeepPCB dataset.

The DsPCBSD+ dataset is a publicly available benchmark dataset developed specifically for printed circuit board (PCB) defect detection tasks, demonstrating significant research value in industrial inspection applications. This comprehensive dataset consists of 10,259 low-definition industrial field images with an average resolution of 226 × 226 pixels, captured using a multi-angle lighting system to ensure optimal coverage of nine critical PCB defect categories, namely SP (spur), SC (spurious copper), OP (open), MB (mouse bite), HB (hole breakout), CS (conductor scratch), CFO (conductor foreign object), BMFO (base material foreign object), and SH (short), as shown in Figure 6.

Figure 6. The nine foreground categories in the DsPCBSD+ dataset.

Each defect instance is annotated in standardized COCO-format JSON files, including precise bounding box coordinates, categorical labels, visibility indicators for occlusion cases, and three-tiered difficulty classifications. The dataset is partitioned into 8192 training images, 2048 validation images, and 2606 test images, with the detailed defect size distribution showing 42.7% small targets (< 32 × 32 pixels), 38.1% medium targets (32 × 32–96 × 96 pixels), and 19.2% large targets (> 96 × 96 pixels).

The second benchmark, the DeepPCB dataset, is designed to provide high-quality training data for deep learning models in PCB defect detection. The dataset consists of 1500 image pairs, each comprising a defect-free template image and a corresponding test image with defects. These images were captured using a linear-scan CCD camera with a resolution of 48 pixels per millimeter. The images were first cropped into 640 × 640-pixel sub-images and aligned using template matching techniques to minimize preprocessing effort. Subsequently, manual annotation was performed to label the locations and types of defects in each test image, covering six common types of PCB defects, as illustrated in Figure 7.

Figure 7. The six foreground categories in the DeepPCB dataset.

To conduct few-shot object detection experiments, the original dataset needs to be processed and divided into different splits. Each split consists of base classes and few-shot novel classes, where the base classes retain their original sample sizes, while the novel classes undergo few-shot processing. Specifically, the novel classes are adjusted to contain 5-shot, 10-shot, 30-shot, and 50-shot samples for few-shot experiments. For the DsPCBSD+ dataset, 5 classes are retained as base classes, while the remaining 4 classes are treated as novel classes. Similarly, for the DeepPCB dataset, 4 classes are kept as base classes, and the other 2 classes are designated as novel classes. The specific class assignments for different splits are presented in Table 1 and Table 2.
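The split construction described above can be sketched as follows. This is a hypothetical helper, not the authors' preprocessing script: it assumes that annotations are dictionaries with a `category` key, that shots are sampled per annotated instance, and that the class names used in the toy data are placeholders rather than the assignments in Table 1 and Table 2. The seed value of 5 follows the evaluation protocol reported in Section 4.3.

```python
import random
from collections import defaultdict

def build_few_shot_split(annotations, novel_classes, k, seed=5):
    """Keep every base-class instance and sample k instances per novel class."""
    rng = random.Random(seed)
    by_class = defaultdict(list)
    for ann in annotations:
        by_class[ann["category"]].append(ann)
    split = []
    for category, items in sorted(by_class.items()):
        if category in novel_classes:
            # Novel classes are cut down to at most k shots.
            split.extend(rng.sample(items, min(k, len(items))))
        else:
            # Base classes retain their original sample sizes.
            split.extend(items)
    return split

# Toy annotations: 20 instances of one base class and 40 of one novel class.
annotations = (
    [{"category": "SH", "id": i} for i in range(20)]
    + [{"category": "SP", "id": i} for i in range(20, 60)]
)
splits = {k: build_few_shot_split(annotations, {"SP"}, k) for k in (5, 10, 30, 50)}
counts = {
    k: sum(1 for ann in split if ann["category"] == "SP")
    for k, split in splits.items()
}
print(counts)  # {5: 5, 10: 10, 30: 30, 50: 40}
```

The 50-shot request returns only 40 instances in this toy example because the helper never samples more instances than a class contains.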

Table 1. Two few-shot grouping configurations for the DsPCBSD+ dataset.

Table 2. Three few-shot grouping configurations for the DeepPCB dataset.

4.2. Description of Experimental Environment and Parameters

The proposed algorithm was implemented based on PyTorch 1.7.1 and the Detectron2 framework, with all model training and accuracy testing conducted on a parallel computing platform equipped with NVIDIA L40 GPUs using the CUDA parallel computing framework version 12.0. During the model training phase, we employed the AdamW algorithm as the optimizer and implemented a learning rate warm-up period for the first 100 iterations to stabilize initial training. Specific training parameters were configured as follows: in the base training phase, the input batch size was set to 8, while, during the few-shot meta-learning phase, it was increased to 16 to facilitate rapid model generalization to novel classes.

For the two differently scaled datasets, we established distinct model optimization strategies. For base training on the DsPCBSD+ dataset, each input batch was processed as one training step, with a total of 60,000 optimization steps performed; correspondingly, the base class training on the DeepPCB dataset comprised 110,000 optimization steps to ensure coverage of all instances in the dataset. During the base training phase, the initial learning rate was uniformly set to 0.02 with a learning rate scheduling factor of 0.1. Specifically, for the DsPCBSD+ dataset, the learning rate was reduced by a factor of 10 at 40,000 and 50,000 iterations, while similar adjustments were made for the DeepPCB dataset at 80,000 and 100,000 iterations.
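The base-training schedule described above amounts to a step decay with a short warm-up. The function below is a minimal sketch of that schedule; the linear shape of the 100-iteration warm-up is an assumption, since only its length is stated, and the function name and parameters are illustrative.

```python
def learning_rate(step, base_lr=0.02, milestones=(40_000, 50_000),
                  gamma=0.1, warmup_steps=100):
    """Step-decay learning rate with a linear warm-up (warm-up shape assumed)."""
    decays = sum(step >= m for m in milestones)  # number of milestones passed
    lr = base_lr * gamma ** decays
    if step < warmup_steps:
        lr *= (step + 1) / warmup_steps
    return lr

# DsPCBSD+ base training: decay at 40,000 and 50,000 iterations.
dspcbsd_lrs = [learning_rate(s) for s in (0, 100, 45_000, 55_000)]

# DeepPCB base training: decay at 80,000 and 100,000 iterations.
deeppcb_lrs = [learning_rate(s, milestones=(80_000, 100_000))
               for s in (100, 90_000, 105_000)]
```

With these settings, the rate ramps up to 0.02 over the first 100 iterations and then drops to 0.002 and 0.0002 at the two milestones of each dataset.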

In the novel class meta-learning stage, the optimization process on the DsPCBSD+ dataset used an initial learning rate of 0.001 for 40,000 fine-tuning steps, with the learning rate reduced tenfold at the 36,000th optimization step. For the DeepPCB dataset, we adopted an initial learning rate of 0.005 for 80,000 optimization steps, reducing the learning rate at the 70,000th step. During testing, the model was configured to retain the top 1000 highest-confidence candidate boxes. All testing experiments were performed using the same hardware configuration as in the training phase to ensure the reliability and consistency of the experimental results, with multiple runs executed to verify the reproducibility and statistical significance of the findings. The implementation details, including specific versions of all dependencies and precise hardware specifications, were documented to facilitate experimental replication and validation by the research community.

4.3. Accuracy Comparison of Few-Shot Object Detection Methods

The evaluation method adopted in this paper follows the N-way K-shot few-shot object detection algorithm evaluation paradigm. Here, N represents the number of novel classes in the dataset—for the DsPCBSD+ dataset, N = 4, and, for the DeepPCB dataset, N = 2. K denotes the number of samples per novel class, with values set to 5, 10, 30, and 50. For instance, five-shot means that only five samples per novel class are provided to the model during the meta-learning phase for training. After completing meta-learning on the novel classes, the model weights are used for inference on the test set, and the results are recorded. Due to the limited number of novel class samples, the experimental outcomes may vary depending on the selection of these samples. To mitigate this variability, the evaluation of the SM-FSOD algorithm is averaged over 10 different novel class sample selections. In the few-shot partitioning experiments, we adopted a stratified random sampling strategy to ensure balanced class distribution, with a fixed random seed (seed = 5) to guarantee partitioning reproducibility. Each shot configuration (5-/10-/30-/50-shot) was independently subjected to 10 randomized trials, with the support and query sets regenerated in each trial. The final results are reported in the form of the mean and standard deviation.

We selected several representative object detection algorithms as control groups for precision comparison experiments. Among them, the Two-Stage Fine-Tuning Approach (TFA) algorithm proposed by Wang et al. [12] is a transfer learning method based on fine-tuning that achieves generalization to few-shot categories by fine-tuning the last layer of the detector on novel classes. To comprehensively evaluate algorithm performance, it is necessary to compare SM-FSOD with several meta-learning algorithms. These include the Few-Shot Object Detection via Feature Reweighting (FSRW) algorithm proposed by Kang et al. [30], the Meta-Learning-Based Region-Based Convolutional Network (Meta-RCNN) algorithm by Yan et al. [31], and the Meta-FRCNN algorithm by Han et al. [13]. These methods all implement interaction between few-shot query information and support information by designing single-stage meta-learners at the feature level within the Faster-RCNN framework.

Table 3 presents the evaluation results on the DsPCBSD+ dataset, where two different novel class splits (split 1 and split 2) are defined. We report AP50, the average precision computed with an intersection over union (IoU) threshold of 0.5: a predicted bounding box counts as correct only if its IoU with a ground truth box of the same class is at least 0.5. This metric assesses the model’s generalization ability for few-shot class detection and further measures the detection accuracy of the proposed algorithm.
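As an illustration of how an AP50 score is obtained, the sketch below computes the average precision for one class in one image. It is a simplified stand-in for a full evaluation toolkit: predictions are matched greedily to unmatched ground truth boxes in descending score order, and the area under the precision-recall curve uses all-point interpolation, which is only one of several conventions in use. Boxes are assumed to be `(x1, y1, x2, y2)` tuples.

```python
def iou(a, b):
    """Intersection over union of two (x1, y1, x2, y2) boxes."""
    iw = max(0.0, min(a[2], b[2]) - max(a[0], b[0]))
    ih = max(0.0, min(a[3], b[3]) - max(a[1], b[1]))
    inter = iw * ih
    union = (a[2] - a[0]) * (a[3] - a[1]) + (b[2] - b[0]) * (b[3] - b[1]) - inter
    return inter / union if union > 0 else 0.0

def average_precision(preds, gts, thr=0.5):
    """AP at one IoU threshold; preds is a list of (score, box), gts a list of boxes."""
    if not gts:
        return 0.0
    matched, hits = set(), []
    for _, box in sorted(preds, key=lambda p: -p[0]):
        best, best_j = 0.0, None
        for j, gt in enumerate(gts):
            overlap = iou(box, gt)
            if j not in matched and overlap > best:
                best, best_j = overlap, j
        if best_j is not None and best >= thr:
            matched.add(best_j)
            hits.append(1)  # true positive
        else:
            hits.append(0)  # false positive
    tp, precisions, recalls = 0, [], []
    for rank, hit in enumerate(hits, start=1):
        tp += hit
        precisions.append(tp / rank)
        recalls.append(tp / len(gts))
    # Make precision non-increasing from right to left before integrating.
    for i in range(len(precisions) - 2, -1, -1):
        precisions[i] = max(precisions[i], precisions[i + 1])
    ap, prev_recall = 0.0, 0.0
    for p, r in zip(precisions, recalls):
        ap += (r - prev_recall) * p
        prev_recall = r
    return ap

gts = [(0, 0, 10, 10), (20, 20, 30, 30)]
preds = [
    (0.9, (0, 0, 10, 10)),    # exact match
    (0.8, (50, 50, 60, 60)),  # background false alarm
    (0.7, (20, 20, 30, 28)),  # IoU 0.8 with the second box
]
ap50 = average_precision(preds, gts)  # 5/6, about 0.833
```

The reported AP50 for the novel classes would then be the mean of such per-class values, accumulated over the whole test set rather than a single image.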

Table 3. Precision comparison experiments on the DsPCBSD+ using the AP50 as the evaluation metric.

As shown in Table 3, the proposed SM-FSOD algorithm achieves promising evaluation results on few-shot classes compared to the current mainstream few-shot object detection methods. Through the performance comparison with several meta-learning-based few-shot object detection algorithms, it can be observed that the proposed SM-FSOD algorithm achieves accuracy improvements ranging from 1.7% to 5.1% in terms of the AP50 metric when compared to the general Meta-FRCNN algorithm. By averaging the precision across different scenarios, the proposed SM-FSOD method demonstrates superior performance over the current state-of-the-art approaches in comprehensive evaluation settings.

To comprehensively evaluate the performance advantages of the proposed algorithm and validate its industrial applicability, we conducted systematic precision comparison experiments on the DeepPCB dataset. As a benchmark dataset in the field of PCB defect detection, it covers six major types of common industrial defects.

Table 4 presents the evaluation results of the SM-FSOD algorithm on the more challenging DeepPCB dataset, assessing the mean detection accuracy for novel classes under the K = 5, K = 10, K = 30, and K = 50 settings. As on the DsPCBSD+ dataset, we adopted an intersection over union (IoU) threshold of 0.5 (AP50) to calculate the average precision for novel classes. Compared with mainstream transfer learning and meta-learning based methods, the SM-FSOD algorithm achieves superior detection results. Specifically, when compared to the Meta-FRCNN meta-learning algorithm, our method achieves improvements of 9.8 percentage points in the 30-shot setting and 14.0 percentage points in the 5-shot setting. The results show that SM-FSOD achieves the best performance across the different benchmarks, indicating that the model generalizes well in few-shot learning tasks.

Table 4. Precision comparison experiments on DeepPCB using the AP50 as the evaluation metric.

4.4. Algorithm Ablation Study

To thoroughly analyze the contributions of each module to the overall performance of the SM-FSOD algorithm, we conducted ablation experiments on the DsPCBSD+ dataset under the split 2 data partitioning scheme, covering scenarios from 5-shot to 50-shot. The results are presented in Table 5.

Table 5. Performance comparison on DeepPCB dataset (AP50/AP75 metrics).

The first column displays the detection results using a traditional meta-learning algorithm with the Detection Transformer. These results indicate that, without modules specifically improved for few-shot tasks, the original algorithm exhibits some overfitting when the number of novel class samples is limited, leading to relatively low average precision.

The second column shows the performance when employing an MOMP network as a single-stage meta-learner. The introduction of this improved meta-learner brings varying degrees of performance enhancement across different few-shot scenarios, particularly achieving a 5.1% improvement in the 10-shot setting compared to the baseline. This demonstrates that, compared to constructing the meta-learner on shallow features, our approach of building the meta-learner on high-level semantic features can more effectively organize support prototypes and guide the learning of few-shot novel classes. The third column represents the results obtained by further introducing the DPGT network on top of the single-stage meta-learning framework, thereby constructing a complete second-order meta-learning object detection algorithm. The results show that the second-order meta-learning framework, enhanced by the GPIB module, achieves superior performance while keeping the parameter increase within an acceptable range.

Further analysis of the ablation experimental results reveals that, when the sample size is sufficient, traditional baseline methods can already achieve relatively saturated feature representations, as evidenced by the baseline reaching 52.5 mAP under the 50-shot setting. In contrast, the second-order meta-learning strategy introduced in our method demonstrates greater advantages in extreme few-shot scenarios, achieving a 6.7 mAP improvement over the baseline in the 10-shot setting, while the enhancement narrows to 4.6 mAP in the 50-shot setting. This confirms that the proposed method is more suitable for industrial inspection scenarios with extremely scarce data.

4.5. Visualization Results of Object Detection

We conduct a visual analysis of the object detection algorithm to gain deeper insights into its behavior and performance. Specifically, we first visualize the prediction results—including detection box locations, category labels, and confidence distributions—to intuitively demonstrate the algorithm’s detection effectiveness across different scenarios, as shown in Figure 8. The detailed analysis is as follows.

Figure 8. The ablation study visualizations under the first few-shot category configuration on the DsPCBSD+ dataset.

Through the predictive visualization of bounding boxes and corresponding confidence scores for defect targets, we observe that the unimproved algorithm exhibits lower confidence in few-shot novel-class instances, accompanied by missed detections of novel-class objects. This indicates that the baseline algorithm suffers from overfitting to few-shot novel-class targets. However, after integrating the improved MOMP module into the baseline framework, we observe a significant reduction in missed detections for few-shot novel-class targets, along with a more balanced confidence distribution across novel class instances. Furthermore, in Scenario 2, the localization accuracy for novel class foreground targets improves substantially, with the predictions becoming more concentrated on target instances and background interference within bounding boxes being significantly reduced. Our analysis reveals that the model’s missed detections primarily stem from the limited number of support samples for few-shot novel classes. This scarcity prevents the algorithm from adequately learning discriminative features for these categories, resulting in ambiguous representations in the feature space. The introduction of the MOMP module significantly mitigates this issue.

Figure 8 presents the detection results after incorporating the DPGT model to establish the complete second-order meta-learning framework. The results demonstrate that the DPGT approach achieves further suppression of both false background detections and missed detections. This verifies that the GPIB-enhanced first-order meta-learner resolves the classification confusion between background negative samples and novel class samples by leveraging global query information to pre-guide prototype construction, thereby enabling the object detection algorithm to attain superior generalization capabilities for few-shot categories.

These visualization results empirically validate the reliability of the ablation studies and further elucidate the contributions of individual algorithmic modules. The synergistic interplay among multiple modules not only enhances the overall detection performance but also plays a pivotal role in target discrimination under complex scenarios. Furthermore, this paper incorporates attention heatmap visualizations to reveal the algorithm’s focus regions during feature extraction and target localization, as depicted in Figure 9 and Figure 10.

Figure 9. The heatmap visualization experiments conducted on the DsPCBSD+ dataset.

Figure 10. The heatmap visualization experiments conducted on the DeepPCB dataset.

Specifically, as shown in Figure 9, we visualize the heatmaps on the second few-shot split of the DsPCBSD+ dataset. The first and second columns demonstrate that, for SC category targets, without the DPGT module, the model’s attention is scattered and it fails to concentrate on valid foreground regions. After incorporating the DPGT module, the model achieves effective focus on both the morphology and locations of foreground targets. The third column reveals that, when multiple small targets coexist in a single scene, without DPGT, the model cannot adequately attend to all foreground objects. With DPGT integration, the model successfully focuses on most defective foreground regions.

Meanwhile, we also conducted heatmap visualizations on the first few-shot setting of the DeepPCB dataset to evaluate the model’s performance when multiple categories coexist. The results demonstrate that, across six different background configurations, the incorporation of both the MOMP and DPGT modules enhances the model’s attention to few-shot category foregrounds while maintaining its generalization capabilities for base category foregrounds. This validates that the model can sustain robust performance in relatively complex scenarios.

The gradient-weighted class activation mapping (Grad-CAM) heatmaps intuitively display the algorithm’s attention to different feature regions. The results indicate that, during the dynamic integration of support and query information, the algorithm’s response to few-shot novel class foregrounds is significantly enhanced. This confirms that, under the guidance of global features, the support prototypes’ perception of novel class objects is effectively strengthened, providing more reliable support information for the subsequent meta-learner stage. Meanwhile, the heatmap intuitively displays the image regions relied upon by the model for decision-making by calculating gradient flow. If the heatmap concentrates on the target defect structures rather than background noise, it indicates the model’s ability to effectively filter out interference from the background during detection. From another perspective, when the input image contains complex background noise, if the heatmap can still accurately highlight the actual defects (instead of falsely activating noise regions), it demonstrates the model’s strong resistance to interference.

To deeply analyze the model’s attention mechanism and error patterns, Figure 11 presents paired visualization results of heatmaps with false positive/false negative cases. Specifically targeting two typical defect types, the figure demonstrates the differences in the model’s attention distribution between successful detection and failure cases.

Figure 11. Qualitative analysis of missed detections and background false alarms using heatmaps on the DsPCBSD+ dataset.

Through comparative observation, two patterns emerge. In CFO detection, the model with only MOMP added accurately focuses on the inverted color areas at the center of stains but may miss detections when multiple similar stains appear simultaneously. In BMFO detection, when the background becomes complex, the model with only MOMP ignores the PCB texture background and concentrates on edge background areas, indicating that MOMP alone may exhibit false activation in periodic background patterns. These visualization results demonstrate that, while the MOMP module can effectively distinguish between defect features and background patterns in most cases, its attention mechanism can still be affected by periodic background patterns. Further visual analysis shows that, in samples containing defect-mimicking periodic backgrounds, the proportion of cases where the module excessively focuses on the background remains low. This confirms that our proposed second-order constraint mechanism effectively suppresses background response and reduces the false activation rate of background patterns.

4.6. Real-Time Performance and Stability Analysis of the Algorithm

To evaluate whether our proposed object detection algorithm can meet real-time requirements in practical applications and assess the reasonableness of its GPU memory consumption, we employ two critical metrics, time efficiency and memory usage, to validate the feasibility of the SM-FSOD few-shot object detection algorithm. Specifically, this section presents a comparative analysis between the proposed algorithm and several few-shot object detection approaches in terms of the frame rate (FPS) and resource consumption (GiB), evaluating their performance characteristics under varying computational resource conditions. To ensure fair comparisons, all experiments were performed under identical hardware configurations. The detailed experimental results are presented in Table 6.

Table 6. Real-time performance and computational resource analysis of few-shot object detection algorithms.

The experimental results show that the proposed SM-FSOD method achieves a favorable balance between efficiency and performance. Compared with Meta-FRCNN, it delivers superior detection accuracy at the cost of an additional 1.1 GiB of GPU memory (30.6 vs. 29.5 GiB) and a speed reduction of about 1.2 FPS (4.82 vs. 5.97 FPS). It remains slightly faster than Meta-RCNN (4.63 FPS), whereas the lighter TFA baseline runs at 7.49 FPS. The approach therefore trades a modest amount of computational efficiency for improved detection performance, providing a practical solution that balances precision and efficiency for real-world applications.

According to the general technical requirements for production lines, the operating speed of mainstream PCB inspection lines typically ranges from 3 to 4.5 m/min. Taking a common PCB size (length × width: 45 cm × 60 cm) as an example, at a line speed of 4 m/min, each board takes approximately 6.75 s to pass through the inspection area (0.45 m ÷ 4 m/min). Our method achieves a processing speed of 4.3 FPS, which exceeds the minimum number of image samples required for full-coverage inspection. Therefore, based on the relationship between the production line speed, board size, and processing capacity, a theoretical processing speed of 4–5 FPS meets the real-time requirements of general precision-oriented PCB inspection lines.
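The relationship between line speed, board size, and processing capacity can be checked with a short calculation. The sketch below uses only the figures quoted in this paragraph; the function names are hypothetical, and it assumes that the 45 cm side of the board lies along the direction of travel, which is the reading that reproduces the 6.75 s transit time.

```python
def seconds_per_board(board_length_m, line_speed_m_per_min):
    """Time for one board to traverse the inspection area."""
    return board_length_m / (line_speed_m_per_min / 60.0)


def frames_per_board(board_length_m, line_speed_m_per_min, fps):
    """Number of frames the detector can process while one board passes."""
    return seconds_per_board(board_length_m, line_speed_m_per_min) * fps


# Values quoted in the text; the 0.45 m side is assumed to lie along the
# direction of travel.
transit = seconds_per_board(0.45, 4.0)          # about 6.75 s
budget = frames_per_board(0.45, 4.0, 4.3)       # about 29 frames
# Fastest line speed in the quoted 3 to 4.5 m/min range.
worst_case = frames_per_board(0.45, 4.5, 4.3)   # about 25.8 frames
```

Under these assumptions the detector can process roughly 29 frames per board at 4 m/min and roughly 25 at 4.5 m/min. Whether this is sufficient depends on how many camera fields of view are needed to cover one board, which is not specified here.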

4.7. Analysis of Computational Cost and Convergence Optimization Strategies

To comprehensively evaluate both the efficiency and optimization behavior of our proposed framework, we conduct detailed analyses covering the computational cost, convergence acceleration, and class imbalance mitigation.

We first provide a module-wise breakdown of model complexity, covering the backbone, MOMP, DPGT, matching module, and detection heads. For each component, we report the parameter size and memory consumption. As shown in Table 7, the MOMP and DPGT modules introduce modest overhead compared to the backbone, while still enabling substantial accuracy gains.

Table 7. Layer-wise computational cost analysis of key modules.

To assess the real-time feasibility in practical applications, we benchmark the end-to-end inference throughput on both a mid-range GPU (NVIDIA RTX 3060) and a CPU (Intel i7-12700). We set the batch size to 1, use float32 precision, and fix the top-K proposals to 300. The time breakdown reveals that DPGT accounts for 21% of the inference time and MOMP for 13%, confirming deployment viability under moderate latency constraints.

Regarding optimization dynamics, we clarify that the Hungarian loss employs weights λ_{cls} = 1.0 and λ_{L1} = 0.5, balancing classification and localization objectives. To address class imbalance, especially the prevalence of background samples, we downweight the no-object class by a factor of 1/10 during loss computation.
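One way these weights could enter the loss is sketched below. This is an illustrative assumption rather than the implementation used in the experiments: it presumes that predictions have already been matched to targets, that the classification term is a class-weighted cross-entropy normalised by the sum of the weights, and that the localization term is a mean L1 distance over matched foreground boxes. All function names and the toy inputs are hypothetical.

```python
import math

LAMBDA_CLS = 1.0        # classification weight quoted in the text
LAMBDA_L1 = 0.5         # L1 box-regression weight quoted in the text
NO_OBJECT_WEIGHT = 0.1  # 1/10 down-weighting of the no-object class


def weighted_cross_entropy(probs, targets, no_object_index):
    """Class-weighted mean cross-entropy over already-matched predictions.

    probs: list of per-class probability lists (one per prediction).
    targets: list of target class indices.
    Normalising by the sum of weights is an assumed convention.
    """
    total, weight_sum = 0.0, 0.0
    for p, t in zip(probs, targets):
        w = NO_OBJECT_WEIGHT if t == no_object_index else 1.0
        total += -w * math.log(p[t])
        weight_sum += w
    return total / weight_sum


def l1_box_loss(pred_boxes, target_boxes):
    """Mean L1 distance over matched foreground boxes (cx, cy, w, h)."""
    per_box = [sum(abs(a - b) for a, b in zip(p, t))
               for p, t in zip(pred_boxes, target_boxes)]
    return sum(per_box) / len(per_box)


def total_loss(probs, targets, pred_boxes, target_boxes, no_object_index):
    """Weighted sum of the classification and localization terms."""
    cls = weighted_cross_entropy(probs, targets, no_object_index)
    box = l1_box_loss(pred_boxes, target_boxes)
    return LAMBDA_CLS * cls + LAMBDA_L1 * box


# Toy example: two defect classes plus a no-object class at index 2.
# The first prediction is matched to class 0, the second to no-object.
example_probs = [[0.5, 0.25, 0.25], [0.25, 0.25, 0.5]]
example_loss = total_loss(
    example_probs, [0, 2],
    pred_boxes=[[0.5, 0.5, 0.2, 0.2]],
    target_boxes=[[0.5, 0.5, 0.3, 0.3]],
    no_object_index=2,
)
```

With the 1/10 factor, a background prediction contributes ten times less to the classification term than a foreground prediction with the same log-probability, which limits how much the many background samples dominate the gradient.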

Figure 12 illustrates the training curves for both the classification and box regression losses. Compared to the baseline, our model achieves faster convergence within the first 50 epochs, suggesting that the prototype-guided matching and hierarchical optimization not only improve the final AP but also stabilize the training dynamics. This supports our claim that the proposed guidance mechanism accelerates convergence rather than merely optimizing endpoint metrics.

Figure 12. Comparison of the loss curves between the baseline model and the complete SM-FSOD model.

4.8. Further Evaluation of Robustness and Practical Metrics

To provide a more comprehensive evaluation of the proposed method’s practical utility and robustness, we conducted additional experiments focusing on finer-grained detection metrics and sensitivity to few-shot configurations. In addition to the commonly used AP50 metric, we also included the AP75 to assess the method’s precision under stricter localization criteria. Furthermore, we report the log-average miss rate (LAMR) and F1-scores at fixed confidence thresholds of 0.5 and 0.75. These metrics are critical in real-world PCB inspection scenarios, where accurate localization and threshold stability are essential. The results are shown in Table 8.

Table 8. Evaluation of detection performance under various metrics (DeepPCB-30 Shot-Split 2). Higher AP and F1-scores indicate better performance, while a lower LAMR reflects fewer missed detections. ↑ and ↓ represent an increase and decrease in values compared to the baseline.
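For reference, an F1-score at a fixed confidence threshold can be computed as in the sketch below. This is a generic illustration, not the evaluation script behind Table 8: the function name, the input format, and the toy detections are assumptions, and the true-positive flags are taken to come from a prior IoU-based matching step that is not shown. The LAMR computation is omitted.

```python
def f1_at_threshold(detections, num_ground_truth, threshold):
    """F1-score using only detections whose confidence >= threshold.

    detections: list of (confidence, is_true_positive) pairs, where the
    true-positive flag comes from a prior IoU-based matching step.
    """
    kept = [is_tp for conf, is_tp in detections if conf >= threshold]
    tp = sum(kept)
    fp = len(kept) - tp
    fn = num_ground_truth - tp
    if tp == 0:
        return 0.0
    precision = tp / (tp + fp)
    recall = tp / (tp + fn)
    return 2 * precision * recall / (precision + recall)


# Hypothetical detections for an image set with four ground-truth defects.
dets = [(0.9, True), (0.8, True), (0.7, False), (0.6, True), (0.4, False)]
f1_050 = f1_at_threshold(dets, num_ground_truth=4, threshold=0.5)
f1_075 = f1_at_threshold(dets, num_ground_truth=4, threshold=0.75)
```

In this toy case, raising the threshold from 0.5 to 0.75 removes the false positive but also drops a true positive, so precision rises while recall falls. Reporting F1 at both thresholds exposes this kind of threshold sensitivity.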

To investigate the robustness of the proposed method under different few-shot settings, we visualize the AP75 distribution of novel classes from split 2 of the DeepPCB dataset across three different configurations using violin plots. The results, shown in Figure 13, indicate that the method maintains relatively stable performance across categories.

Figure 13. Using violin plots to illustrate the stability of different novel classes under the AP75 metric in split 2 of the DeepPCB dataset.

5. Conclusions

The experimental results demonstrate that the proposed SM-FSOD model exhibits significant advantages in industrial PCB defect detection tasks. Built upon an innovative second-order meta-learning framework, the model successfully addresses critical challenges faced by conventional methods in industrial settings, including sample scarcity and target ambiguity, through its unique defect-aware mechanism. Compared to existing techniques, the proposed approach achieves notable improvements in detection accuracy, particularly demonstrating enhanced recognition capabilities for low-contrast defects. Analytical validation confirms the effectiveness of the model’s feature enhancement mechanism in accurately capturing key characteristics of subtle defects. This research provides novel insights for few-shot learning applications in industrial quality inspection, offering a domain-customized meta-learning strategy that effectively resolves the bottleneck of data acquisition difficulties in real-world production environments while demonstrating excellent engineering practicality and generalization potential.

Although our proposed SM-FSOD method performs well in few-shot PCB defect detection, it still has certain limitations. Firstly, the model’s performance is somewhat dependent on the quality of support samples; missed detections may still occur with incomplete defect samples featuring low-resolution foregrounds or severe occlusion. Secondly, while the second-order optimization mechanism improves the accuracy, it also increases the model’s computational complexity to some extent, which may pose deployment challenges on resource-constrained embedded devices. Future research will focus on developing efficient compression and acceleration strategies for few-shot models.

Author Contributions

Conceptualization: X.S.; Methodology: X.S., Z.L. and Q.H.; Software: X.S. and Z.L.; Validation: X.S., Z.L., Q.H., M.Y. and C.L.; Formal analysis: X.S., Z.L. and Q.H.; Investigation: X.S. and Z.L.; Resources: X.S. and Q.H.; Data curation: X.S. and Q.H.; Writing—original draft: X.S., Z.L., Q.H. and C.L.; Writing—review and editing: X.S., Z.L., Q.H. and M.Y.; Visualization: X.S. and M.Y. All authors have read and agreed to the published version of the manuscript.

Funding

This research was funded by the following grants: National Natural Science Foundation of China (Grant No. 62072489); National Natural Science Foundation of China (Grant No. 61873293); Science and Technology Research Project of Henan Province (Grant No. 252102211108); National Natural Science Foundation of China (Grant No. 62301623).

Data Availability Statement

The DeepPCB dataset used in this study is publicly available in the GitHub repository: https://github.com/tangsanli5201/DeepPCB/tree/master/PCBData (accessed on 1 July 2025). The DsPCBSD+ dataset is publicly available in the GitHub repository: https://github.com/kikopapa/PCB_Defect_Detection/tree/main/dataset (accessed on 19 July 2025).

Conflicts of Interest

The authors declare no conflicts of interest.

References

1. Huang, G.; Laradji, I.; Vázquez, D.; Lacoste-Julien, S.; Rodríguez, P. A Survey of Self-Supervised and Few-Shot Object Detection. IEEE Trans. Pattern Anal. Mach. Intell. 2023, 45, 4071–4089.
2. Wei, J.; Lan, Y.; Tang, T.; Liu, T. A Survey on Transfer Reinforcement Learning. In Proceedings of the 2025 8th International Conference on Advanced Algorithms and Control Engineering (ICAACE), Shanghai, China, 21–23 March 2025; pp. 2511–2518.
3. Zhang, P.; Liu, C.; Chang, X.; Li, Y.; Li, M. Metric-based Meta-Learning Model for Few-Shot PolSAR Image Terrain Classification. In Proceedings of the 2021 CIE International Conference on Radar (Radar), Haikou, China, 15–19 December 2021; pp. 2529–2533.
4. Gaikwad, M.; Doke, A. Survey on Meta Learning Algorithms for Few Shot Learning. In Proceedings of the 2022 6th International Conference on Intelligent Computing and Control Systems (ICICCS), Madurai, India, 25–27 May 2022; pp. 1876–1879.
5. Lin, T.Y.; Maire, M.; Belongie, S.; Hays, J.; Perona, P.; Ramanan, D.; Dollár, P.; Zitnick, C.L. Microsoft COCO: Common Objects in Context. In European Conference on Computer Vision; Springer International Publishing: Zurich, Switzerland, 2014; pp. 740–755.
6. Everingham, M.; Van Gool, L.; Williams, C.K.; Winn, J.; Zisserman, A. The PASCAL Visual Object Classes (VOC) Challenge. Int. J. Comput. Vis. 2010, 88, 303–338.
7. Huang, X.; Li, W. A Novel PCB Defect Detection Network Based on the Improved YOLOv8 with Fusion of Hybrid Attention Transformer and Bidirectional Feature Pyramid Network. In Proceedings of the 2024 4th International Conference on Artificial Intelligence, Robotics, and Communication (ICAIRC), Xiamen, China, 27–29 December 2024; pp. 207–211.
8. Li, M.; Yu, Z.; Fang, L.; Meng, Y.; Zhang, T. GCP-YOLO Detection Algorithm for PCB Defects. In Proceedings of the 2024 4th International Symposium on Computer Technology and Information Science (ISCTIS), Xi’an, China, 12–14 July 2024; pp. 492–495.
9. Hsiao, C.-H.; Su, H.-C.; Wang, Y.-T.; Hsu, M.-J.; Hsu, C.-C. ResNet-SE-CBAM Siamese Networks for Few-Shot and Imbalanced PCB Defect Classification. Sensors 2025, 25, 4233.
10. Tang, S.; He, F.; Huang, X.; Yang, J. Online PCB Defect Detector On A New PCB Defect Dataset. arXiv 2019, arXiv:1902.06197.
11. Lv, S.; Ouyang, B.; Deng, Z.; Liang, T.; Jiang, S.; Zhang, K.; Chen, J.; Li, Z. A dataset for deep learning based detection of printed circuit board surface defect. Sci. Data 2024, 11, 811.
12. Wang, X.; Huang, T.E.; Darrell, T.; Gonzalez, J.E.; Yu, F. Frustratingly simple few-shot object detection. In Proceedings of the International Conference on Machine Learning, Online, 12–18 July 2020; pp. 9919–9928.
13. Han, G.; He, Y.; Huang, S.; Ma, J.; Chang, S.F. Query Adaptive Few Shot Object Detection with Heterogeneous Graph Convolutional Networks. In Proceedings of the IEEE/CVF International Conference on Computer Vision (ICCV), Montreal, QC, Canada, 11–17 October 2021; pp. 3243–3252.
14. Kim, G.; Jung, H.-G.; Lee, S.-W. Spatial reasoning for few-shot object detection. Pattern Recognit. 2021, 120, 108118.
15. Zhang, R.; Yan, T.; Zhang, J. Vision-Based Structural Adhesive Detection for Electronic Components on PCBs. Electronics 2025, 14, 2045.
16. Du, B.; Wan, F.; Lei, G.; Xu, L.; Xu, C.; Xiong, Y. YOLO-MBBi: PCB Surface Defect Detection Method Based on Enhanced YOLOv5. Electronics 2023, 12, 2821.
17. Xu, Y.; Huo, H. DSASPP: Depthwise Separable Atrous Spatial Pyramid Pooling for PCB Surface Defect Detection. Electronics 2024, 13, 1490.
18. Xie, Z.; Zou, X. MFAD-RTDETR: A Multi-Frequency Aggregate Diffusion Feature Flow Composite Model for Printed Circuit Board Defect Detection. Electronics 2024, 13, 3557.
19. Yan, H.; Zhang, H.; Gao, F.; Wu, H.; Tang, S. Research on Deep Learning Model Enhancements for PCB Surface Defect Detection. Electronics 2024, 13, 4626.
20. Ai, Y.; Ye, T. Surface Defect Detection Algorithm for PCB Based on Improved YOLOv8. In Proceedings of the 2024 8th International Conference on Electrical, Mechanical and Computer Engineering (ICEMCE), Xi’an, China, 25–27 October 2024; pp. 1421–1425.
21. Hayat, A.; Chen, M.; Zhou, K. Attention GhostUNet++: Enhanced Segmentation of Adipose Tissue and Liver in CT Images. In Proceedings of the 47th Annual International Conference of the IEEE Engineering in Medicine and Biology Society (EMBC), Copenhagen, Denmark, 14–17 July 2025.
22. Wang, Z.; Gao, Y.; Liu, Q.; Wang, Y. Semantic Enhanced Few-Shot Object Detection. In Proceedings of the IEEE International Conference on Image Processing (ICIP), Abu Dhabi, United Arab Emirates, 27–30 October 2024; pp. 575–581.
23. Wang, B.; Yu, D. Orthogonal Progressive Network for Few-shot Object Detection. Expert Syst. Appl. 2025, 264, 125905.
24. Zhu, J.; Wang, Q.; Dong, X.; Ruan, W.; Chen, H.; Lei, L.; Hao, G. FSNA: Few-Shot Object Detection via Neighborhood Information Adaption and All Attention. IEEE Trans. Circuits Syst. Video Technol. 2024, 34, 7121–7134.
25. Guo, Y.; Ma, L.; Luo, X.; Xie, S. DP-DDCL: A discriminative prototype with dual decoupled contrast learning method for few-shot object detection. Knowl.-Based Syst. 2024, 297, 111964.
26. Fan, Q.; Zhuo, W.; Tang, C.K.; Tai, Y.W. FSODv2: A Deep Calibrated Few Shot Object Detection Network. Int. J. Comput. Vis. 2024, 132, 3566–3585.
27. Han, J.; Ren, Y.; Ding, J.; Yan, K.; Xia, G.S. Few-Shot Object Detection via Variational Feature Aggregation. AAAI Conf. Artif. Intell. 2023, 37, 755–763.
28. Wang, Z.; Yang, B.; Yue, H.; Ma, Z. Fine-Grained Prototypes Distillation for Few-Shot Object Detection. AAAI Conf. Artif. Intell. 2024, 38, 5859–5866.
29. Gao, Y.; Lin, K.Y.; Yan, J.; Wang, Y.; Zheng, W.S. AsyFOD: An Asymmetric Adaptation Paradigm for Few-Shot Domain Adaptive Object Detection. In Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition (CVPR), Vancouver, BC, Canada, 17–24 June 2023; pp. 3261–3271.
30. Kang, B.; Liu, Z.; Wang, X.; Yu, F.; Feng, J.; Darrell, T. Few-Shot Object Detection via Feature Reweighting. In Proceedings of the IEEE/CVF International Conference on Computer Vision (ICCV), Seoul, Republic of Korea, 27 October–2 November 2019; pp. 8419–8428.
31. Yan, X.; Chen, Z.; Xu, A.; Wang, X.; Liang, X.; Lin, L. Meta R-CNN: Towards General Solver for Instance-Level Low-Shot Learning. In Proceedings of the IEEE/CVF International Conference on Computer Vision (ICCV), Seoul, Republic of Korea, 27 October–2 November 2019; pp. 9576–9585.
32. Qiao, L.; Zhao, Y.; Li, Z.; Qiu, X.; Wu, J.; Zhang, C. DeFRCN: Decoupled Faster R-CNN for Few-Shot Object Detection. In Proceedings of the IEEE/CVF International Conference on Computer Vision (ICCV), Montreal, QC, Canada, 11–17 October 2021; pp. 8681–8690.
33. Ma, J.; Niu, Y.; Xu, J.; Huang, S.; Han, G.; Chang, S. DiGeo: Discriminative Geometry-Aware Learning for Generalized Few-Shot Object Detection. In Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition (CVPR), Vancouver, BC, Canada, 17–24 June 2023; pp. 3208–3218.

Figure 1. Two few-shot detection methods: (A) meta-learning-based few-shot object detection algorithm and (B) transfer learning-based few-shot object detection algorithm.

Figure 2. Comparison of two meta-learning strategies: (A) general single-stage meta-learning strategy; (B) second-order meta-learning strategy for PCB surface defect detection.

Figure 3. Overall architecture of the SM-FSOD few-shot object detection algorithm.

Figure 4. The architecture of the multi-order meta-enhanced prototype network.

Figure 5. The architecture of the dual-prototype guided Transformer.

Figure 6. The nine foreground categories in the DsPCBSD+ dataset.

Figure 7. The six foreground categories in the DeepPCB dataset.

Figure 8. The ablation study visualizations under the first few-shot category configuration on the DsPCBSD+ dataset.

Figure 9. The heatmap visualization experiments conducted on the DsPCBSD+ dataset.

Figure 10. The heatmap visualization experiments conducted on the DeepPCB dataset.

Table 1. Two few-shot grouping configurations for the DsPCBSD+ dataset.

Category	Split 1	Split 2
Base Classes	[MB, HB, CS, CFO, BMFO]	[OP, MB, HB, CS, SH]
Novel Classes	[SP, SC, OP, SH]	[SC, CFO, BMFO, SP]

Table 2. Three few-shot grouping configurations for the DeepPCB dataset.

Category	Split 1	Split 2	Split 3
Base Classes	[SH, SP, CO, MB]	[PH, OP, CO, MB]	[PH, OP, SH, SP]
Novel Classes	[PH, OP]	[SH, SP]	[CO, MB]

Table 3. Precision comparison experiments on the DsPCBSD+ using the AP50 as the evaluation metric.

Method	Split 1, S = 5	Split 1, S = 10	Split 1, S = 30	Split 1, S = 50	Split 2, S = 5	Split 2, S = 10	Split 2, S = 30	Split 2, S = 50
TFA/fc [12]	9.2	13.9	14.6	20.3	11.1	16.8	22.9	29.4
TFA/cos [12]	9.8	15.2	14.2	21.5	13.7	16.9	27.3	30.8
FSRW [30]	8.5	12.8	22.8	24.3	13.8	19.3	28.2	30.4
Meta-RCNN [31]	12.7	16.5	22.8	26.1	28.5	33.1	39.2	44.8
Meta-FRCNN [13]	14.8	21.5	27.2	31.6	31.8	33.9	40.2	47.9
VFA [27]	16.2	20.9	28.3	29.8	30.2	33.8	39.6	42.8
FSOR-SR [14]	16.8	20.1	26.7	30.8	29.1	33.8	40.6	44.3
DeFRCN [32]	13.7	18.4	27.9	30.5	29.3	34.5	40.7	42.9
DiGeo [33]	11.8	16.8	28.2	29.1	26.7	30.1	37.4	42.5
SM-FSOD	15.1 ± 0.7%	20.5 ± 0.9%	22.5 ± 0.7%	30.2 ± 0.9%	36.9 ± 0.5%	42.6 ± 1.1%	42.9 ± 0.9%	52.5 ± 0.9%

Table 4. Precision comparison experiments on DeepPCB using the AP50 as the evaluation metric.

Model	TFA/fc [12]	TFA/cos [12]	FSRW [30]	Meta-RCNN [31]	Meta-FRCNN [13]	SM-FSOD
Few-Shot Split 1
5-shot	11.8	12.9	22.7	28.2	33.8	29.0 ± 0.9
10-shot	16.8	15.2	27.0	29.7	37.2	38.2 ± 1.1
30-shot	20.8	21.1	33.2	35.2	44.2	43.7 ± 0.5
50-shot	22.7	23.8	40.2	38.8	46.7	45.5 ± 0.8
Few-Shot Split 2
5-shot	13.7	13.9	21.7	30.2	27.2	44.2 ± 0.5
10-shot	16.2	16.8	23.6	34.9	32.4	48.5 ± 0.2
30-shot	24.2	25.0	31.1	40.0	39.1	49.8 ± 0.8
50-shot	25.7	25.9	35.9	42.5	44.0	50.1 ± 1.2
Few-Shot Split 3
5-shot	12.9	13.2	25.0	38.2	42.8	41.0 ± 0.7
10-shot	17.8	17.9	29.7	40.1	44.7	45.4 ± 0.4
30-shot	25.9	26.3	37.5	44.0	50.0	51.2 ± 0.8
50-shot	26.1	26.9	41.8	44.2	49.3	52.4 ± 0.7

Table 5. Performance comparison on DeepPCB dataset (AP50/AP75 metrics).

MOMP	✗	✗	✔	✔
DPGT	✗	✔	✗	✔
Ablation experiments under the AP50 metric
5-shot	30.2 ± 0.7%	33.8 ± 0.5%	34.7 ± 0.6%	36.9 ± 0.5%
10-shot	37.5 ± 0.6%	38.1 ± 1.1%	39.6 ± 0.7%	42.6 ± 1.1%
30-shot	38.3 ± 0.6%	40.3 ± 0.8%	41.9 ± 0.4%	42.9 ± 0.9%
50-shot	47.9 ± 0.9%	48.6 ± 0.4%	48.2 ± 0.5%	52.5 ± 0.9%
Ablation experiments under the AP75 metric
5-shot	11.3 ± 0.6%	13.8 ± 0.8%	14.2 ± 0.3%	14.5 ± 0.8%
10-shot	15.9 ± 0.5%	16.2 ± 0.6%	16.1 ± 0.3%	16.9 ± 1.1%
30-shot	18.3 ± 0.5%	19.6 ± 0.3%	19.2 ± 0.7%	20.0 ± 1.4%
50-shot	21.9 ± 0.4%	21.8 ± 0.9%	22.0 ± 0.4%	22.2 ± 0.8%

Table 6. Real-time performance and computational resource analysis of few-shot object detection algorithms.

Model	FPS	GPU Memory (GiB)
TFA	7.49	22.1
Meta-RCNN	4.63	26.4
Meta-FRCNN	5.97	29.5
SM-FSOD	4.82	30.6

Table 7. Layer-wise computational cost analysis of key modules.

Module	Parameters (MB)	Memory (GiB)
Backbone	332.9	26.57
MOMP Module	12.2	0.918
DPGT Module	38.2	2.75
Matching Module	0.011	-
Detection Heads	5.1	0.4
Total	388.2	30.6

Table 8. Evaluation of detection performance under various metrics (DeepPCB-30 Shot-Split 2). Higher AP and F1-scores indicate better performance, while a lower LAMR reflects fewer missed detections. ↑ and ↓ represent an increase and decrease in values compared to the baseline.

Method	AP50 ↑	AP75 ↑	LAMR ↓	F1@0.5 ↑	F1@0.75 ↑
BASELINE	38.3	18.3	0.32	0.48	0.32
SM-FSOD	42.9	20.0	0.25	0.54	0.38

© 2025 by the authors. Licensee MDPI, Basel, Switzerland. This article is an open access article distributed under the terms and conditions of the Creative Commons Attribution (CC BY) license (https://creativecommons.org/licenses/by/4.0/).
