1. Introduction
Electronic products have become indispensable in modern society and are widely used in homes, factories, communication systems, transportation, and medical care. Printed Circuit Boards (PCBs) are the core carriers of electronic components and electrical signal transmission in these products. Their manufacturing quality directly affects the safety, stability, and reliability of electronic devices. In practical production, defects such as short circuits, open circuits, missing holes, mouse bites, and spurs may cause unstable electrical connections, signal transmission failures, or even complete device malfunction. Such failures can lead to economic losses in consumer electronics and may cause more serious risks in industrial automation and medical equipment. Therefore, accurate and efficient PCB defect detection is of great importance for improving product quality, reducing manufacturing costs, and ensuring the safe and reliable operation of electronic products.
Traditional PCB defect detection methods mainly rely on manual visual inspection or image processing-based algorithms, such as template matching, edge detection, and threshold segmentation. These methods typically depend on human experience or handcrafted features, which results in low detection efficiency, poor robustness, and insufficient adaptability to complex backgrounds [1]. With the development of machine learning methods, detection approaches based on algorithms like Support Vector Machines (SVM) and Random Forests have improved detection performance to some extent, but they are still limited by feature representation capabilities and struggle to meet the demands of complex industrial scenarios [2]. PCBs exhibit strong structural symmetry and regular texture distribution, which can both assist defect localization and bring interference when defects break the symmetric pattern, making accurate detection more challenging.
In recent years, with the development of deep learning technology, object detection methods based on convolutional neural networks (CNNs) have made significant progress in the field of industrial vision. In particular, the YOLO series of algorithms has been widely used in PCB defect detection due to its end-to-end structure and superior real-time performance [3]. As a recent representative of this series, YOLOv8 achieves a good balance between detection accuracy and speed by introducing an anchor-free mechanism and a decoupled detection head. However, practical PCB defect detection tasks still face challenges such as difficulty in detecting small objects, severe interference from complex backgrounds, and insufficient feature representation capability [4].
To address these issues, existing studies have primarily focused on improving YOLO models in two directions.
One approach enhances model performance by optimizing convolutional structures and feature fusion strategies. For example, Sun et al. introduced the CBAM attention mechanism into YOLOv8, combining channel and spatial attention to significantly enhance the model’s feature representation capability for small-scale defects, achieving notable performance improvements on the PCB dataset [5]. Wei et al. proposed the PCB-YOLO model, which effectively improved detection accuracy in complex backgrounds by incorporating multi-scale feature fusion and attention mechanisms [6]. Moreover, Li et al. proposed the SCF-YOLO model, which, by designing a lightweight structure and feature fusion module, increased detection speed and accuracy while reducing model complexity [7]. Similarly, SEPDNet achieved performance improvements with fewer parameters by introducing an FPN structure to optimize the feature fusion path [8]. Research beyond the YOLO family has also shown that optimizing feature enhancement and fusion structures can improve model performance. For instance, Wang et al. proposed SCP-DETR, which introduces a kernel-based representation learning strategy and a stacked feature pyramid enhancement module to achieve efficient multi-scale feature fusion and representation, significantly improving the detection accuracy of tiny PCB defects [9]. Although the aforementioned methods improve model performance to some extent, their core still relies on convolution operations, with feature modeling primarily confined to local receptive fields, which remains insufficient for capturing long-distance dependencies in complex backgrounds.
Another category of methods introduces the Transformer architecture (such as ViT, Swin Transformer, and DETR), utilizing the self-attention mechanism to enhance the model’s global modeling capabilities. Unlike traditional convolutional neural networks, which are limited by local receptive fields, Transformers can capture long-range dependencies through multi-head self-attention mechanisms, thereby improving feature representation capabilities in complex scenarios. For example, ViT-YOLO introduces a multi-head self-attention module into the YOLO backbone network, enhancing global context modeling capabilities by constructing an MHSA-Darknet structure and combining it with BiFPN to achieve multi-scale feature fusion, significantly improving detection performance [10]. In addition, Swin Transformer-based improvements are also widely applied in object detection tasks. For instance, ST-YOLOA embeds the Swin Transformer into the YOLO backbone network, achieving collaborative modeling of local and global information through the hierarchical window attention mechanism, and effectively improving detection accuracy in complex backgrounds by integrating attention mechanisms with the feature pyramid structure [11]. The above methods introduce Transformer modules into the YOLO framework, enhancing the model’s global modeling capabilities to a certain extent. Meanwhile, another line of research focuses on improving Transformers at the detection framework level. For example, An and Zhang proposed LPViT, a Transformer-based model for PCB image classification and defect detection, which integrates mask patch prediction and label smoothing to enhance model robustness and feature representation [12]. These methods are more inclined towards a Transformer-dominated detection paradigm in terms of structural design, further expanding the design space for object detection models. However, Transformer architectures typically rely on self-attention mechanisms, whose computational complexity grows quadratically with feature scale, leading to large model parameters and high computational costs, which still pose certain limitations in industrial scenarios where resources are limited or real-time performance is required.
Other improvement strategies have also been explored. Recent studies, such as MAS-YOLO, have shown that introducing Adaptive Hierarchical Feature Integration (AHFIN) and Median Enhancement Attention Mechanism (MECS) into the YOLO architecture can further enhance the model’s ability to perceive tiny defects in PCBs [13]. This aligns with the approach in this paper, which employs the Mamba architecture and PAN++ structure to strengthen global context and detail feedback, and it underscores the necessity of optimizing feature flow and attention distribution for industrial defect detection tasks.
In response to the characteristics of PCB defects, namely “numerous small targets and high real-time requirements,” some studies have begun to explore detection methods that balance lightweight design and high accuracy. For example, Hou et al. proposed the EffNet-PCB model, which improves detection accuracy while reducing the number of parameters by optimizing the feature fusion module and the detection head structure [14]. Li et al. achieved multi-scale PCB defect detection through pruning and lightweight network design, significantly enhancing detection performance while ensuring real-time capability [15]. Furthermore, Cao et al. proposed a PCB defect detection method based on multi-scale feature enhancement and a background suppression mechanism. By constructing a multi-scale contextual feature enhancement module combined with a background decoupling attention mechanism, this approach effectively suppresses background interference and enhances the feature representation of micro-defects in complex industrial environments, while significantly improving detection accuracy without compromising real-time performance [16]. Despite the improvements in detection performance brought by these methods, there remain the following issues: on the one hand, most methods lack effective global modeling capability; on the other hand, the introduction of complex structures often increases computational costs, affecting the real-time performance of the models.
In recent years, state–space models (SSMs) have achieved groundbreaking progress in visual tasks. Among them, the Mamba model has attracted widespread attention due to its linear-time complexity and excellent long-sequence modeling capabilities. Compared to the Transformer architecture, Mamba can significantly reduce computational complexity while maintaining global modeling capabilities, making it more suitable for real-time detection tasks. Related studies have attempted to introduce Mamba into object detection frameworks, such as the YOLO-Mamba model, which balances long-range dependency modeling and efficient computation by integrating SSM [17]. However, research on applying Mamba to PCB defect detection tasks remains relatively limited, especially in terms of small target detection and multi-scale feature fusion, where there is still considerable room for improvement.
Based on the above analysis, this paper proposes a PCB defect detection method called C2f-FPN-PAN++-Mamba Improved YOLOv8. This method optimizes YOLOv8 on two levels: feature extraction and feature fusion. On one hand, the C2f-Mamba module is introduced into the Backbone and Neck, embedding the state–space model (Mamba) into the C2f structure to enhance the model’s global modeling capability and long-range dependency representation. On the other hand, the original feature fusion structure is improved by introducing a multi-scale feature fusion strategy that combines FPN and PAN++, strengthening information interaction between features at different scales and significantly improving the model’s ability to detect small target defects.
The main contributions of this paper are summarized as follows:
Unlike existing YOLO-Mamba methods that simply insert Mamba blocks into the backbone or neck, the proposed module embeds Mamba into the C2f structure and adopts a channel-split dual-branch design. This design integrates convolution-based local perception with Mamba-based global modeling, thereby enhancing the representation of tiny PCB defects while preserving the lightweight characteristics of YOLOv8n.
Different from conventional FPN-PAN structures or PAN++ designs originally used for text detection, the proposed FPN-PAN++ is adapted to PCB defect detection by retaining bidirectional cross-scale aggregation and path enhancement. It strengthens the complementary fusion of high-level semantic information and low-level spatial details, improving the detection robustness for small and multi-scale defects.
The proposed framework jointly enhances local–global feature representation and bidirectional multi-scale feature fusion, addressing the limitation that existing methods usually improve either global modeling or feature fusion alone. This collaborative design is especially suitable for PCB defects with small size, dense textures, and complex backgrounds.
Experimental results show that the model achieves 98.5% precision, 98.4% recall, 98.8% mAP@0.5, and 62.5% mAP@0.5:0.95 with only 3.86 M parameters, demonstrating a favorable balance between detection performance and computational efficiency for industrial PCB defect inspection.
The proposed method is compared with baseline YOLO models, attention-based YOLO variants, Transformer-based YOLO variants, and single-module improved models, verifying the effectiveness and complementarity of C2f-Mamba and FPN-PAN++ for high-precision PCB defect detection.
3. Materials and Methods
3.1. YOLOv8n Model
This study selected the object detection model YOLOv8n as the baseline model for comparative experiments. As a recent version in the YOLO series, YOLOv8 adopts an end-to-end, single-stage object detection framework, mainly consisting of three components: Backbone, Neck, and Head. The Backbone extracts multi-level features from input images, the Neck achieves multi-scale feature fusion through a feature pyramid structure (such as FPN and PAN), and the Head outputs the predicted class and location of the objects.
In the Backbone and Neck, YOLOv8 introduces the C2f module as the core feature extraction unit. The C2f module uses a branching structure and feature concatenation mechanism to enhance feature representation while maintaining computational efficiency. However, this module is still fundamentally based on convolution operations, primarily relying on local receptive fields for feature extraction, and has limited ability to model long-range dependencies.
As the lightweight version of the YOLOv8 series, YOLOv8n’s core advantage lies in high speed and low resource consumption while maintaining practical accuracy. It has only 3.2 million parameters, a small model size, low GPU and memory usage, and a frame rate far higher than larger models in the v8 series, such as v8s and v8m. Compared with earlier lightweight models like YOLOv5n [38], v8n adopts a more advanced C2f structure, decoupled detection head, and anchor-free design, achieving higher detection accuracy and more stable performance on small objects with similar speed and more efficient post-processing. Compared with other lightweight detection algorithms like SSD and EfficientDet-Lite, it features a simpler training process and a more complete deployment ecosystem, showing strong overall performance in deployment scenarios with limited computing power, low latency requirements, and high cost-effectiveness [39,40].
The original YOLOv8n has only 3.20 M parameters, 8.7 GFLOPs, and runs at 186 FPS on a single RTX 4090 GPU with 640 × 640 input, showing excellent efficiency for industrial deployment.
3.2. Overall Design of the Improved Model Architecture
In response to the characteristics of PCB defect detection, which include a large number of small targets, complex backgrounds, and high requirements for real-time performance, this paper improves YOLOv8 from three aspects: feature extraction capability, feature fusion efficiency, and model lightweighting, constructing the C2f-FPN-PAN++-Mamba Improved YOLOv8 model as shown in Figure 1.
First, in the feature extraction stage, we introduce the C2f-Mamba module to replace part of the original C2f structure, as shown in Figure 2. Different from the original C2f, our C2f-Mamba employs a channel-split dual-branch parallel design, which consists of a convolution branch for capturing local fine-grained spatial details and a Mamba branch for efficiently modeling long-range global dependencies. By integrating the state–space model, it achieves collaborative modeling of local features and global context, thereby enhancing the model’s feature representation ability in complex scenarios.
The design of the C2f-Mamba module is determined by the feature distribution characteristics of PCB defects and the demand for lightweight real-time detection. PCB defects are characterized by a high proportion of small targets, minute defect sizes, dense textures, complex backgrounds with strong interference, multi-scale appearance, low contrast, and irregular shapes. Industrial online inspection requires models that are lightweight, have low computational overhead, and offer high inference speed to accommodate edge device deployment and real-time detection demands. Therefore, the model must enhance the global feature modeling capability for small defects while strictly controlling the number of parameters and computational costs. The module is deliberately deployed in the middle and deep stages of the backbone network instead of shallow layers, since shallow features mainly contain low-level edge and texture information with weak semantics, which provides limited gains for global modeling. In contrast, features in the middle and deep stages possess richer and more complete semantic information about defect structures and global layouts, making them more suitable for capturing long-range dependencies between defects and backgrounds.
The dual-branch structure fuses local convolution features and global state–space features through channel concatenation, which retains fine-grained defect details while enhancing long-range dependency modeling without introducing complex weighting or attention operations. Such a configuration brings significant performance improvement with only a small increase in parameters and calculation, achieving a favorable trade-off between accuracy and efficiency that better meets the requirements of real-time industrial PCB defect detection compared with heavy Transformer-based structures.
Second, in the feature fusion stage, an improved FPN-PAN++ structure is adopted to strengthen information flow between multi-scale features, improving the utilization efficiency of features at different scales, and especially enhancing the detection capability for small target defects.
Finally, in the overall structural design, both detection performance and computational cost are considered. High-complexity Transformer structures are avoided, and the Mamba module is used to achieve near-global modeling capability, thereby maintaining high inference efficiency while ensuring accuracy improvement.
In industrial inspection scenarios, the model not only needs to have high detection accuracy but also must meet real-time and deployment efficiency requirements. Therefore, this paper focuses on lightweight design during model development. This focus is motivated by the online nature of industrial PCB inspection. In practical production lines, defect images are continuously captured and must be processed within a limited cycle time; otherwise, the detection system may become a bottleneck and reduce production efficiency. Meanwhile, industrial inspection systems are often deployed on edge or embedded platforms, where memory and computational resources are limited. Merely improving detection accuracy is thus insufficient: the model must also maintain low computational complexity and high inference speed.
On one hand, compared with Transformer structures, the Mamba model adopts a state–space modeling method with linear time complexity, which significantly reduces computational complexity and memory consumption while achieving global information modeling, making it more suitable for resource-constrained industrial environments.
On the other hand, the improvements made in this study do not significantly increase the network depth or width, but instead enhance feature utilization efficiency through structural optimizations, such as improvements to the C2f module and optimization of feature fusion paths, achieving performance gains with minimal parameter increases.
Furthermore, the improved FPN-PAN++ structure, through the design of efficient information transmission paths, reduces redundant computation, improves feature fusion efficiency, and helps enhance inference speed while maintaining detection accuracy.
Compared with existing YOLO-Mamba methods, the proposed model differs in three respects:
(1) Existing methods only insert Mamba blocks into the backbone, whereas this paper embeds Mamba into C2f to form C2f-Mamba for local–global fusion;
(2) This paper uses FPN-PAN++ to enhance bidirectional feature fusion, which has not been adopted in previous YOLO-Mamba models for defect detection;
(3) The whole model remains lightweight and fast for industrial deployment, whereas most Transformer or heavy Mamba models sacrifice real-time performance.
Overall, the proposed model achieves a good balance between accuracy and efficiency and possesses significant practical value for engineering applications.
3.3. C2f-Mamba Module
3.3.1. Principles and Limitations of the Native C2f Module
The C2f module is an important basic unit used for feature extraction in YOLOv8. Its design concept originates from the CSP (Cross Stage Partial) structure. Through the feature splitting and fusion mechanism, it enhances feature representation capability while ensuring computational efficiency. Specifically, the C2f module first splits the input features along the channel dimension. A portion of the features is directly connected across layers, while the other portion is processed through several convolutional layers for feature extraction. Finally, the features from all branches are concatenated and fused, achieving feature reuse and information enhancement.
This structure can effectively alleviate the gradient vanishing problem and, to some extent, reduce model parameters while improving computational efficiency. As a result, it is widely used in both the Backbone and Neck of YOLOv8. However, based on the practical requirements of PCB defect detection, the native C2f module still has certain limitations.
The C2f module fundamentally relies on convolution operations for feature extraction, with its receptive field mainly concentrated in local regions, limiting its ability to model long-range dependencies. In PCB images, defects often have strong correlations with their surrounding structures, and relying solely on local features may easily lead to misjudgments in complex backgrounds.
For small target defects (such as micro-cuts, pinholes, etc.), their feature information is relatively weak and can easily be gradually diluted or lost during multiple convolution layers. The C2f module lacks an explicit global information enhancement mechanism, making it difficult to effectively compensate for this problem.
Therefore, it is necessary to introduce mechanisms capable of modeling global information while maintaining the efficiency of the C2f module, in order to enhance the feature representation capability of the model in complex scenarios.
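The split, process, and concatenate flow of the native C2f module described in this subsection can be sketched as follows. This is a simplified illustration rather than the actual YOLOv8 implementation: the helper names `conv1x1`, `bottleneck`, and `c2f`, the random weights, and the use of pointwise convolutions with ReLU in place of spatial convolutions with normalization and activation are all assumptions made to keep the example short.

```python
import numpy as np

def conv1x1(x, w):
    """Pointwise convolution. x: (C_in, H, W), w: (C_out, C_in)."""
    return np.einsum("oc,chw->ohw", w, x)

def bottleneck(x, w):
    """Simplified stand-in for a C2f bottleneck: residual pointwise conv + ReLU."""
    return x + np.maximum(conv1x1(x, w), 0.0)

def c2f(x, w_in, w_blocks, w_out):
    y = conv1x1(x, w_in)                      # project the input to 2*c channels
    first, second = np.split(y, 2, axis=0)    # channel split into two halves
    outs = [first, second]                    # `first` acts as the cross-layer shortcut
    for w in w_blocks:                        # each block refines the latest output
        outs.append(bottleneck(outs[-1], w))
    fused = np.concatenate(outs, axis=0)      # (2 + n_blocks) * c channels
    return conv1x1(fused, w_out)              # final fusion convolution

# Hypothetical sizes chosen only for the demonstration.
rng = np.random.default_rng(0)
c_in, c, n_blocks, c_out = 8, 4, 2, 8
x = rng.standard_normal((c_in, 16, 16))
w_in = rng.standard_normal((2 * c, c_in))
w_blocks = [rng.standard_normal((c, c)) for _ in range(n_blocks)]
w_out = rng.standard_normal((c_out, (2 + n_blocks) * c))
out = c2f(x, w_in, w_blocks, w_out)
```

Every operation in this sketch acts on a single spatial position or, in the real module, on a small neighborhood, which is the locality limitation discussed above.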
3.3.2. Basic Principles of the Mamba State–Space Model
In recent years, state–space models (SSMs) have made significant progress in the field of sequence modeling. Among them, the Mamba model, as a novel SSM structure, is able to model long sequences while maintaining computational efficiency. The basic form of a state–space model can be expressed as:
$$h'(t) = A\,h(t) + B\,x(t), \qquad y(t) = C\,h(t) + D\,x(t)$$
Here, $x(t)$ represents the input sequence, $h(t)$ denotes the hidden state of the system, $y(t)$ is the output, and $A$, $B$, $C$, $D$ are learnable parameter matrices. This model achieves dynamic modeling of sequence information through state recursion.
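As a minimal illustration of the state recursion described above, the sketch below runs a discretized linear state–space model over a short sequence. The discrete update rule, the function name `ssm_scan`, and the toy parameter values are assumptions for illustration and are not taken from the paper.

```python
import numpy as np

def ssm_scan(A, B, C, D, x):
    """Run a discrete linear state-space model over a 1D input sequence.

    h_k = A @ h_{k-1} + B * x_k
    y_k = C @ h_k     + D * x_k
    Shapes: A (N, N), B (N,), C (N,), D scalar, x (L,).
    """
    h = np.zeros(A.shape[0])          # hidden state starts at zero
    y = np.empty(len(x))
    for k, x_k in enumerate(x):
        h = A @ h + B * x_k           # state update
        y[k] = C @ h + D * x_k        # output readout
    return y

# Toy two-state system driven by a unit impulse (hypothetical values).
A = np.array([[0.5, 0.0], [0.0, 0.9]])
B = np.array([1.0, 1.0])
C = np.array([1.0, 1.0])
D = 0.0
x = np.array([1.0, 0.0, 0.0])
y = ssm_scan(A, B, C, D, x)
```

The impulse at the first position still influences the later outputs through the hidden state, and the cost is one state update per sequence element.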
On this basis, the Mamba model introduces a Selective Scan mechanism to efficiently model the input sequence. The key idea of Selective Scan is to make the parameters of the state–space model input-dependent. In traditional linear time-invariant state–space models, the state transition and projection parameters are fixed for all input tokens, which limits the model’s ability to adaptively select useful information from different spatial positions. In contrast, Mamba dynamically generates several key parameters, such as the time-step parameter, the input projection parameter, and the output projection parameter, according to the current input token through learnable linear projections.
Therefore, the state update process in Mamba is no longer controlled by a fixed parameter set but is adaptively adjusted according to the input content at each position. This enables the model to selectively preserve useful information, update important contextual representations, and suppress irrelevant background interference. Such a selective mechanism is especially suitable for PCB defect detection, because tiny defects are often embedded in dense circuit textures and can easily be confused with background patterns.
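The input-dependent parameterization can be sketched as a plain sequential loop. This is a simplified reference-style illustration, not the hardware-aware kernel used by Mamba and not the authors' code: the names `selective_scan`, `W_delta`, `W_B`, and `W_C`, the softplus step size, and the exponential discretization of `A` are assumptions chosen to mirror the description above.

```python
import numpy as np

def softplus(z):
    return np.log1p(np.exp(z))

def selective_scan(x, A, W_delta, W_B, W_C):
    """Sequential selective scan.

    x: (L, D) tokens; A: (D, N) state matrix with negative entries.
    W_delta: (D, D), W_B: (D, N), W_C: (D, N) are learnable projections.
    """
    L, D = x.shape
    N = A.shape[1]
    delta = softplus(x @ W_delta)     # (L, D) input-dependent time step
    B = x @ W_B                       # (L, N) input-dependent input projection
    C = x @ W_C                       # (L, N) input-dependent output projection
    h = np.zeros((D, N))
    y = np.empty((L, D))
    for t in range(L):
        A_bar = np.exp(delta[t][:, None] * A)                   # (D, N) decay
        Bx = delta[t][:, None] * B[t][None, :] * x[t][:, None]  # (D, N) input
        h = A_bar * h + Bx            # state update with per-token parameters
        y[t] = h @ C[t]               # (D, N) @ (N,) -> (D,)
    return y

# Hypothetical sizes and random weights for demonstration only.
rng = np.random.default_rng(0)
L, D, N = 6, 4, 3
x = rng.standard_normal((L, D))
A = -np.exp(rng.standard_normal((D, N)))   # negative entries give decaying states
W_delta = 0.1 * rng.standard_normal((D, D))
W_B = 0.1 * rng.standard_normal((D, N))
W_C = 0.1 * rng.standard_normal((D, N))
y = selective_scan(x, A, W_delta, W_B, W_C)
```

Because `delta`, `B`, and `C` are recomputed for every token, each position decides how strongly the state is overwritten or preserved, which is the selective behavior the text describes.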
In visual feature modeling, the two-dimensional feature map is first flattened into a one-dimensional spatial sequence. The Selective Scan operation is then performed along this sequence to capture long-range spatial dependencies with linear complexity. After global dependency modeling, the output sequence is reshaped back to the original two-dimensional feature map. In this way, Mamba can achieve efficient global context modeling while maintaining lower computational complexity than Transformer-based self-attention mechanisms.
Its core advantage lies in reducing the computational complexity of the traditional self-attention mechanism from $O(n^2)$ to linear complexity $O(n)$, where $n$ is the sequence length. Mamba can achieve linear complexity because it replaces the self-attention mechanism of the Transformer with recursive updates in the state space. Instead of computing pairwise similarities across all positions in the sequence, it updates the hidden state step by step through selective scanning, implemented with hardware-friendly parallel scan algorithms. As a result, the computational load scales linearly with the sequence length, which avoids quadratic complexity and yields higher computational efficiency when processing long sequences.
In visual tasks, the input 2D feature map
$$X \in \mathbb{R}^{C \times H \times W}$$
usually needs to be reconstructed into a one-dimensional sequence form:
$$X_{seq} \in \mathbb{R}^{(H \cdot W) \times C}$$
and fed into the Mamba module for global modeling. In this way, the model can establish long-range dependencies in the spatial dimension, thereby capturing richer contextual information.
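The flatten and reshape steps around the Mamba module can be illustrated with a short round-trip example. The row-major scan order and the helper names `to_sequence` and `to_feature_map` are assumptions; the paper does not state which scan order is used.

```python
import numpy as np

def to_sequence(feat):
    """Flatten a (C, H, W) feature map into an (H*W, C) token sequence (row-major)."""
    C, H, W = feat.shape
    return feat.reshape(C, H * W).T

def to_feature_map(seq, H, W):
    """Reshape an (H*W, C) token sequence back into a (C, H, W) feature map."""
    return seq.T.reshape(-1, H, W)

feat = np.arange(2 * 3 * 4, dtype=np.float32).reshape(2, 3, 4)
seq = to_sequence(feat)                 # one token per spatial position
restored = to_feature_map(seq, 3, 4)    # spatial layout is recovered exactly
```

Token `seq[i]` holds all channels of the pixel at row `i // W` and column `i % W`, so the spatial correspondence is preserved when the sequence is reshaped back.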
Compared with Transformer models, Mamba not only enables global modeling but also significantly reduces computational complexity and memory consumption, making it more suitable for industrial inspection tasks with high real-time requirements.
Mamba maintains efficiency in long-sequence modeling through two core mechanisms. First, it uses a selective scan mechanism that implements state transition with input-dependent parameters in a single forward pass, avoiding the repeated dot-product operations of self-attention. Second, its computational complexity is reduced from the $O(n^2)$ of Transformer self-attention to linear $O(n)$, where $n$ denotes sequence length. When applied to visual features, the 2D feature map is flattened into a 1D spatial sequence, and Mamba models long-range dependencies along this sequence efficiently. This enables global context modeling at a cost comparable to convolutions, making it suitable for lightweight, real-time PCB defect detection.
Mamba can achieve efficient global context modeling while maintaining lower computational complexity for the following reasons:
(1) The computational complexity of Transformer self-attention is $O(n^2)$, whereas that of Mamba is linear, $O(n)$, so its computation grows more slowly with sequence length;
(2) Mamba does not require explicit construction of the attention weight matrix, allowing for more efficient memory access and better hardware execution compatibility;
(3) Mamba completes long-range dependency modeling through a selective state–space model, covering the global receptive field in a single forward pass, achieving global modeling capabilities comparable to Transformer, but with significantly lower computational and memory costs.
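To make the gap between quadratic and linear scaling concrete, the sketch below counts token pairs for self-attention and state updates for a single scan at three feature-map sizes. The sizes assume a 640 × 640 input at strides 8, 16, and 32, which is a common YOLOv8 configuration but is not stated in this section; the counts ignore per-step constants such as channel width and state size.

```python
def attention_pairs(n):
    """Pairwise similarity scores computed by full self-attention over n tokens."""
    return n * n

def scan_steps(n):
    """State updates performed by a single recurrent scan over n tokens."""
    return n

# Feature-map sides for a 640 x 640 input at strides 8, 16, and 32 (assumed).
for side in (80, 40, 20):
    n = side * side
    ratio = attention_pairs(n) // scan_steps(n)
    print(f"{side}x{side}: n={n}, attention pairs={attention_pairs(n)}, "
          f"scan steps={scan_steps(n)}, ratio={ratio}")
```

For the 80 × 80 map this gives 6400 tokens and 40,960,000 token pairs, so the ratio between the two counts equals the sequence length itself and is largest on the high-resolution maps that matter most for small defects.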
3.3.3. C2f-Mamba Module Construction
To address the shortcomings of the original C2f module in global information modeling, this paper proposes an improved structure integrating the Mamba mechanism, the C2f-Mamba module. Traditional convolutional neural networks excel at extracting local texture and edge features but are limited by their local receptive fields, making them unable to effectively model the long-range dependencies between defects and the global circuit layout in PCB images. Conversely, pure Mamba models possess powerful global modeling capabilities but tend to lose fine-grained spatial details, which are critical for detecting tiny PCB defects. Therefore, instead of simply inserting Mamba blocks at the end of C2f modules, we propose a local–global dual-branch parallel structure that simultaneously preserves the local perception advantages of convolution and the global modeling advantages of Mamba, enabling deep synergy between the two.
The integration of the state–space model (Mamba) into the C2f module is motivated by the complementary strengths of convolution and SSM. Convolution excels at capturing local spatial details such as defect edges, textures, and fine-grained structures, which are critical for small PCB defect localization. Meanwhile, Mamba provides efficient long-range dependency modeling with linear complexity, enabling global context awareness to suppress background interference and capture structural relationships across the PCB. By fusing these two paths within a unified C2f-Mamba block, the model achieves collaborative representation: local features preserve precise defect details, while global features supply semantic consistency. This joint modeling significantly enhances feature representation in complex backgrounds with dense textures and weak defects, which directly addresses the insufficient global modeling of the original C2f module.
The specific structural design is as follows:
First, the input features are represented as:
$$X \in \mathbb{R}^{C \times H \times W}$$
Then divided along the channel dimension:
$$[X_1, X_2] = \mathrm{Split}(X)$$
Among them, $X_1$ and $X_2$ represent different channel sub-features respectively.
For the local branch, the original C2f structure is retained for convolutional feature extraction:
$$F_{local} = \mathrm{C2f}(X_1)$$
This branch is mainly responsible for capturing local texture and edge detail information. For the global branch, the feature map is first flattened into a 1D token sequence and then fed into the Mamba module for long-range dependency modeling. After sequence processing, the output sequence is reshaped back to the original spatial dimensions to preserve the spatial correspondence of features before feature fusion:
$$F_{global} = \mathrm{Reshape}(\mathrm{Mamba}(\mathrm{Flatten}(X_2)))$$
This process realizes the modeling of long-distance dependencies through a state–space modeling mechanism, thereby enhancing the ability to express global contextual information. Finally, local features are fused with global features:
$$F_{out} = \mathrm{Conv}(\mathrm{Concat}(F_{local}, F_{global}))$$
Among them, $\mathrm{Concat}(\cdot)$ denotes the channel concatenation operation, and the subsequent convolutions are used to further integrate feature information.
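The split, dual-branch, and fusion steps above can be sketched end to end. This is an illustrative sketch under strong simplifications and not the authors' implementation: `local_branch` uses a pointwise convolution with ReLU in place of the C2f convolutions, `global_branch` uses a fixed-decay linear recurrence in place of a real Mamba layer, and the equal channel split, function names, and random weights are all hypothetical.

```python
import numpy as np

def conv1x1(x, w):
    """Pointwise convolution. x: (C_in, H, W), w: (C_out, C_in)."""
    return np.einsum("oc,chw->ohw", w, x)

def local_branch(x1, w):
    """Stand-in for the convolutional path (pointwise conv + ReLU)."""
    return np.maximum(conv1x1(x1, w), 0.0)

def global_branch(x2, decay=0.9):
    """Stand-in for the Mamba path: flatten, run a linear recurrence, reshape."""
    C, H, W = x2.shape
    seq = x2.reshape(C, H * W).T              # (H*W, C) token sequence
    h = np.zeros(C)
    out = np.empty_like(seq)
    for t in range(seq.shape[0]):
        h = decay * h + (1.0 - decay) * seq[t]   # state carries earlier tokens
        out[t] = h
    return out.T.reshape(C, H, W)             # back to the spatial layout

def c2f_mamba(x, w_local, w_fuse):
    x1, x2 = np.split(x, 2, axis=0)           # channel split (equal halves assumed)
    f_local = local_branch(x1, w_local)
    f_global = global_branch(x2)
    fused = np.concatenate([f_local, f_global], axis=0)   # channel concatenation
    return conv1x1(fused, w_fuse)             # fusion convolution

rng = np.random.default_rng(0)
C, H, W = 8, 4, 4
x = rng.standard_normal((C, H, W))
w_local = rng.standard_normal((C // 2, C // 2))
w_fuse = rng.standard_normal((C, C))
out = c2f_mamba(x, w_local, w_fuse)
```

In this sketch, changing a value in the first spatial position of the global half alters the output at the last position, whereas the same change in the local half does not, which mirrors the division of labor between the two branches.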
To address the challenges of strong background interference and difficulty in capturing tiny targets in PCB defect detection, this paper constructs the C2f-Mamba module as shown in Figure 3. Logically, this module achieves deep integration of local and global information by splitting the feature flow into dual-path processing via a Split operation: the ‘Direct access’ branch preserves the original local spatial features, while the other branch is input into a Mamba Layer Group composed of multiple SSM (Selective State–Space Model) units. The dual-branch structure is designed based on two critical feature types for PCB defect detection:
(1) Local convolution features retain high-resolution spatial details, edge information, texture patterns, and fine-grained defect morphology, which are essential for accurate bounding box regression;
(2) Global state–space features capture long-range spatial dependencies, global layout consistency, and inter-region contextual relationships, which help distinguish real defects from similar background textures.
Channel concatenation is chosen for fusion because it maintains the full dimensionality of both feature sets without compression, allowing the subsequent convolution layer to adaptively learn complementary weights. This preserves both fine-grained localization cues and global semantic guidance, which is vital for detecting tiny, low-contrast defects in complex layouts.
Using the selective scanning mechanism of the state–space model, the Mamba branch extracts long-range global dependencies, thereby significantly enhancing the model’s feature representation and recognition capability for tiny defects in complex backgrounds. Compared with traditional Transformer architectures, C2f-Mamba leverages the linear computational characteristics of SSM, obtaining a wide receptive field while keeping the complexity at the $O(N)$ level, which supports high inference efficiency. Experiments show that introducing this module can effectively improve the model’s sensitivity to tiny PCB damages and significantly reduce false positives and missed detections while ensuring real-time performance.
To clarify the collaborative modeling mechanism, the proposed C2f-Mamba module integrates local feature extraction and global context modeling through a parallel dual-branch structure. Specifically, the input feature map is first divided along the channel dimension into two complementary feature subsets. The local branch preserves the convolutional operation of the original C2f module to capture fine-grained spatial details, such as defect edges, textures, and local shape patterns. In parallel, the global branch reshapes the 2D feature map into a 1D token sequence and feeds it into the Mamba state–space module to model long-range spatial dependencies and global contextual relationships. After global sequence modeling, the feature sequence is reshaped back to the original spatial resolution. Finally, the local convolutional features and the global Mamba features are concatenated along the channel dimension and further fused by a convolution layer. In this way, local details and global semantic dependencies are not processed independently but are jointly integrated within the same C2f-Mamba block, enabling the network to retain tiny defect details while suppressing background interference through global contextual awareness.
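The dual-branch data flow described above can be sketched in a few lines. This is a minimal, framework-free illustration and not the authors' implementation: the function names (`linear_scan`, `local_branch`, `global_branch`, `c2f_mamba_sketch`) are hypothetical, the 3-tap average merely stands in for the convolutional path, and the fixed `decay` and `gain` scalars replace the input-dependent (selective) parameters of a real Mamba layer. What the sketch keeps is the order of operations: channel split, flattening to a token sequence, a single O(N) recurrent scan, reshaping, and channel concatenation.

```python
from typing import List

Channel = List[List[float]]  # one H x W feature channel


def linear_scan(seq: List[float], decay: float = 0.9, gain: float = 0.1) -> List[float]:
    """Toy state-space recurrence h_t = decay * h_{t-1} + gain * x_t.

    One pass over the sequence, so the cost is O(N) in the number of tokens.
    (Assumption: fixed scalars; real Mamba parameters depend on the input.)
    """
    h, out = 0.0, []
    for x in seq:
        h = decay * h + gain * x
        out.append(h)
    return out


def local_branch(ch: Channel) -> Channel:
    """Stand-in for the convolutional path: 3-tap horizontal average, edges clamped."""
    out = []
    for row in ch:
        w = len(row)
        out.append([(row[max(j - 1, 0)] + row[j] + row[min(j + 1, w - 1)]) / 3.0
                    for j in range(w)])
    return out


def global_branch(ch: Channel) -> Channel:
    """Flatten H x W to N tokens, scan them, and reshape back to H x W."""
    h, w = len(ch), len(ch[0])
    tokens = [v for row in ch for v in row]
    scanned = linear_scan(tokens)
    return [scanned[i * w:(i + 1) * w] for i in range(h)]


def c2f_mamba_sketch(x: List[Channel]) -> List[Channel]:
    """Split channels in half, process each half in its branch, then concatenate."""
    half = len(x) // 2
    local = [local_branch(c) for c in x[:half]]
    glob = [global_branch(c) for c in x[half:]]
    return local + glob  # channel concatenation
```

In an actual network the concatenated output would then pass through the fusing convolution layer; that step is omitted here to keep the sketch short.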
3.4. FPN + PAN++ Feature Fusion Structure
In PCB defect detection, defects usually appear as small-scale, low-contrast, and irregular targets embedded in dense circuit textures. Therefore, relying only on single-level features is insufficient. Shallow features contain rich spatial details and edge information, which are important for locating tiny defects, but they lack strong semantic discrimination and are easily disturbed by background textures. Deep features contain stronger semantic information, but their spatial resolution is reduced after repeated downsampling, which may cause small defect details to be weakened or lost. Therefore, an effective feature fusion structure should not only transfer high-level semantic information to shallow layers, but also feed low-level detailed information back to deeper layers.
The Feature Pyramid Network (FPN) propagates high-level semantic information to lower layers through a top-down pathway; it was originally proposed and validated on the COCO detection benchmark [
41]. COCO (Common Objects in Context) is a large-scale, universal, and widely accepted standard benchmark for object detection. It contains complex real-world scenes, large-scale variations, and a high proportion of small objects, and has become the de facto benchmark for validating multi-scale feature fusion methods such as FPN. Its core idea is to progressively upsample high-level feature maps and fuse them with corresponding low-level features via lateral connections, thereby enhancing the semantic representation capability of the lower-level features. However, this one-way information flow is insufficient for PCB defect detection because fine-grained defect details from shallow layers cannot be fully propagated to high-level semantic features. Although the conventional PAN structure introduces a bottom-up path, its feature interaction is still relatively limited when handling small defects with large-scale variation and complex backgrounds. To address this problem, this paper adopts a lightweight FPN-PAN++ structure to strengthen bidirectional cross-scale information flow. Specifically, the FPN path transfers global semantic information from deep layers to shallow layers, improving the semantic representation of small defects, while the PAN++ path further aggregates shallow spatial details and feeds them back to deeper layers, enhancing localization accuracy. Through this bidirectional feature circulation, high-level semantic information and low-level spatial details are repeatedly complemented and fused, thereby reducing the semantic gap between different feature levels and improving the detection robustness for multi-scale PCB defects.
Moreover, only the core bidirectional aggregation and feature pyramid enhancement components of PAN++ are retained, while task-specific text detection branches are removed. This lightweight adaptation avoids unnecessary computational overhead and makes the structure more suitable for real-time PCB defect detection. Therefore, the adoption of FPN-PAN++ is motivated by the need to achieve stronger multi-scale feature interaction, better small-defect localization, and improved robustness under complex PCB backgrounds.
3.4.1. FPN Feature Fusion Mechanism
In object detection tasks, there are significant differences in the feature distributions of objects at different scales. For PCB defect detection, small-scale defects (such as micro-circuit breaks or pinholes) are often more apparent in shallow features, while deep features contain richer semantic information. Therefore, how to effectively integrate multi-scale features becomes the key to improving detection performance.
The Feature Pyramid Network (FPN) propagates high-level semantic information to lower layers through a top-down pathway. Its core idea is to progressively upsample high-level feature maps and fuse them with corresponding low-level features, thereby enhancing the semantic representation capability of the lower-level features.
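The basic process can be expressed as:
$$P_i = \mathcal{F}\big(C_i,\ \mathrm{Up}(P_{i+1})\big)$$
where
$C_i$ represents the i-th layer feature output from the Backbone,
$P_i$ is the fused feature map,
$\mathrm{Up}(\cdot)$ denotes the upsampling operation, and $\mathcal{F}(\cdot,\cdot)$ denotes the fusion step (convolution or concatenation).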
The top-down propagation works as follows: 1. The deepest feature map (e.g., C5) carries strong global semantics. 2. It is upsampled to match the spatial size of the shallower layer (C4). 3. The upsampled high-level features are fused with C4 via convolution or concatenation. 4. This process repeats from deeper to shallower layers (C5 → C4 → C3). In this way, low-level features obtain high-level semantic guidance, improving the discrimination of small defects while retaining high-resolution spatial details.
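The four steps above can be illustrated with a small sketch. It assumes single-channel feature maps stored as nested lists, nearest-neighbour 2x upsampling, and element-wise addition as the fusion step; the helper names (`upsample2x`, `add`, `top_down`) are hypothetical, and the network in this paper may instead fuse by concatenation followed by convolution.

```python
from typing import List, Tuple

Map = List[List[float]]  # single-channel H x W feature map


def upsample2x(m: Map) -> Map:
    """Nearest-neighbour upsampling: every value becomes a 2 x 2 block."""
    out = []
    for row in m:
        wide = [v for v in row for _ in range(2)]
        out.append(wide)
        out.append(list(wide))
    return out


def add(a: Map, b: Map) -> Map:
    """Element-wise sum of two maps with identical shape."""
    return [[x + y for x, y in zip(ra, rb)] for ra, rb in zip(a, b)]


def top_down(c3: Map, c4: Map, c5: Map) -> Tuple[Map, Map, Map]:
    """Propagate semantics from the deepest map to the shallower ones (C5 -> C4 -> C3)."""
    p5 = c5                          # deepest level is used as-is
    p4 = add(c4, upsample2x(p5))     # C4 receives upsampled C5 semantics
    p3 = add(c3, upsample2x(p4))     # C3 receives upsampled P4 semantics
    return p3, p4, p5
```

Each level is touched once and only adjacent levels interact, which is why the top-down pass adds little computation.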
Through this architecture, FPN can introduce high-level semantic information while retaining high-resolution features, thereby improving the model’s ability to detect small objects. However, FPN only contains a top-down information flow and lacks a feature enhancement pathway from low to high layers, which still results in insufficient utilization of information in complex scenarios.
3.4.2. Advantages and Structure of PAN++
To further enhance feature fusion capability, this paper builds on FPN and introduces the PAN++ structure to achieve bidirectional enhancement of multi-scale features. The Path Aggregation Network (PAN) adds a bottom-up path that feeds low-level detail back to high-level features, thereby compensating for the unidirectional information flow of FPN. The original PAN++ framework includes several text-specific components: 1. Mask prediction branch: outputs pixel-level text segmentation masks to locate arbitrary-shaped text regions. 2. Text kernel segmentation: predicts a compact text core to separate adjacent text instances and suppress background. 3. Boundary regression branch: predicts text contour offsets to refine irregular text boundaries. These branches are designed for end-to-end text detection, but they introduce unnecessary computation for PCB defect detection, which only requires bounding-box prediction. This paper therefore does not employ the complete PAN++ framework; instead, it adapts the framework to the PCB defect detection task in three steps: 1. Retain only the core bidirectional cross-scale feature aggregation and the Feature Pyramid Enhancement Module (FPEM), which strengthen multi-scale feature fusion and information flow. 2. Discard the mask branch, text kernel segmentation, and boundary regression modules, which are dedicated to text detection, so as to avoid extra computation and task inconsistency. 3. Connect the resulting lightweight PAN++ to FPN to form a bidirectional feature fusion architecture that outputs only multi-scale feature maps for object detection and sends them to the decoupled detection head (classification + box regression) of YOLOv8.
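Its bottom-up feature fusion process can be expressed as:
$$N_i = \mathcal{F}\big(P_i,\ \mathrm{Down}(N_{i-1})\big)$$
where
$N_i$ represents the fused feature, $P_i$ is the top-down (FPN) feature at the same level, $\mathcal{F}(\cdot,\cdot)$ denotes the fusion step (concatenation followed by convolution), and
$\mathrm{Down}(\cdot)$ denotes the downsampling operation.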
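A matching sketch of the bottom-up pass is given here for illustration. It makes the same simplifications as before and they are assumptions, not the paper's design: single-channel maps as nested lists, 2 x 2 average pooling in place of the learned downsampling (a strided convolution is more likely in practice), and element-wise addition as the fusion step. The helper names are hypothetical.

```python
from typing import List, Tuple

Map = List[List[float]]  # single-channel H x W feature map


def downsample2x(m: Map) -> Map:
    """2 x 2 average pooling with stride 2 (height and width must be even)."""
    return [[(m[i][j] + m[i][j + 1] + m[i + 1][j] + m[i + 1][j + 1]) / 4.0
             for j in range(0, len(m[0]), 2)]
            for i in range(0, len(m), 2)]


def add(a: Map, b: Map) -> Map:
    """Element-wise sum of two maps with identical shape."""
    return [[x + y for x, y in zip(ra, rb)] for ra, rb in zip(a, b)]


def bottom_up(p3: Map, p4: Map, p5: Map) -> Tuple[Map, Map, Map]:
    """Feed shallow spatial detail back to deeper levels (level 3 -> 4 -> 5)."""
    n3 = p3                           # shallowest level is used as-is
    n4 = add(p4, downsample2x(n3))    # level 4 receives pooled level-3 detail
    n5 = add(p5, downsample2x(n4))    # level 5 receives pooled level-4 detail
    return n3, n4, n5
```

Running this pass on the outputs of the top-down pass gives the bidirectional flow: every output level has then seen information from both deeper and shallower levels.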
By combining FPN and PAN++, this paper constructs a bidirectional feature fusion architecture as shown in
Figure 4. Subfigure (a) shows the top-down feature pyramid (FPN) path, which focuses on enhancing the transmission of high-level semantic information; subfigure (b) illustrates the bottom-up path aggregation (PAN++) process, effectively feeding back detailed information through the introduction of additional paths.
This structure demonstrates significant advantages in PCB defect detection tasks: It strengthens the feature representation of small targets, with high-resolution low-level details (such as the edges of fine wires) fully retained and fed back to higher levels via the enhancement path in
Figure 4b; meanwhile, it enhances multi-scale information interaction, with features from different layers deeply aggregated and fused in subfigure (c), enabling the model to handle defects of vastly varying sizes in PCB production; finally, it improves robustness in complex scenarios, reducing interference from background noise in the recognition of minor defects.
Furthermore, the fused features are directed to different prediction branches: subfigure (d) shows the decoupled prediction heads, handling class classification (Class) and bounding box regression (Box) separately, ensuring accurate localization. By optimizing this bidirectional path structure, PAN++ significantly enhances feature representation capability while successfully avoiding excessive redundant computations, effectively maintaining the overall lightweight nature of the model.
The reason why the adapted PAN++ can enhance feature representation without introducing excessive redundant computation lies in its lightweight and task-oriented redesign. The original PAN++ framework contains several text-detection-specific components, such as mask prediction, text kernel segmentation, and boundary regression branches. These components are useful for arbitrary-shaped text detection but are not necessary for bounding-box-based PCB defect detection. Therefore, this paper does not directly adopt the complete PAN++ framework. Instead, only its core bidirectional cross-scale aggregation and feature pyramid enhancement paths are retained, while the task-irrelevant prediction branches are removed.
In terms of feature enhancement, the retained PAN++ path strengthens the bottom-up information flow by propagating shallow high-resolution spatial details to deeper semantic layers. This complements the top-down semantic transmission of FPN and enables the detection head to receive features that contain both fine-grained localization information and high-level semantic discrimination. As a result, tiny defect boundaries, local textures, and global semantic cues can be more effectively integrated across different scales.
In terms of computational efficiency, the proposed FPN-PAN++ does not perform dense all-to-all feature fusion among all pyramid levels. Instead, it mainly conducts adjacent-level feature aggregation through lightweight upsampling, downsampling, concatenation, and convolution operations. This design reuses the existing multi-scale feature maps generated by the YOLOv8 backbone and avoids adding heavy attention modules or extra prediction branches. Therefore, the feature capability is improved by optimizing the information flow rather than simply increasing network depth or width. Thus, the proposed structure effectively improves feature fusion performance with only a small amount of additional computational overhead, while maintaining high inference efficiency.
The enhanced bidirectional feature fusion of FPN-PAN++ is achieved through two complementary information propagation paths. In the top-down FPN pathway, high-level semantic features are progressively upsampled and fused with low-level high-resolution features, so that shallow feature maps obtain stronger semantic guidance for distinguishing true defects from complex circuit textures. In the bottom-up PAN++ pathway, low-level spatial details are further aggregated and propagated back to deeper layers through downsampling and path enhancement, allowing high-level features to recover fine-grained localization cues that may be weakened during repeated downsampling. Therefore, the feature flow is no longer a single semantic transmission from deep to shallow layers, but a bidirectional circulation between semantic-rich deep features and detail-rich shallow features.
Compared with conventional FPN-PAN fusion, the adopted lightweight PAN++ further strengthens cross-scale interaction by enhancing the aggregation paths between adjacent feature levels. This design reduces the semantic gap among multi-scale feature maps and enables the detection head to receive features that contain both global semantic discrimination and local spatial details. For PCB defect detection, this is particularly important because tiny defects usually occupy only a small number of pixels and are easily confused with dense background textures. Through the FPN-PAN++ structure, shallow defect boundaries and textures can be preserved, while deep semantic information can suppress background interference, resulting in more robust multi-scale defect representation.
Previous YOLO-Mamba-based detection methods mainly focus on introducing Mamba modules into the backbone or neck to improve long-range dependency modeling, but they retain the original FPN-PAN structure without improving cross-scale fusion. This paper further redesigns the feature fusion stage by combining C2f-Mamba with a lightweight FPN-PAN++ structure. In other words, previous YOLO-Mamba methods primarily enhance feature extraction, whereas the proposed method simultaneously enhances feature extraction and cross-scale feature interaction. The C2f-Mamba module provides local–global feature representation, while FPN-PAN++ constructs a bidirectional multi-scale information flow. The two modules are complementary: Mamba improves global contextual modeling, and FPN-PAN++ ensures that global context and local defect details are effectively propagated across different scales. To the best of our knowledge, such a combination has not been specifically adopted in previous YOLO-Mamba methods for PCB defect detection.
3.5. Detection Head and Loss Function
This paper continues to use the decoupled detection head structure of YOLOv8, separating the classification task from the regression task in order to improve detection accuracy and training stability.
The detection head mainly consists of two branches: the first is the classification branch, which is used to predict the probability of object categories; the second is the regression branch, which is used to predict the positions of the object bounding boxes. Compared to traditional coupled structures, the decoupled design can reduce interference between tasks, allowing the model to achieve better convergence performance in complex scenarios. Meanwhile, YOLOv8 adopts an anchor-free mechanism, directly predicting the center points of objects, avoiding the hyperparameter dependency issues brought by anchor design, thereby enhancing the model’s generalization capability.
During training, this paper follows the original YOLOv8 design to ensure experimental fairness and stability. The overall loss function consists of classification loss, bounding box regression loss, and distribution focal loss (DFL):
$$L = L_{cls} + L_{box} + L_{dfl}$$
where
$L_{cls}$ represents the classification loss, which measures the error in category prediction;
$L_{box}$ represents the regression loss, which constrains the position deviation between the predicted boxes and the ground-truth boxes;
$L_{dfl}$ is the distribution focal loss, which refines the bounding box prediction by learning a discrete probability distribution over each box-edge offset.
In bounding box regression, IoU-based loss functions (such as CIoU or SIoU) are commonly used:
$$L_{box} = 1 - \mathrm{IoU}(B, B^{gt})$$
where $B$ and $B^{gt}$ denote the predicted and ground-truth boxes; CIoU and SIoU add further penalty terms to this base form.
Through joint optimization of multiple tasks, the model can simultaneously maintain classification accuracy and localization precision.
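To make the IoU-based regression term concrete, the following sketch computes plain IoU and the CIoU loss for two axis-aligned boxes. The `(x1, y1, x2, y2)` box format, the function names, and the `eps` stabiliser are assumptions made for illustration; the CIoU formula itself is the standard one (IoU term, normalised centre distance, and aspect-ratio consistency term).

```python
import math
from typing import Tuple

Box = Tuple[float, float, float, float]  # (x1, y1, x2, y2), assumed format


def iou(a: Box, b: Box) -> float:
    """Intersection over union of two axis-aligned boxes."""
    iw = max(0.0, min(a[2], b[2]) - max(a[0], b[0]))
    ih = max(0.0, min(a[3], b[3]) - max(a[1], b[1]))
    inter = iw * ih
    union = (a[2] - a[0]) * (a[3] - a[1]) + (b[2] - b[0]) * (b[3] - b[1]) - inter
    return inter / union if union > 0 else 0.0


def ciou_loss(pred: Box, gt: Box, eps: float = 1e-9) -> float:
    """CIoU loss = 1 - IoU + rho^2 / c^2 + alpha * v."""
    i = iou(pred, gt)
    # Squared distance between the two box centres.
    rho2 = (((pred[0] + pred[2]) - (gt[0] + gt[2])) / 2.0) ** 2 \
         + (((pred[1] + pred[3]) - (gt[1] + gt[3])) / 2.0) ** 2
    # Squared diagonal of the smallest box enclosing both.
    cw = max(pred[2], gt[2]) - min(pred[0], gt[0])
    ch = max(pred[3], gt[3]) - min(pred[1], gt[1])
    c2 = cw ** 2 + ch ** 2 + eps
    # Aspect-ratio consistency term and its trade-off weight.
    pw, ph = pred[2] - pred[0], pred[3] - pred[1]
    gw, gh = gt[2] - gt[0], gt[3] - gt[1]
    v = (4.0 / math.pi ** 2) * (math.atan(gw / (gh + eps)) - math.atan(pw / (ph + eps))) ** 2
    alpha = v / ((1.0 - i) + v + eps)
    return 1.0 - i + rho2 / c2 + alpha * v
```

Unlike plain `1 - IoU`, the centre-distance term still provides a useful signal when the boxes do not overlap at all, which matters for tiny defects where a small offset can remove all overlap.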
3.6. Model Training and Inference Process
During the model training phase, this study adopts a unified data augmentation and optimization strategy to improve the model’s generalization ability and training stability. The training process mainly includes steps such as data preprocessing, forward propagation, loss calculation, and backpropagation. The detailed architecture is illustrated in
Figure 5.
First, the input images are normalized in size and augmented (such as random flipping, cropping, etc.), and then fed into the improved network model. The model extracts multi-scale features through the Backbone, enhances global modeling capabilities via the C2f-Mamba module, and then performs multi-scale feature fusion through the FPN-PAN++ structure, with the detection head finally outputting the prediction results.
During training, the loss function is used to compute the error between the prediction results and the ground truth labels, and the model parameters are updated using the backpropagation algorithm. At the same time, automatic mixed precision (AMP) and learning rate scheduling strategies are introduced to improve training efficiency and accelerate convergence.
During inference, the input image requires only a single forward pass to produce detection results. Redundant detection boxes are then removed by Non-Maximum Suppression (NMS) to obtain the final detections.
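The NMS step can be sketched as a greedy procedure: keep the highest-scoring box, discard every remaining box that overlaps it too strongly, and repeat. The box format `(x1, y1, x2, y2)`, the function names, and the default threshold of 0.5 are assumptions for illustration, not values reported in this paper.

```python
from typing import List, Tuple

Box = Tuple[float, float, float, float]  # (x1, y1, x2, y2), assumed format


def iou(a: Box, b: Box) -> float:
    """Intersection over union of two axis-aligned boxes."""
    iw = max(0.0, min(a[2], b[2]) - max(a[0], b[0]))
    ih = max(0.0, min(a[3], b[3]) - max(a[1], b[1]))
    inter = iw * ih
    union = (a[2] - a[0]) * (a[3] - a[1]) + (b[2] - b[0]) * (b[3] - b[1]) - inter
    return inter / union if union > 0 else 0.0


def nms(boxes: List[Box], scores: List[float], iou_thr: float = 0.5) -> List[int]:
    """Return the indices of the boxes kept by greedy NMS, best score first."""
    order = sorted(range(len(boxes)), key=lambda i: scores[i], reverse=True)
    keep: List[int] = []
    while order:
        best = order.pop(0)
        keep.append(best)
        # Drop every remaining box that overlaps the kept box above the threshold.
        order = [i for i in order if iou(boxes[best], boxes[i]) <= iou_thr]
    return keep
```

For example, `nms([(0, 0, 10, 10), (1, 1, 11, 11), (50, 50, 60, 60)], [0.9, 0.8, 0.7])` keeps indices 0 and 2, because the second box overlaps the first with an IoU of about 0.68. Detection pipelines usually apply this per class so that different defect types do not suppress each other.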
Thanks to the efficient design of the C2f-Mamba module and FPN-PAN++ structure, the model can maintain high operational speed during inference while ensuring detection accuracy, meeting the real-time requirements of industrial scenarios.
The efficient inference performance is achieved by controlling the computational overhead of both improved modules. In C2f-Mamba, the Mamba branch is introduced through channel splitting, so only part of the features are processed by the global state–space modeling path, while the other branch retains lightweight convolutional extraction. Moreover, Mamba models long-range dependencies with linear complexity, avoiding the quadratic computational cost of Transformer self-attention. In FPN-PAN++, only the core bidirectional feature aggregation paths are retained, and task-irrelevant branches from the original PAN++ framework are removed. The fusion process mainly uses lightweight adjacent-scale operations such as upsampling, downsampling, concatenation, and convolution. Therefore, the proposed model enhances local–global feature representation and multi-scale fusion without introducing excessive computation. Accordingly, the designed structure promotes stronger feature fusion with modest additional computation and preserves efficient inference performance.
4. Experimental Results and Analysis
4.1. Experimental Environment
The experimental platform uses an NVIDIA RTX 4090, Intel Xeon Platinum 8470Q, Ubuntu 22.04. The deep learning framework is PyTorch 2.1.0, Python 3.10, CUDA 12.1. The parameters of the experimental environment are shown in
Table 1. All baseline methods are retrained from scratch for the same 100 epochs, with the same input size, optimizer, and data augmentation, to ensure a fair comparison.
4.2. Dataset
This study conducts experimental verification using the PCB defect dataset publicly released by Peking University and Kaggle. This dataset contains a total of 1386 PCB defect images, with an average resolution of 2777 × 2138. The dataset originates from industrial visual inspection scenarios and is widely applied in research on printed circuit board defect detection, effectively reflecting the detection requirements in actual production environments. As shown in
Figure 6, the dataset includes six typical defect types, such as short, mouse bite, spur, and missing hole, characterized by a high proportion of small targets, uneven class distribution, and complex backgrounds. Among them, small-sized defects occupy only a small pixel area in the images, posing higher demands on the model’s feature extraction capability, while complex circuit texture backgrounds can easily interfere with defect recognition.
In terms of data preprocessing, the original images are first uniformly resized and normalized to ensure consistency of model input; subsequently, data augmentation strategies such as random flipping, random cropping, and color perturbation are introduced to enhance the model’s adaptability to different scenarios; finally, the data are divided into training, validation, and test sets according to a certain ratio to ensure the reliability and fairness of the experimental results.
In terms of dataset division, this paper follows the common practice in industrial defect detection and randomly divides the PCB defect dataset into training, validation, and test sets at a ratio of 8:1:1. The three sets maintain the same category distribution to avoid the impact of data skew on the experimental results. All experiments are evaluated on the same test set to ensure a fair comparison.
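One way to obtain an 8:1:1 split that keeps the category distribution equal across the three sets is a per-class (stratified) random split. The sketch below assumes that each image carries a single defect label and uses a hypothetical function name and a fixed seed; it is an illustration of the idea, not the authors' script.

```python
import random
from collections import defaultdict
from typing import Dict, List, Sequence, Tuple


def stratified_split(labels: Sequence[str],
                     ratios: Tuple[float, float, float] = (0.8, 0.1, 0.1),
                     seed: int = 0) -> Tuple[List[int], List[int], List[int]]:
    """Split sample indices into train/val/test while preserving class proportions."""
    rng = random.Random(seed)          # fixed seed so the split is reproducible
    by_class: Dict[str, List[int]] = defaultdict(list)
    for idx, lab in enumerate(labels):
        by_class[lab].append(idx)

    train: List[int] = []
    val: List[int] = []
    test: List[int] = []
    for lab in sorted(by_class):       # sorted for a deterministic class order
        idxs = by_class[lab]
        rng.shuffle(idxs)
        n_train = int(ratios[0] * len(idxs))
        n_val = int(ratios[1] * len(idxs))
        train += idxs[:n_train]
        val += idxs[n_train:n_train + n_val]
        test += idxs[n_train + n_val:]  # remainder goes to the test set
    return train, val, test
```

Because the split is made class by class, a rare defect type cannot end up entirely in one subset, which is the data-skew problem the paragraph above refers to.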
4.3. Evaluation Metric
In order to comprehensively evaluate the performance of the model in the PCB defect detection task, this paper selects commonly used evaluation metrics in the field of object detection, including mean average precision (mAP), precision, and recall.
The mean average precision is used to measure the overall detection capability of the model across all categories and is defined as the mean of the average precision for each category:
$$\mathrm{mAP} = \frac{1}{N}\sum_{i=1}^{N} \mathrm{AP}_i$$
Here,
$N$ represents the number of categories, and
$\mathrm{AP}_i$ represents the average precision of the i-th category. This metric can comprehensively reflect the model’s detection performance across different categories. Precision is used to measure the proportion of true positives among the samples predicted as positive by the model, and its calculation formula is:
$$\mathrm{Precision} = \frac{TP}{TP + FP}$$
Here,
$TP$ represents true positives, and
$FP$ represents false positives. The higher the precision, the fewer incorrect detections the model produces.
Recall is used to measure the proportion of actual targets that are correctly detected, and it is defined as:
$$\mathrm{Recall} = \frac{TP}{TP + FN}$$
where $FN$ represents false negatives. A higher recall indicates fewer missed detections by the model.
In addition, this paper also uses mAP@0.5 and mAP@0.5:0.95 as supplementary metrics. Specifically, mAP@0.5 is calculated at an IoU threshold of 0.5, while mAP@0.5:0.95 is averaged over multiple IoU thresholds, allowing for a more rigorous evaluation of the model’s overall detection capability.
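The metrics above reduce to a few lines of arithmetic. The sketch below uses hypothetical function names and assumes that the per-class AP values have already been computed; for mAP@0.5:0.95 it assumes the usual COCO-style convention of ten IoU thresholds from 0.50 to 0.95 in steps of 0.05.

```python
from typing import List


def precision(tp: int, fp: int) -> float:
    """Fraction of predicted positives that are correct."""
    return tp / (tp + fp) if (tp + fp) > 0 else 0.0


def recall(tp: int, fn: int) -> float:
    """Fraction of ground-truth targets that are detected."""
    return tp / (tp + fn) if (tp + fn) > 0 else 0.0


def mean_average_precision(ap_per_class: List[float]) -> float:
    """mAP at one IoU threshold: the mean of the per-class AP values."""
    return sum(ap_per_class) / len(ap_per_class)


# Assumed COCO-style thresholds: 0.50, 0.55, ..., 0.95.
IOU_THRESHOLDS = [round(0.5 + 0.05 * k, 2) for k in range(10)]


def map_50_95(ap_table: List[List[float]]) -> float:
    """Average mAP over thresholds; ap_table[t][c] is the AP of class c at threshold t."""
    return sum(mean_average_precision(row) for row in ap_table) / len(ap_table)
```

Since a detection counts as correct only if its IoU exceeds the threshold, AP at 0.95 is far harder to achieve than AP at 0.5, which is why mAP@0.5:0.95 is the stricter of the two metrics.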
4.4. Experimental Results
The YOLOv8n+C2f-Mamba+FPN+PAN++ model proposed in this paper outperforms the baseline model and other comparative methods on all evaluation metrics. Compared with the original YOLOv8n model (mAP@0.5 of 97.2% and mAP@0.5:0.95 of 57.5%), the improved model increased to 98.8% and 62.5%, respectively, with the mAP@0.5:0.95 improving by 5 percentage points, indicating that the model has significant advantages under more stringent evaluation conditions.
To verify the reliability of the experimental results, the proposed model and all baseline models were tested under the same experimental settings and evaluation criteria. Performance was evaluated with four common object detection metrics: mAP@0.5, mAP@0.5:0.95, precision, and recall. The results show that the proposed model outperforms the other comparison methods across these metrics, which indicates good stability and robustness and further supports the effectiveness and practicality of the improved strategy in PCB defect detection tasks.
From the perspective of comparative methods, as shown in
Table 2, YOLOv6 [
42] and YOLOv7 [
43] primarily improve detection performance through structural optimization and training strategy enhancements, but they still exhibit certain limitations in detecting small targets and in complex scenarios. YOLOv9 and YOLOv11n [
44] further enhance performance, achieving higher Box(P), recall, and mAP50 values, with YOLOv9 reaching 98.2% Box(P) and recall, and YOLOv11n achieving 98.0% Box(P) and 98.4% recall. Moreover, their improvements are particularly notable in the mAP50-95 metric, indicating better performance in challenging detection scenarios. The combined model YOLOv8n+C2f-Mamba+FPN+PAN++ further leverages these advantages, attaining the highest overall detection metrics among the tested methods.
Meanwhile, as shown in
Table 3, attention mechanism-based improvements (such as CBAM and CA) enhance detection performance to some extent. Among them, the YOLOv8n+CA model achieves a mAP@0.5 of 98.3%, outperforming the baseline model, although the overall improvement is limited. Transformer-based approaches (such as ViT, DETR, and Swin Transformer), despite possessing some global modeling capability, show inconsistent performance in this task. For example, the YOLOv8n+Swin Transformer model achieves only a mAP@0.5 of 96.9%, even lower than the baseline model, indicating an insufficient adaptability to small, densely distributed targets in complex background scenarios.
In contrast, as shown in
Table 4, the method proposed in this paper achieves comprehensive detection performance improvements after introducing the C2f-Mamba module and the FPN-PAN++ structure. Specifically, with only FPN+PAN++ introduced, mAP@0.5 increases to 98.0% and mAP@0.5:0.95 reaches 56.1%; with only the C2f-Mamba module, mAP@0.5 further rises to 98.4% and mAP@0.5:0.95 is 55.7%. When both are combined, the model performance reaches its optimum, indicating a significant synergistic enhancement effect between global modeling and multi-scale feature fusion.
Compared with the YOLO-Mamba series and other improved YOLO methods, the proposed model has three advantages. First, previous YOLO-Mamba methods focus on natural scenes or infrared images and are not optimized for small PCB defects and complex circuit textures. Second, general YOLO improvements rely on attention or simple feature fusion and lack global modeling with linear complexity. Third, to the best of our knowledge, this work is the first to combine C2f-Mamba with FPN-PAN++ for PCB defect detection, which both enhances long-range dependency modeling and strengthens multi-scale information interaction, leading to higher mAP@0.5:0.95 and better robustness.
The ablation experiment shows that adding a single module slightly reduces mAP@0.5:0.95, whereas combining both modules yields a significant improvement. This paper analyzes this phenomenon and attributes it to three possible causes. First, YOLOv8n is structurally sensitive because it is extremely lightweight, with only 3.2 M parameters and very little redundancy. The official implementation has already been extensively optimized for the original C2f + FPN-PAN structure, so introducing a single new module can disturb the balance of the original feature flow and produce a marginal negative effect. Second, the two modules are strongly complementary. C2f-Mamba extracts defect features with global context, and FPN-PAN++ performs bidirectional cross-scale fusion of high-level and low-level features. Together they form a complete pipeline of feature extraction, fusion, and output, and each depends on the other to reach its full potential, which explains the nonlinear performance gain when they are combined. Third, the training hyperparameters suit the configurations to different degrees. To ensure a fair comparison, all models are trained with the same hyperparameters, which were tuned for the original YOLOv8n structure. With a single added module, training tends to converge to a poorer local optimum, whereas adding both modules restructures the feature flow as a whole, fits the current hyperparameters better, and converges to a better solution.
Figure 7 reflects the model’s convergence stability and accuracy performance over 50 training epochs. The six subplots on the left show that the localization, classification, and distribution losses (Loss) on both the training and validation sets exhibit a rapid and steady decline; the four subplots on the right illustrate a consistent increase in precision, recall, and mAP50/mAP50-95 metrics, demonstrating that the model maintains high robustness and detection consistency even under stringent evaluation criteria.
In addition, in terms of the precision and recall metrics, the method proposed in this paper achieved 98.5% and 98.4%, respectively, showing a significant improvement compared with the YOLOv8n model (96.4% and 94.6%), further indicating that this model has a stronger capability in reducing false positives and missed detections. Under complex background conditions, traditional models are easily disturbed by circuit textures, resulting in incomplete detection results or inaccurate localization. However, the method proposed in this paper, by integrating global contextual information with multi-scale features, significantly enhances feature representation, making the detection results more complete and the localization more precise, demonstrating stronger robustness.
Figure 8 presents the ground-truth annotations of representative samples from the PCB defect dataset, where the defect regions are marked with bounding boxes and corresponding class labels. These annotations provide the reference labels for evaluating the detection performance of the proposed model. Based on these ground-truth labels,
Figure 9 shows the visual detection results of the improved algorithm on the PCB defect dataset. As depicted, the model can accurately locate various PCB defects, including missing holes, mouse bites, and spurs, with complete detection boxes and consistent class predictions. Even in scenarios with dense circuit textures, tiny defect sizes, or complex backgrounds, the model is still able to accurately identify and enclose all target defects, with high boundary conformity, intuitively demonstrating the excellent defect detection capability and robustness of the proposed method in complex industrial settings.
Meanwhile,
Table 5 shows the model complexity and inference speed comparison. The proposed model only increases parameters from 3.20 M to 3.86 M, GFLOPs from 8.7 to 10.2, and maintains 159 FPS, which still meets real-time industrial requirements. All FPS values are tested under the same hardware and software environment: NVIDIA RTX 4090 GPU, Ubuntu 22.04, PyTorch 2.1.0, CUDA 12.1, fixed input resolution 640 × 640, batch size = 16, single-image inference mode without pre-processing and post-processing (NMS) time. The real-time performance is evaluated under standard GPU deployment conditions, which is consistent with common industrial visual inspection testing standards.
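The relative cost of the added modules follows directly from the figures quoted above. The short calculation below uses a hypothetical helper name; the per-image latency is simply the reciprocal of the reported throughput and, like the FPS figure itself, excludes pre-processing and NMS time.

```python
def relative_increase(before: float, after: float) -> float:
    """Percentage increase from `before` to `after`."""
    return (after - before) / before * 100.0


params_pct = relative_increase(3.20, 3.86)   # parameters, in millions
gflops_pct = relative_increase(8.7, 10.2)    # GFLOPs
latency_ms = 1000.0 / 159                    # milliseconds per image at 159 FPS

print(f"parameters: +{params_pct:.1f}%")     # parameters: +20.6%
print(f"GFLOPs:     +{gflops_pct:.1f}%")     # GFLOPs:     +17.2%
print(f"latency:    {latency_ms:.2f} ms")    # latency:    6.29 ms
```

In other words, the model grows by roughly one fifth in parameters and about one sixth in computation while staying near 6.3 ms per image.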
4.5. Experimental Conclusion
Through experimental validation on the PCB defect dataset, it can be observed that the C2f-FPN-PAN++-Mamba Improved YOLOv8n model proposed in this paper demonstrates superior performance in PCB defect detection tasks. The experimental results indicate that introducing the C2f-Mamba module can effectively enhance the model’s global feature modeling capability, while the FPN-PAN++ structure significantly improves multi-scale feature fusion. The combination of the two further enhances the model’s detection ability for small objects and complex background scenarios.
Without significantly increasing computational complexity, the proposed method achieves noticeable improvements in detection accuracy, recall, and overall stability, verifying the feasibility and practical value of this model in industrial visual inspection.
In practical factory environments, PCB defect detection is more challenging than standard object detection because defects are usually very small, low-contrast, and easily confused with dense circuit textures. In addition, industrial images may suffer from uneven illumination, slight motion blur, imaging noise, and incomplete defect boundaries caused by high-speed production lines or camera acquisition conditions. These factors make tiny defects such as mouse bites, spurs, open circuits, and missing holes difficult to distinguish from normal PCB patterns, which may lead to missed detections or false alarms.
The proposed C2f-FPN-PAN++-Mamba framework is designed to address these difficult industrial conditions from both feature extraction and feature fusion perspectives. First, the C2f-Mamba module adopts a dual-branch structure, where the convolution branch preserves local edge, texture, and fine-grained spatial details, while the Mamba branch captures long-range contextual dependencies with linear computational complexity. This local–global collaborative representation helps the model distinguish true tiny defects from similar background textures under low-contrast or blurry conditions. Second, the FPN-PAN++ structure strengthens bidirectional multi-scale feature fusion. Low-level spatial details are effectively transmitted to deeper layers, while high-level semantic information is fed back to shallow layers, improving the representation of weak and small defect regions. Therefore, the proposed model can better maintain defect details, suppress background interference, and improve the accuracy of small-defect detection and classification while preserving real-time inference capability for industrial deployment.
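The local–global idea behind the dual-branch design can be illustrated with a toy one-dimensional sketch. This is not the authors' C2f-Mamba module: the global branch here is a fixed-coefficient linear recurrence, whereas Mamba uses input-dependent (selective) state-space parameters, and fusing the branches by addition is an assumption made only for illustration. All function names are hypothetical.

```python
from typing import List


def local_branch(x: List[float], kernel: List[float]) -> List[float]:
    """Zero-padded 1-D cross-correlation: each output sees only a small window.

    Assumes an odd-length kernel so the output has the same length as x.
    """
    pad = len(kernel) // 2
    padded = [0.0] * pad + x + [0.0] * pad
    return [
        sum(k * padded[i + j] for j, k in enumerate(kernel))
        for i in range(len(x))
    ]


def global_branch(x: List[float], a: float = 0.9, b: float = 0.1) -> List[float]:
    """Linear recurrence h_t = a * h_{t-1} + b * x_t, computed in one O(n) pass.

    Each output depends on all earlier inputs, which is the long-range
    context that a state-space branch contributes.
    """
    h = 0.0
    out = []
    for value in x:
        h = a * h + b * value
        out.append(h)
    return out


def dual_branch(x: List[float], kernel: List[float]) -> List[float]:
    """Fuse local and global features by element-wise addition (assumed)."""
    return [l + g for l, g in zip(local_branch(x, kernel), global_branch(x))]


signal = [0.0, 0.0, 1.0, 0.0, 0.0, 0.0]   # a single "defect" impulse
edge_kernel = [-1.0, 2.0, -1.0]           # simple edge-like filter
local = local_branch(signal, edge_kernel)
context = global_branch(signal)
fused = dual_branch(signal, edge_kernel)
```

In this toy example the local response is confined to the neighbours of the impulse, while the recurrent response decays slowly along the rest of the sequence, which is the complementary behaviour the two branches are meant to provide.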
The proposed C2f-FPN-PAN++-Mamba framework achieves state-of-the-art performance on the PCB defect dataset with 98.5% precision, 98.4% recall, 98.8% mAP@0.5, and 62.5% mAP@0.5:0.95, surpassing YOLOv8n, attention-based models, and Transformer-based models. The novelty lies in three aspects: (1) the C2f-Mamba dual-branch design for efficient local–global modeling; (2) the lightweight FPN-PAN++ structure for strengthened bidirectional fusion; and (3) their integration, which keeps the model lightweight (3.86 M parameters, 159 FPS). The stated objectives are therefore met: small-defect detection is improved, false and missed detections are reduced, and real-time industrial deployment is supported.
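As a quick consistency check, precision and recall can be combined into a single F1 score. The value below is derived here from the two reported figures using the standard harmonic-mean definition; it is not a result reported in the experiments.

```python
# Reported precision and recall of the proposed model, as fractions.
precision = 0.985
recall = 0.984

# F1 is the harmonic mean of precision and recall.
f1 = 2 * precision * recall / (precision + recall)

print(f"F1 = {f1:.4f}")
```

Because precision and recall are nearly equal, the derived F1 lies between them at about 0.9845, indicating that the model does not trade missed detections for false alarms or vice versa.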
Although the proposed model achieves satisfactory overall performance, several failure cases still exist in practical detection. First, extremely tiny and low-contrast defects may be missed due to insufficient local feature saliency. Second, strong background noise may lead to false positives in highly cluttered regions. Third, similar textures between different defect categories may cause occasional misclassification. These limitations are mainly caused by insufficient feature discrimination in challenging regions, which will be addressed in future work by enhancing fine-grained feature representation and noise robustness.
5. Conclusions
This study proposed an improved PCB defect detection model based on C2f-Mamba and FPN-PAN++ to address the challenges of small-defect localization and classification in complex circuit-board backgrounds. By introducing the C2f-Mamba module, the model enhances the joint representation of local texture details and long-range contextual information, which is important for distinguishing tiny defects from dense circuit patterns. Meanwhile, the FPN-PAN++ structure strengthens multi-scale feature fusion and improves the transmission of shallow spatial details and deep semantic information, thereby improving the detection stability of small and weak defect regions.
Experimental results on the PCB defect dataset show that the proposed method achieves better detection accuracy than the baseline model while maintaining a relatively lightweight computational cost. The ablation study further confirms that both C2f-Mamba and FPN-PAN++ contribute positively to the final detection performance. Visual comparisons between ground-truth annotations and predicted results also indicate that the improved model can accurately locate typical PCB defects, such as missing holes, mouse bites, and spurs, under dense textures and complex backgrounds. These results suggest that the proposed method has practical potential for automated PCB quality inspection in industrial production environments.
Nevertheless, this study still has several limitations. The current experiments are mainly conducted on a public PCB defect dataset, and further validation on real factory images with stronger illumination variations, motion blur, and device-dependent imaging noise is still needed. In addition, although the proposed model maintains good detection performance, real-time deployment on edge inspection devices may require further compression. Therefore, future work will focus on model pruning, quantization, and deployment optimization based on the current model structure. Few-shot or domain-adaptive learning will also be explored to improve the model’s adaptability to rare defect types and new production scenarios where labeled samples are limited.