An Improved YOLOv8n Framework for PCB Defect Detection via C2f-Mamba Feature Extraction and FPN-PAN++ Multi-Scale Fusion

Hua, Xuan; Jiang, Haolin; Wang, Hao; Shan, Yahui

doi:10.3390/sym18060969

¹ International School of Information Science and Engineering, Dalian University of Technology, Dalian 116620, China

² Wuhan Second Ship Design and Research Institute, Wuhan 430064, China

* Authors to whom correspondence should be addressed.

† These authors contributed equally to this work.

Symmetry 2026, 18(6), 969; https://doi.org/10.3390/sym18060969

Submission received: 28 April 2026 / Revised: 27 May 2026 / Accepted: 31 May 2026 / Published: 3 June 2026

Abstract

To address the issues in existing PCB defect detection models, including insufficient capability for capturing small defects, weak global feature modeling, and inadequate multi-scale feature fusion, this paper proposes a C2f-FPN-PAN++-Mamba model based on an improved YOLOv8n. The Mamba state–space model is embedded into the C2f module to construct a C2f-Mamba feature extraction unit. While retaining the local perception capability of convolution, this unit enhances long-range dependency modeling, capturing global semantic information of subtle defects in complex backgrounds and improving the model’s feature representation for small defects. Meanwhile, an FPN-PAN++ enhanced feature fusion structure is introduced, achieving efficient complementary interaction between high-level and low-level features through bidirectional cross-scale feature aggregation and path augmentation, thereby strengthening the model’s robustness in identifying multi-scale and multi-form defects. Finally, C2f-Mamba and FPN-PAN++ are integrated, improving global modeling and multi-scale fusion capabilities while maintaining lightweight computational efficiency and reducing the missed-detection and false-detection rates for small defects. Experimental results indicate that, compared with the original YOLOv8n model, the proposed method improves performance on PCB defect detection. On the PCB defect dataset, the model’s precision increased from 96.4% to 98.5%, recall from 94.6% to 98.4%, and mAP@0.5 from 97.2% to 98.8%, while mAP@0.5:0.95, which averages over stricter IoU thresholds, rose from 57.5% to 62.5%. These results show that the method enhances detection capability for small and complex defects while preserving the advantages of a lightweight model and high inference speed, providing a reliable technical solution for high-precision, real-time PCB defect detection in industrial scenarios.

Keywords:

PCB defect detection; YOLOv8n; mamba; state–space model; small object detection; deep learning

1. Introduction

Electronic products have become indispensable in modern society and are widely used in homes, factories, communication systems, transportation, and medical care. Printed Circuit Boards (PCBs) are the core carriers of electronic components and electrical signal transmission in these products. Their manufacturing quality directly affects the safety, stability, and reliability of electronic devices. In practical production, defects such as short circuits, open circuits, missing holes, mouse bites, and spurs may cause unstable electrical connections, signal transmission failures, or even complete device malfunction. Such failures can lead to economic losses in consumer electronics and may cause more serious risks in industrial automation and medical equipment. Therefore, accurate and efficient PCB defect detection is of great importance for improving product quality, reducing manufacturing costs, and ensuring the safe and reliable operation of electronic products. Traditional PCB defect detection methods mainly rely on manual visual inspection or image processing-based algorithms, such as template matching, edge detection, and threshold segmentation. These methods typically depend on human experience or handcrafted features, which results in low detection efficiency, poor robustness, and insufficient adaptability to complex backgrounds [1]. With the development of machine learning methods, detection approaches based on algorithms like Support Vector Machines (SVM) and Random Forests have improved detection performance to some extent, but they are still limited by feature representation capabilities and struggle to meet the demands of complex industrial scenarios [2]. PCBs exhibit strong structural symmetry and regular texture distribution, which can both assist defect localization and bring interference when defects break the symmetric pattern, making accurate detection more challenging.

In recent years, with the development of deep learning technology, object detection methods based on convolutional neural networks (CNNs) have made significant progress in the field of industrial vision. In particular, the YOLO series of algorithms has been widely used in PCB defect detection due to its end-to-end structure and superior real-time performance [3]. As a widely adopted recent member of this series, YOLOv8 achieves a good balance between detection accuracy and speed by introducing an anchor-free mechanism and a decoupled detection head. However, practical PCB defect detection still faces challenges such as difficulty in detecting small objects, severe interference from complex backgrounds, and insufficient feature representation capability [4].

To address these issues, existing studies have primarily improved YOLO models in two directions.

One approach enhances model performance by optimizing convolutional structures and feature fusion strategies. For example, Sun et al. introduced the CBAM attention mechanism into YOLOv8, combining channel and spatial attention to significantly enhance the model’s feature representation capability for small-scale defects, achieving notable performance improvements on the PCB dataset [5]. Wei et al. proposed the PCB-YOLO model, which effectively improved detection accuracy in complex backgrounds by incorporating multi-scale feature fusion and attention mechanisms [6]. Moreover, Li et al. proposed the SCF-YOLO model, which, by designing a lightweight structure and feature fusion module, increased detection speed and accuracy while reducing model complexity [7]. Similarly, SEPDNet achieved performance improvements with fewer parameters by introducing an FPN structure to optimize the feature fusion path [8]. Additionally, in other visual tasks, research has also shown that optimizing feature enhancement and fusion structures can improve model performance. For instance, Wang et al. proposed SCP-DETR, which introduces a kernel-based representation learning strategy and a stacked feature pyramid enhancement module to achieve efficient multi-scale feature fusion and representation, significantly improving the detection accuracy of tiny PCB defects [9]. Although the aforementioned methods improve model performance to some extent, their core still relies on convolution operations, with feature modeling primarily confined to local receptive fields, which remains insufficient for capturing long-distance dependencies in complex backgrounds.

Another category of methods introduces the Transformer architecture (such as ViT, Swin Transformer, and DETR), utilizing the self-attention mechanism to enhance the model’s global modeling capabilities. Unlike traditional convolutional neural networks, which are limited by local receptive fields, Transformers can capture long-range dependencies through multi-head self-attention mechanisms, thereby improving feature representation capabilities in complex scenarios. For example, ViT-YOLO introduces a multi-head self-attention module into the YOLO backbone network, enhancing global context modeling capabilities by constructing an MHSA-Darknet structure and combining it with BiFPN to achieve multi-scale feature fusion, significantly improving detection performance [10]. In addition, Swin Transformer-based improvements are also widely applied in object detection tasks. For instance, ST-YOLOA embeds the Swin Transformer into the YOLO backbone network, achieving collaborative modeling of local and global information through the hierarchical window attention mechanism, and effectively improving detection accuracy in complex backgrounds by integrating attention mechanisms with the feature pyramid structure [11]. The above methods introduce Transformer modules into the YOLO framework, enhancing the model’s global modeling capabilities to a certain extent. Meanwhile, another line of research focuses on improving Transformers at the detection framework level. For example, An and Zhang proposed LPViT, a Transformer-based model for PCB image classification and defect detection, which integrates mask patch prediction and label smoothing to enhance model robustness and feature representation [12]. These methods are more inclined towards a Transformer-dominated detection paradigm in terms of structural design, further expanding the design space for object detection models. 
However, Transformer architectures typically rely on self-attention mechanisms, whose computational complexity grows quadratically with feature scale, leading to large model parameters and high computational costs, which still pose certain limitations in industrial scenarios where resources are limited or real-time performance is required.
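The quadratic growth mentioned above can be made concrete with a short calculation. The sketch below assumes a 640 × 640 input and the detection strides 8, 16, and 32 commonly used in YOLO-style detectors; the function name `attention_pairs` is introduced here only for illustration. It counts the query-key pairs that full self-attention would form if every spatial position of a feature map were treated as one token.

```python
def attention_pairs(height, width):
    """Return (tokens, query-key pairs) for full self-attention over an H x W map."""
    tokens = height * width
    return tokens, tokens * tokens


# Assumed setting: 640 x 640 input with strides 8, 16, and 32.
INPUT_SIZE = 640
for stride in (8, 16, 32):
    side = INPUT_SIZE // stride
    tokens, pairs = attention_pairs(side, side)
    print(f"stride {stride:>2}: {tokens:>5} tokens, {pairs:>10} attention pairs")
```

Halving the stride quadruples the number of tokens and multiplies the number of attention pairs by 16, which is why full self-attention becomes expensive on the high-resolution feature maps that matter most for small defects.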

Other improvement strategies have also been explored. Recent studies, such as MAS-YOLO, have shown that introducing Adaptive Hierarchical Feature Integration (AHFIN) and the Median Enhancement Attention Mechanism (MECS) into the YOLO architecture can further enhance the model’s ability to perceive tiny defects in PCBs. This aligns with the approach in this paper, which employs the Mamba architecture and the PAN++ structure to strengthen global context and detail feedback, and it supports the importance of optimizing feature flow and attention distribution for industrial defect detection tasks [13].

In response to the characteristics of PCB defects, namely “numerous small targets and high real-time requirements,” some studies have begun to explore detection methods that balance lightweight design and high accuracy. For example, Hou et al. proposed the EffNet-PCB model, which improves detection accuracy while reducing the number of parameters by optimizing the feature fusion module and the detection head structure [14]. Li et al. achieved multi-scale PCB defect detection through pruning and lightweight network design, significantly enhancing detection performance while ensuring real-time capability [15]. Furthermore, Cao et al. proposed a PCB defect detection method based on multi-scale feature enhancement and a background suppression mechanism. By constructing a multi-scale contextual feature enhancement module combined with a background decoupling attention mechanism, this approach effectively suppresses background interference and enhances the feature representation of micro-defects in complex industrial environments, while significantly improving detection accuracy without compromising real-time performance [16]. Despite the improvements in detection performance brought by these methods, there remain the following issues: on the one hand, most methods lack effective global modeling capability; on the other hand, the introduction of complex structures often increases computational costs, affecting the real-time performance of the models.

In recent years, state–space models (SSMs) have achieved groundbreaking progress in visual tasks. Among them, the Mamba model has attracted widespread attention due to its linear-time complexity and excellent long-sequence modeling capabilities. Compared to the Transformer architecture, Mamba can significantly reduce computational complexity while maintaining global modeling capabilities, making it more suitable for real-time detection tasks. Related studies have attempted to introduce Mamba into object detection frameworks, such as the YOLO-Mamba model, which balances long-range dependency modeling and efficient computation by integrating SSM [17]. However, research on applying Mamba to PCB defect detection tasks remains relatively limited, especially in terms of small target detection and multi-scale feature fusion, where there is still considerable room for improvement.

Based on the above analysis, this paper proposes a PCB defect detection method called C2f-FPN-PAN++-Mamba Improved YOLOv8. This method optimizes YOLOv8 on two levels: feature extraction and feature fusion. On one hand, the C2f-Mamba module is introduced into the Backbone and Neck, embedding the state–space model (Mamba) into the C2f structure to enhance the model’s global modeling capability and long-range dependency representation. On the other hand, the original feature fusion structure is improved by introducing a multi-scale feature fusion strategy that combines FPN and PAN++, strengthening information interaction between features at different scales and significantly improving the model’s ability to detect small target defects.

The main contributions of this paper are summarized as follows:

(1) A C2f-Mamba feature extraction module is proposed for PCB defect detection. Unlike existing YOLO-Mamba methods that simply insert Mamba blocks into the backbone or neck, the proposed module embeds Mamba into the C2f structure and adopts a channel-split dual-branch design. This design integrates convolution-based local perception with Mamba-based global modeling, thereby enhancing the representation of tiny PCB defects while preserving the lightweight characteristics of YOLOv8n.

(2) A lightweight FPN-PAN++ multi-scale feature fusion structure is introduced. Different from conventional FPN-PAN structures or PAN++ designs originally used for text detection, the proposed FPN-PAN++ is adapted to PCB defect detection by retaining bidirectional cross-scale aggregation and path enhancement. It strengthens the complementary fusion of high-level semantic information and low-level spatial details, improving the detection robustness for small and multi-scale defects.

(3) A unified C2f-Mamba and FPN-PAN++ improved YOLOv8n framework is constructed. The proposed framework jointly enhances local–global feature representation and bidirectional multi-scale feature fusion, addressing the limitation that existing methods usually improve either global modeling or feature fusion alone. This collaborative design is especially suitable for PCB defects with small size, dense textures, and complex backgrounds.

(4) The proposed model achieves improved detection accuracy while maintaining lightweight and real-time performance. Experimental results show that the model achieves 98.5% precision, 98.4% recall, 98.8% mAP@0.5, and 62.5% mAP@0.5:0.95 with only 3.86 M parameters, demonstrating a favorable balance between detection performance and computational efficiency for industrial PCB defect inspection.

(5) A systematic comparative and ablation study is conducted. The proposed method is compared with baseline YOLO models, attention-based YOLO variants, Transformer-based YOLO variants, and single-module improved models, verifying the effectiveness and complementarity of C2f-Mamba and FPN-PAN++ for high-precision PCB defect detection.
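To put the reported figures in perspective, the following sketch recomputes the gains from the values stated in this paper (baseline YOLOv8n versus the proposed model, and 3.20 M versus 3.86 M parameters). The helper `gain_summary` and the notion of a "shortfall" (the distance of a metric from 100%) are introduced here only for illustration and are not part of the original evaluation protocol.

```python
# Values reported in this paper: (baseline YOLOv8n, proposed model), in percent.
reported = {
    "precision": (96.4, 98.5),
    "recall": (94.6, 98.4),
    "mAP@0.5": (97.2, 98.8),
    "mAP@0.5:0.95": (57.5, 62.5),
}


def gain_summary(before, after):
    """Return the absolute gain in points and the relative cut in the shortfall from 100%."""
    gain = round(after - before, 1)
    shortfall_cut = round(100.0 * (after - before) / (100.0 - before), 1)
    return gain, shortfall_cut


for name, (before, after) in reported.items():
    gain, cut = gain_summary(before, after)
    print(f"{name}: +{gain} points, shortfall reduced by {cut}%")

# Parameter overhead: 3.20 M (YOLOv8n) versus 3.86 M (proposed model).
param_increase = round(100.0 * (3.86 - 3.20) / 3.20, 1)
print(f"parameters: +{param_increase}%")
```

Read this way, the recall gain corresponds to the miss rate falling from 5.4% to 1.6%, a reduction of roughly 70%, obtained with about one fifth more parameters than the baseline.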

2. Related Work

2.1. Feature Enhancement and Fusion Techniques in Industrial Defect Detection

In industrial defect detection, defects usually exhibit large scale variation, complex texture, and low contrast, so improving feature representation has become a key problem. Recent research has focused on feature enhancement and feature fusion, and methods that combine the two have gradually emerged.

Feature enhancement methods aim to improve the model’s representation of key defect regions by introducing attention mechanisms or context information, or by improving the convolution structure. For example, SDD-Net strengthens the semantic representation of small targets and suppresses irrelevant information by building a context enhancement module (CEM) and a feature enhancement module (FEM), thereby improving detection accuracy in complex scenes [18]. Similarly, a lightweight CNN model enhances feature representation while preserving computational efficiency by introducing the coordinate attention (CA) mechanism and combining it with a multi-scale strategy [19]. Feature enhancement is also important in few-sample settings. To address insufficient PCB defect data, some researchers have improved the CBAM attention module to highlight key regions, thereby improving feature discrimination under small-sample conditions [20]. In addition, FD-YOLO11 enhances multi-scale feature extraction by introducing self-calibrated convolution and optimizes feature representation by combining global and local information, achieving higher robustness in complex backgrounds [21].

Feature fusion methods mainly improve the detection of multi-scale defects by integrating features from different levels or scales. Among classical methods, the feature pyramid network (FPN) and its improved variants are widely used. For example, ABFPN achieves balanced fusion of features at different scales by introducing atrous spatial pyramid pooling (ASPP) and cross-layer connections, effectively improving small target detection [22]. In the lightweight direction, LFF-YOLO proposes a lightweight feature pyramid network (LFPN) that achieves efficient multi-scale fusion while reducing the number of parameters, balancing detection accuracy and inference speed [23]. For small target defects, SO-YOLO enhances detection by strengthening the fusion of shallow features and making full use of high-resolution details [24]. In addition, in steel surface defect detection, multi-scale feature extraction and fusion modules (such as MSFE + EFF) have been shown to integrate features from different levels effectively and improve overall detection performance [25].

In recent years, more and more researchers have combined feature enhancement with feature fusion to improve feature representation and multi-scale information utilization at the same time. For example, DSF-YOLO continuously enhances the feature representation during the fusion process through dynamic phased feature fusion (DSFFE) and a dual multi-scale fusion module (DMFF), enabling effective recognition of small targets and fuzzy boundaries [26]. At the system level, the digital twin defect detection method constructs a richer feature representation by integrating multimodal data (2D image and 3D depth information) to achieve higher-precision defect recognition [27].

2.2. Research on Lightweight Object Detection in Industrial Scenarios

In industrial scenarios, an object detection model must not only achieve high detection accuracy but also satisfy requirements for real-time operation, low computational complexity, and easy deployment. On edge devices, embedded terminals, and other resource-constrained platforms in particular, traditional high-complexity detection networks are often difficult to apply directly. Therefore, lightweight object detection has gradually become an important direction of industrial vision research. Existing research can be roughly divided into four categories: lightweight backbone network design, lightweight feature fusion and detection head optimization, system-level optimization for edge deployment, and general frameworks that balance lightweight design and performance.

Research on lightweight object detection first focused on backbone network compression and efficient feature extraction structures. Typical methods introduce lightweight convolutions, Ghost modules, or depthwise separable convolutions, or improve the residual structure, to reduce the number of parameters and the amount of computation while preserving feature representation as much as possible. For example, LiteYOLO-ID designed the lightweight convolution module EGC for insulator defect detection and further built the EGC-CSPGhostNet backbone network and the lightweight EGC-PANet, which significantly reduced the number of parameters while maintaining or even improving detection accuracy, reflecting the idea of co-designing a lightweight backbone and a lightweight neck [28]. Similarly, Mo et al. proposed SGT-YOLO, a lightweight PCB defect detection model improved from YOLOv5s, which adopts an SE-ENv2 backbone, GC-Neck, and a TSCODE decoupled head and removes redundant detection heads to balance accuracy and model complexity [29]. In more general PCB detection tasks, a lightweight deep convolutional network maintains high detection speed and accuracy by optimizing model parameters and enhancing small-object feature extraction, demonstrating that lightweight design is suitable not only for PCB component detection and character recognition but also for real-time defect inspection in complex industrial production scenarios [30].
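The parameter savings offered by depthwise separable convolution, one of the lightweight operators mentioned above, follow directly from counting weights. The sketch below is a generic illustration and does not reproduce any of the cited models; the layer size (128 input channels, 128 output channels, 3 × 3 kernel) is a hypothetical choice, and biases are ignored.

```python
def standard_conv_params(c_in, c_out, k):
    """Weights in a standard k x k convolution (bias ignored)."""
    return k * k * c_in * c_out


def depthwise_separable_params(c_in, c_out, k):
    """Weights in a k x k depthwise convolution followed by a 1 x 1 pointwise convolution."""
    depthwise = k * k * c_in      # one k x k filter per input channel
    pointwise = c_in * c_out      # 1 x 1 convolution that mixes channels
    return depthwise + pointwise


# Hypothetical layer size, chosen only for illustration.
c_in, c_out, k = 128, 128, 3
standard = standard_conv_params(c_in, c_out, k)
separable = depthwise_separable_params(c_in, c_out, k)
print(f"standard: {standard}, separable: {separable}, ratio: {standard / separable:.2f}")
```

For this layer the separable form needs roughly one eighth of the weights, which is the kind of reduction that lightweight backbones exploit, at the cost of decoupling spatial filtering from channel mixing.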

In addition to the backbone network, many studies have shown that lightweight design in industrial scenes cannot rely only on “shrinking the network”; the feature fusion path and the detection head structure must be optimized at the same time to avoid the information loss caused by lightweighting.

To address the difficulty of retaining small defects in industrial surface detection, Edge-YOLO proposes an edge enhancement backbone, a cross-stage edge enhancement module, and a multi-scale dilated convolution module with shared weights. Combined with a lightweight adaptive detection head, it significantly reduces computational complexity and improves multi-scale small defect detection [31]. For gray-scale image detection, YOLO-MIF explored efficient representation in a lightweight model for gray-scale industrial images through multi-information fusion, structural re-parameterization, and a new decoupled detection head design, indicating that lightweight research has expanded from simple “parameter reduction” to integrated optimization of the input, backbone, and detection head [32]. For real-time salient object detection of steel strip surface defects, MINet designed a multi-scale interaction module that combines depthwise and pointwise convolution to realize multi-scale feature interaction with few parameters, representing the idea of improving the representation ability of a lightweight network through module-level innovation [33].

Lightweight research in industrial scenes often takes edge deployment as its direct goal, so many works emphasize not only detection accuracy but also the inference efficiency of the model on a specific hardware platform. For example, DAMO-YOLO proposed the “large neck, small head” idea in detector design and combined a NAS backbone, RepGFPN, and a lightweight head to balance accuracy and latency across models of different scales, reflecting the trend of “architecture search + deployment-oriented optimization” in industrial real-time detection [34]. For surface defect detection in electronic manufacturing, ATT-YOLO is designed around high-resolution input, real-time inference, and deployment friendliness. It uses only one self-attention module combined with multi-scale feature extraction and an improved anchor box strategy, and achieves good detection performance while controlling the amount of computation, which shows that industrial lightweight models often need to meet the requirements of high precision and high throughput at the same time [35]. In addition, the collaborative learning classification model for PCB defect detection targets image and label uncertainty in complex industrial environments and achieves robust and efficient defect recognition through symmetric residual filtering, auxiliary inference, and knowledge transfer mechanisms, which further demonstrates that lightweight and robust design ideas are broadly practical in industrial quality inspection tasks with complex data conditions [36].

In addition to proposing improved models for specific industrial tasks, studies have also summarized the evolution of lightweight object detection from the perspective of general detector development. Taking research on YOLOv9 as an example, its core is to improve the efficiency of feature extraction and the stability of gradient propagation through GELAN and PGI while balancing the accuracy and deployment performance of lightweight models, which shows that current lightweight research has gradually moved from single-module improvement to collaborative optimization of the overall network information flow and training mechanism [37].

In general, research on lightweight object detection in industrial scenes has evolved from simply reducing parameters in the early days to systematic design around lightweight backbone networks, efficient feature fusion, detection head optimization, and edge deployment adaptation. Existing methods have achieved good results in different industrial tasks but still face some common problems. First, fine-grained defect feature representation is easily weakened after lightweighting. Second, it is still difficult to accurately detect small targets and weak-texture targets in complex backgrounds. Third, the actual deployment efficiency of a model on different hardware platforms is not always consistent with its theoretical complexity. Therefore, future development of industrial lightweight object detection should pay more attention to the balance between accuracy, speed, and deployment cost, and should explore data representations, structural re-parameterization strategies, and edge inference optimization mechanisms suited to specific industrial scenarios.

3. Materials and Methods

3.1. YOLOv8n Model

This study selects YOLOv8n as the baseline model for comparative experiments. As a widely used version in the YOLO series, YOLOv8 adopts an end-to-end, single-stage object detection framework consisting mainly of three components: Backbone, Neck, and Head. The Backbone extracts multi-level features from input images, the Neck achieves multi-scale feature fusion through a feature pyramid structure (such as FPN and PAN), and the Head outputs the predicted class and location of the objects.

In the Backbone and Neck, YOLOv8 introduces the C2f module as the core feature extraction unit. The C2f module uses a branching structure and feature concatenation mechanism to enhance feature representation while maintaining computational efficiency. However, this module is still fundamentally based on convolution operations, primarily relying on local receptive fields for feature extraction, and has limited ability to model long-range dependencies.
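The branching and concatenation behaviour of C2f can be followed by tracking channel counts alone. The sketch below assumes the layout of the publicly available Ultralytics implementation (a 1 × 1 entry convolution producing two halves, a chain of bottlenecks each fed by the most recent branch, and a final 1 × 1 convolution over the concatenation); the function `c2f_channel_flow` and its dictionary keys are hypothetical names used only for this illustration.

```python
def c2f_channel_flow(c_out, n_bottlenecks, expansion=0.5):
    """Track channel counts through a C2f-style block (spatial size is unchanged)."""
    hidden = int(c_out * expansion)       # width of each branch
    after_entry_conv = 2 * hidden         # 1x1 conv output, then split into two halves
    branches = [hidden, hidden]           # the two halves of the split
    for _ in range(n_bottlenecks):
        branches.append(hidden)           # each bottleneck processes the latest branch
    concatenated = sum(branches)          # (2 + n) * hidden channels before the exit conv
    return {
        "hidden": hidden,
        "after_entry_conv": after_entry_conv,
        "concatenated": concatenated,
        "output": c_out,                  # the final 1x1 conv maps back to c_out
    }


flow = c2f_channel_flow(c_out=64, n_bottlenecks=2)
print(flow)
```

Every intermediate branch is kept and concatenated, which gives the block rich gradient paths, but each branch is still produced by convolutions with a local receptive field, consistent with the limitation noted above.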

As the most lightweight version of the YOLOv8 series, YOLOv8n’s core advantage lies in high speed and low resource consumption while maintaining practical accuracy. It has only 3.2 million parameters, a small model size, low GPU and memory usage, and a frame rate far higher than larger models in the series, such as YOLOv8s and YOLOv8m. Compared with earlier lightweight models like YOLOv5n [38], YOLOv8n adopts the C2f structure, a decoupled detection head, and an anchor-free design, achieving higher detection accuracy and more stable performance on small objects at similar speed, with more efficient post-processing. Compared with other lightweight detection algorithms like SSD and EfficientDet-Lite, it features a simpler training process and a more complete deployment ecosystem, showing strong overall performance in deployment scenarios with limited computing power, low latency requirements, and tight cost constraints [39,40].

The original YOLOv8n has only 3.20 M parameters, 8.7 GFLOPs, and runs at 186 FPS on a single RTX 4090 GPU with 640 × 640 input, showing excellent efficiency for industrial deployment.
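These baseline figures translate into a per-frame budget as follows. The sketch uses only the numbers stated above (3.20 M parameters, 8.7 GFLOPs per 640 × 640 frame, 186 FPS); the helper names are hypothetical, the storage estimate covers weights only (activations and framework overhead are excluded), and megabytes are taken as 10^6 bytes.

```python
def per_frame_latency_ms(fps):
    """Average time available per frame, in milliseconds."""
    return 1000.0 / fps


def weight_storage_mb(params_millions, bytes_per_weight):
    """Approximate weight storage in megabytes (10^6 bytes), weights only."""
    return params_millions * bytes_per_weight


def sustained_gflops_per_second(gflops_per_frame, fps):
    """Compute rate implied by the per-frame cost and the frame rate."""
    return gflops_per_frame * fps


latency = per_frame_latency_ms(186)
fp32_mb = weight_storage_mb(3.20, 4)   # 32-bit floating-point weights
fp16_mb = weight_storage_mb(3.20, 2)   # 16-bit floating-point weights
rate = sustained_gflops_per_second(8.7, 186)

print(f"latency per frame: {latency:.2f} ms")
print(f"weights: {fp32_mb:.1f} MB (FP32), {fp16_mb:.1f} MB (FP16)")
print(f"implied compute rate: {rate:.0f} GFLOP/s")
```

At 186 FPS the baseline spends a little under 5.4 ms per frame, so any added module has to fit within a budget of a few milliseconds if the real-time margin is to be preserved.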

3.2. Overall Design of the Improved Model Architecture

In response to the characteristics of PCB defect detection, which include a large number of small targets, complex backgrounds, and high requirements for real-time performance, this paper improves YOLOv8 from three aspects: feature extraction capability, feature fusion efficiency, and model lightweighting, constructing the C2f-FPN-PAN++-Mamba Improved YOLOv8 model as shown in Figure 1.

First, in the feature extraction stage, we introduce the C2f-Mamba module to replace part of the original C2f structure, as shown in Figure 2. Different from the original C2f, our C2f-Mamba employs a channel-split dual-branch parallel design, which consists of a convolution branch for capturing local fine-grained spatial details and a Mamba branch for efficiently modeling long-range global dependencies. By integrating the state–space model, it achieves collaborative modeling of local features and global context, thereby enhancing the model’s feature representation ability in complex scenarios. Meanwhile, the design of the C2f-Mamba module is determined according to the feature distribution characteristics of PCB defects and the demand for lightweight real-time detection. The characteristic distribution of PCB defects is manifested as follows: a high proportion of small-target defects, minute defect sizes, dense textures, complex backgrounds with strong interference, and defects that exhibit multi-scale features, low contrast, and irregular shapes. Industrial online inspection requires models to possess characteristics of lightweight design, low computational overhead, and high inference speed to accommodate edge device deployment and real-time detection demands. Therefore, the model must enhance the global feature modeling capability for small defects while strictly controlling the number of parameters and computational costs. The module is deliberately deployed in the middle and deep stages of the backbone network instead of shallow layers, since shallow features mainly contain low-level edge and texture information with weak semantics, which provides limited gains for global modeling. In contrast, features in the middle and deep stages possess richer and more complete semantic information about defect structures and global layouts, making them more suitable for capturing long-range dependencies between defects and backgrounds. 
The dual-branch structure fuses local convolution features and global state–space features through channel concatenation, which retains fine-grained defect details while enhancing long-range dependency modeling without introducing complex weighting or attention operations. Such a configuration brings significant performance improvement with only a small increase in parameters and calculation, achieving a favorable trade-off between accuracy and efficiency that better meets the requirements of real-time industrial PCB defect detection compared with heavy Transformer-based structures.
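The channel-split, dual-branch idea can be illustrated with a deliberately simplified one-dimensional toy. This is not the C2f-Mamba module itself: the local branch is a 3-tap average standing in for convolution, the global branch is a fixed linear recurrence standing in for Mamba (which in practice uses input-dependent parameters and operates on 2D feature maps), and all function names and the `decay` parameter are assumptions made for illustration.

```python
def local_branch(channel):
    """Stand-in for the convolution branch: a 3-tap average with zero padding."""
    padded = [0.0] + channel + [0.0]
    return [(padded[i] + padded[i + 1] + padded[i + 2]) / 3.0 for i in range(len(channel))]


def global_branch(channel, decay=0.9):
    """Stand-in for the Mamba branch: a recurrence whose state carries every earlier position."""
    state, out = 0.0, []
    for value in channel:
        state = decay * state + (1.0 - decay) * value
        out.append(state)
    return out


def dual_branch_block(feature):
    """Split channels in half, process each half in its own branch, then concatenate."""
    half = len(feature) // 2
    local_out = [local_branch(ch) for ch in feature[:half]]
    global_out = [global_branch(ch) for ch in feature[half:]]
    return local_out + global_out  # channel concatenation, no attention or weighting


# Four channels, eight positions; a single impulse at position 0 of every channel.
feature = [[1.0] + [0.0] * 7 for _ in range(4)]
output = dual_branch_block(feature)
print("local branch, last position: ", output[0][7])
print("global branch, last position:", output[3][7])
```

The impulse never reaches the last position through the local branch, whereas the recurrent branch still carries a trace of it, and the block keeps the same number of channels because each branch only processes half of them.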

Second, in the feature fusion stage, an improved FPN-PAN++ structure is adopted to strengthen information flow between multi-scale features, improving the utilization efficiency of features at different scales, and especially enhancing the detection capability for small target defects.
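The bidirectional information flow of an FPN plus PAN neck can be sketched on one-dimensional feature maps at three scales. The example is a simplification under stated assumptions: features are fused by addition rather than by the concatenation and convolution blocks used in YOLOv8, the additional paths specific to the proposed FPN-PAN++ are not modelled, and the function names and the labels `p3`, `p4`, `p5` are illustrative.

```python
def upsample2(x):
    """Nearest-neighbour upsampling by a factor of 2."""
    return [v for v in x for _ in range(2)]


def downsample2(x):
    """Average adjacent pairs to halve the resolution."""
    return [(x[i] + x[i + 1]) / 2.0 for i in range(0, len(x), 2)]


def add(a, b):
    """Element-wise sum of two equally sized feature maps."""
    return [u + v for u, v in zip(a, b)]


def bidirectional_fusion(p3, p4, p5):
    # Top-down (FPN) pass: semantic information flows to finer scales.
    t4 = add(p4, upsample2(p5))
    t3 = add(p3, upsample2(t4))
    # Bottom-up (PAN) pass: spatial detail flows back to coarser scales.
    n3 = t3
    n4 = add(t4, downsample2(n3))
    n5 = add(p5, downsample2(n4))
    return n3, n4, n5


# Fine detail only at the last position of p3, coarse semantics only at the first cell of p5.
p3 = [0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 4.0]
p4 = [0.0, 0.0, 0.0, 0.0]
p5 = [1.0, 0.0]
n3, n4, n5 = bidirectional_fusion(p3, p4, p5)
print(n3, n4, n5)
```

After both passes, the finest output contains the signal that originated at the coarsest level, and the coarsest output contains the detail that originated at the finest level, which is the complementary interaction the fusion stage is meant to provide.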

Finally, in the overall structural design, both detection performance and computational cost are considered. High-complexity Transformer structures are avoided, and the Mamba module is used to achieve near-global modeling capability, thereby maintaining high inference efficiency while ensuring accuracy improvement.

In industrial inspection scenarios, the model must not only achieve high detection accuracy but also meet real-time and deployment efficiency requirements. Therefore, this paper treats lightweight design as a primary concern throughout model development. This emphasis is motivated by the online nature of industrial PCB inspection. In practical production lines, defect images are continuously captured and must be processed within a limited cycle time; otherwise, the detection system may become a bottleneck and reduce production efficiency. Meanwhile, industrial inspection systems are often deployed on edge or embedded platforms, where memory and computational resources are limited. Merely improving detection accuracy is therefore insufficient: the model must also maintain low computational complexity and high inference speed.

On one hand, compared with Transformer structures, the Mamba model adopts a state–space modeling method with linear time complexity, which significantly reduces computational complexity and memory consumption while achieving global information modeling, making it more suitable for resource-constrained industrial environments.

On the other hand, the improvements made in this study do not significantly increase the network depth or width, but instead enhance feature utilization efficiency through structural optimizations, such as improvements to the C2f module and optimization of feature fusion paths, achieving performance gains with minimal parameter increases.

Furthermore, the improved FPN-PAN++ structure, through the design of efficient information transmission paths, reduces redundant computation, improves feature fusion efficiency, and helps enhance inference speed while maintaining detection accuracy.

Compared with existing YOLO-Mamba methods, the proposed model differs in three respects:

(1) Existing methods only insert Mamba blocks into the backbone, whereas this paper embeds Mamba into C2f to form C2f-Mamba for local–global fusion;

(2) This paper uses FPN-PAN++ to enhance bidirectional feature fusion, which has not been adopted in previous YOLO-Mamba methods for defect detection;

(3) The whole model remains lightweight and fast for industrial deployment, whereas most Transformer or heavy Mamba models sacrifice real-time performance.

Overall, the proposed model achieves a good balance between accuracy and efficiency and possesses significant practical value for engineering applications.

3.3. C2f-Mamba Module

3.3.1. Principles and Limitations of the Native C2f Module

The C2f module is an important basic unit used for feature extraction in YOLOv8. Its design concept originates from the CSP (Cross Stage Partial) structure. Through the feature splitting and fusion mechanism, it enhances feature representation capability while ensuring computational efficiency. Specifically, the C2f module first splits the input features along the channel dimension. A portion of the features is directly connected across layers, while the other portion is processed through several convolutional layers for feature extraction. Finally, the features from all branches are concatenated and fused, achieving feature reuse and information enhancement.

This structure can effectively alleviate the gradient vanishing problem and, to some extent, reduce model parameters while improving computational efficiency. As a result, it is widely used in both the Backbone and Neck of YOLOv8. However, based on the practical requirements of PCB defect detection, the native C2f module still has certain limitations.

The C2f module fundamentally relies on convolution operations for feature extraction, with its receptive field mainly concentrated in local regions, limiting its ability to model long-range dependencies. In PCB images, defects often have strong correlations with their surrounding structures, and relying solely on local features may easily lead to misjudgments in complex backgrounds.

For small target defects (such as micro-cuts and pinholes), the feature information is relatively weak and can easily be diluted or lost across multiple convolution layers. The C2f module lacks an explicit global information enhancement mechanism, making it difficult to effectively compensate for this problem.

Therefore, it is necessary to introduce mechanisms capable of modeling global information while maintaining the efficiency of the C2f module, in order to enhance the feature representation capability of the model in complex scenarios.

3.3.2. Basic Principles of the Mamba State–Space Model

In recent years, state–space models (SSMs) have made significant progress in the field of sequence modeling. Among them, the Mamba model, as a novel SSM structure, is able to model long sequences while maintaining computational efficiency. The basic form of a state–space model can be expressed as:

$$\begin{cases} h_{t} = A h_{t-1} + B x_{t} \\ y_{t} = C h_{t} + D x_{t} \end{cases} \tag{1}$$

Here, $x_{t}$ represents the input sequence, $h_{t}$ denotes the hidden state of the system, $y_{t}$ is the output, and $A$, $B$, $C$, $D$ are learnable parameter matrices. This model achieves dynamic modeling of sequence information through state recursion.
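The recurrence in Equation (1) can be traced step by step in code. The following is a minimal sketch, assuming a scalar input per step, a zero initial hidden state, and toy parameter values chosen purely for illustration (none of them come from this paper); `matvec` and `ssm_scan` are hypothetical helper names.

```python
def matvec(matrix, vector):
    """Multiply a matrix (list of rows) by a vector (list of floats)."""
    return [sum(m * v for m, v in zip(row, vector)) for row in matrix]


def ssm_scan(A, B, C, D, xs):
    """Apply h_t = A h_{t-1} + B x_t and y_t = C h_t + D x_t to a scalar sequence.

    A is an N x N matrix, B and C are length-N vectors, D is a scalar.
    The hidden state is assumed to start at zero.
    """
    h = [0.0] * len(B)
    ys = []
    for x in xs:
        Ah = matvec(A, h)
        h = [a + b * x for a, b in zip(Ah, B)]                  # state update
        ys.append(sum(c * hi for c, hi in zip(C, h)) + D * x)   # readout
    return ys


# Toy parameters (illustrative only): a two-dimensional state with two decay rates.
A = [[0.5, 0.0], [0.0, 0.1]]
B = [1.0, 1.0]
C = [1.0, 2.0]
D = 0.0

# Impulse input: a single non-zero value followed by zeros.
ys = ssm_scan(A, B, C, D, [1.0, 0.0, 0.0])
print(ys)
```

Feeding an impulse shows how the hidden state carries information forward: the output decays over subsequent steps instead of vanishing immediately, which is the mechanism that allows earlier positions to influence later ones.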

On this basis, the Mamba model introduces a Selective Scan mechanism to efficiently model the input sequence. The key idea of Selective Scan is to make the parameters of the state–space model input-dependent. In traditional linear time-invariant state–space models, the state transition and projection parameters are fixed for all input tokens, which limits the model’s ability to adaptively select useful information from different spatial positions. In contrast, Mamba dynamically generates several key parameters, such as the time-step parameter, the input projection parameter, and the output projection parameter, according to the current input token through learnable linear projections.

Therefore, the state update process in Mamba is no longer controlled by a fixed parameter set but is adaptively adjusted according to the input content at each position. This enables the model to selectively preserve useful information, update important contextual representations, and suppress irrelevant background interference. Such a selective mechanism is especially suitable for PCB defect detection, because tiny defects are often embedded in dense circuit textures and can easily be confused with background patterns.
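The input-dependent update described above can be illustrated with a heavily simplified, single-channel sketch. It is not the implementation used in this paper: the discretization (an exponential for the state transition and a step-scaled input projection), the diagonal transition `a_diag`, and the scalar projection weights `w_delta`, `b_delta`, `w_b`, and `w_c` are all assumptions made for illustration.

```python
import math


def softplus(z):
    """Smooth positive mapping, used here to keep the step size above zero."""
    return math.log1p(math.exp(z))


def selective_scan(xs, a_diag, w_delta, b_delta, w_b, w_c):
    """Toy selective scan over a scalar sequence.

    The step size, input projection, and output projection are recomputed
    from each token, so the state update depends on the input content.
    a_diag holds the (negative) diagonal entries of the state transition.
    """
    h = [0.0] * len(a_diag)
    ys = []
    for x in xs:
        delta = softplus(w_delta * x + b_delta)   # input-dependent step size
        b_t = [w * x for w in w_b]                # input-dependent input projection
        c_t = [w * x for w in w_c]                # input-dependent output projection
        # Discretized update: decay the old state, then add the new token.
        h = [math.exp(delta * a) * hi + delta * b * x
             for a, hi, b in zip(a_diag, h, b_t)]
        ys.append(sum(c * hi for c, hi in zip(c_t, h)))
    return ys


ys = selective_scan(
    [1.0, 0.0, 2.0],
    a_diag=[-1.0, -0.5],
    w_delta=1.0,
    b_delta=0.0,
    w_b=[1.0, 0.5],
    w_c=[1.0, 1.0],
)
print(ys)
```

In this sketch a larger step size shrinks the factor `exp(delta * a)`, so the previous state is forgotten faster and the current token dominates, whereas a small step size largely preserves the existing state. This is one way to read the statement that the model selectively preserves useful information and suppresses irrelevant input.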

In visual feature modeling, the two-dimensional feature map is first flattened into a one-dimensional spatial sequence. The Selective Scan operation is then performed along this sequence to capture long-range spatial dependencies with linear complexity. After global dependency modeling, the output sequence is reshaped back to the original two-dimensional feature map. In this way, Mamba can achieve efficient global context modeling while maintaining lower computational complexity than Transformer-based self-attention mechanisms.

Its core advantage lies in reducing the computational complexity of the traditional self-attention mechanism from $O(n^{2})$ to linear complexity $O(n)$. Mamba can achieve linear complexity because it replaces the self-attention mechanism of the Transformer with recursive updates in the state space. Instead of computing pairwise similarities across all positions in the sequence, it updates the hidden states recurrently through selective scanning, implemented with hardware-friendly parallel scan algorithms. The computational load therefore scales linearly with the sequence length, avoiding quadratic complexity and achieving higher computational efficiency when processing long sequence data.

In visual tasks, the input 2D feature map $X \in \mathbb{R}^{H \times W \times C}$ usually needs to be reconstructed into a one-dimensional sequence form:

$$X \to \{x_{1}, x_{2}, \dots, x_{HW}\} \tag{2}$$

and fed into the Mamba module for global modeling. In this way, the model can establish long-range dependencies in the spatial dimension, thereby capturing richer contextual information.
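The flattening in Equation (2) and the later reshape back to two dimensions can be sketched as follows. The row-major scan order is an assumption, since the scan order is not specified here, and `flatten_hw` and `reshape_hw` are hypothetical helper names operating on nested lists rather than tensors.

```python
def flatten_hw(feature_map):
    """Flatten an H x W x C nested list into HW tokens of length C (row-major)."""
    return [pixel for row in feature_map for pixel in row]


def reshape_hw(tokens, height, width):
    """Inverse of flatten_hw: regroup HW tokens into an H x W x C nested list."""
    if len(tokens) != height * width:
        raise ValueError("token count must equal height * width")
    return [tokens[r * width:(r + 1) * width] for r in range(height)]


# Toy feature map with H=2, W=3, C=2; each pixel stores its own (row, col).
fmap = [[[r, c] for c in range(3)] for r in range(2)]
tokens = flatten_hw(fmap)
restored = reshape_hw(tokens, 2, 3)
print(len(tokens), restored == fmap)
```

Under row-major ordering, two vertically adjacent pixels end up W positions apart in the sequence, which is one reason the sequence model needs to handle dependencies over long distances.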

Compared with Transformer models, Mamba not only enables global modeling but also significantly reduces computational complexity and memory consumption, making it more suitable for industrial inspection tasks with high real-time requirements.

Mamba maintains efficiency in long-sequence modeling through two core mechanisms. First, it uses a selective scan mechanism that implements state transition with input-dependent parameters in a single forward pass, avoiding the repeated dot-product operations of self-attention. Second, its computational complexity is reduced from O(n²) of Transformer self-attention to linear O(n), where n denotes sequence length. When applied to visual features, the 2D feature map is flattened into a 1D spatial sequence; Mamba models long-range dependencies along this sequence efficiently. This enables global context modeling at a cost comparable to convolutions, making it suitable for lightweight, real-time PCB defect detection.

Mamba can achieve efficient global context modeling while maintaining lower computational complexity for the following reasons:

(1) The computational complexity of Transformer self-attention is O(n²), whereas Mamba is linear O(n), resulting in a slower increase in computation with sequence length;

(2) Mamba does not require explicit construction of the attention weight matrix, allowing for more efficient memory access and better hardware execution compatibility;

(3) Mamba completes long-range dependency modeling through a selective state–space model, covering the global receptive field in a single forward pass, achieving global modeling capabilities comparable to Transformer, but with significantly lower computational and memory costs.
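To give a sense of scale for the O(n²) versus O(n) comparison, the sketch below counts spatial tokens per pyramid level and contrasts the number of pairwise attention scores with the number of scan steps. The 640 × 640 input size and the strides 8, 16, and 32 are assumptions (typical YOLOv8 settings, not values stated in this section), and constant factors such as channel width and state dimension are ignored.

```python
def token_count(image_size, stride):
    """Number of spatial positions in a square feature map at the given stride."""
    side = image_size // stride
    return side * side


def cost_table(image_size=640, strides=(8, 16, 32)):
    """Return (stride, n, pairwise_scores, scan_steps) for each pyramid level."""
    rows = []
    for stride in strides:
        n = token_count(image_size, stride)
        rows.append((stride, n, n * n, n))  # attention ~ n^2, scan ~ n
    return rows


for stride, n, pairwise, steps in cost_table():
    print(f"stride {stride:>2}: n={n:>5}  attention pairs={pairwise:>10}  scan steps={steps:>5}")
```

The gap widens with resolution: the ratio between the two counts is n itself, so the highest-resolution level, which matters most for small defects, is also where quadratic attention is most expensive.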

3.3.3. C2f-Mamba Module Construction

To address the shortcomings of the original C2f module in global information modeling, this paper proposes an improved structure integrating the Mamba mechanism—the C2f-Mamba module. Traditional convolutional neural networks excel at extracting local texture and edge features but are limited by their local receptive fields, making them unable to effectively model the long-range dependencies between defects and the global circuit layout in PCB images. Conversely, pure Mamba models possess powerful global modeling capabilities but tend to lose fine-grained spatial details, which are critical for detecting tiny PCB defects. Therefore, instead of simply inserting Mamba blocks at the end of C2f modules, we propose a local–global dual-branch parallel structure that simultaneously preserves the local perception advantages of convolution and the global modeling advantages of Mamba, enabling deep synergy between the two. The integration of the state–space model (Mamba) into the C2f module is motivated by the complementary strengths of convolution and SSM. Convolution excels at capturing local spatial details such as defect edges, textures, and fine-grained structures, which are critical for small PCB defect localization. Meanwhile, Mamba provides efficient long-range dependency modeling with linear complexity, enabling global context awareness to suppress background interference and capture structural relationships across the PCB. By fusing these two paths within a unified C2f-Mamba block, the model achieves collaborative representation: local features preserve precise defect details, while global features supply semantic consistency. This joint modeling significantly enhances feature representation in complex backgrounds with dense textures and weak defects, which directly addresses the insufficient global modeling of the original C2f module. This module achieves collaborative modeling of local and global features by introducing a state–space modeling path. 
The specific structural design is as follows:

First, the input features are represented as:

$$X \in \mathbb{R}^{H \times W \times C} \tag{3}$$

They are then divided along the channel dimension:

$$X = [X_{1}, X_{2}] \tag{4}$$

where $X_{1}$ and $X_{2}$ denote the two channel sub-features.

For the local branch, the original C2f structure is retained for convolutional feature extraction:

$$F_{\mathrm{local}} = \mathrm{Conv}(X_{1}) \tag{5}$$

This branch is mainly responsible for capturing local texture and edge detail information. For the global branch, the feature map is first flattened into a 1D token sequence and then fed into the Mamba module for long-range dependency modeling. After sequence processing, the output sequence is reshaped back to the original spatial dimensions to preserve the spatial correspondence of features before feature fusion:

$$F_{\mathrm{global}} = \mathrm{Reshape}(\mathrm{Mamba}(\mathrm{Flatten}(X_{2}))) \tag{6}$$

This process realizes the modeling of long-distance dependencies through a state–space modeling mechanism, thereby enhancing the ability to express global contextual information. Finally, local features are fused with global features:

$$F_{\mathrm{out}} = \mathrm{Conv}(\mathrm{Concat}(F_{\mathrm{local}}, F_{\mathrm{global}})) \tag{7}$$

where $\mathrm{Concat}(\cdot)$ denotes the channel concatenation operation, and the subsequent convolution is used to further integrate feature information.
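The data flow of Equations (4) to (7) can be summarized in a short structural sketch. It only illustrates the split, dual-branch, concatenate, and fuse pattern: the local, sequence, and fusion operators are passed in as callables, and the stand-ins used in the example (an identity map and a causal running mean) are placeholders chosen for illustration, not the convolution or Mamba layers of the proposed module. All function names are hypothetical.

```python
def split_channels(x):
    """Eq. (4): split each pixel's channel vector into two halves."""
    half = len(x[0][0]) // 2
    x1 = [[px[:half] for px in row] for row in x]
    x2 = [[px[half:] for px in row] for row in x]
    return x1, x2


def global_branch(x2, seq_fn):
    """Eq. (6): flatten to a token sequence, apply seq_fn, reshape back."""
    height, width = len(x2), len(x2[0])
    tokens = [px for row in x2 for px in row]
    out = seq_fn(tokens)
    return [out[r * width:(r + 1) * width] for r in range(height)]


def c2f_mamba_forward(x, local_fn, seq_fn, fuse_fn):
    """Split -> local branch / global branch -> channel concat -> fuse."""
    x1, x2 = split_channels(x)
    f_local = local_fn(x1)                    # Eq. (5)
    f_global = global_branch(x2, seq_fn)      # Eq. (6)
    # Channel concatenation: join the two channel lists at every pixel.
    concat = [[a + b for a, b in zip(row_l, row_g)]
              for row_l, row_g in zip(f_local, f_global)]
    return fuse_fn(concat)                    # Eq. (7)


def identity(feature_map):
    """Placeholder for a learned convolution (assumption for illustration)."""
    return feature_map


def running_mean(tokens):
    """Placeholder sequence operator: causal mean over all tokens seen so far."""
    totals = [0.0] * len(tokens[0])
    out = []
    for i, tok in enumerate(tokens, start=1):
        totals = [t + v for t, v in zip(totals, tok)]
        out.append([t / i for t in totals])
    return out


# Toy 2 x 2 feature map with two channels per pixel.
x = [[[1, 10], [2, 20]], [[3, 30], [4, 40]]]
out = c2f_mamba_forward(x, identity, running_mean, identity)
print(out)
```

In the example, the first channel passes through the local branch unchanged, while the second channel at each position aggregates information from every earlier position in the flattened sequence, which mirrors the intended division of labour between the two branches.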

To address the challenges of strong background interference and difficulty in capturing tiny targets in PCB defect detection, this paper constructs the C2f-Mamba module as shown in Figure 3. Logically, this module achieves deep integration of local and global information by splitting the feature flow into dual-path processing via a Split operation: the ‘Direct access’ branch preserves the original local spatial features, while the other branch is input into a Mamba Layer Group composed of multiple SSM (Selective State–Space Model) units. The dual-branch structure is designed based on two critical feature types for PCB defect detection:

(1) Local convolution features retain high-resolution spatial details, edge information, texture patterns, and fine-grained defect morphology, which are essential for accurate bounding box regression;

(2) Global state–space features capture long-range spatial dependencies, global layout consistency, and inter-region contextual relationships, which help distinguish real defects from similar background textures.

Channel concatenation is chosen for fusion because it maintains the full dimensionality of both feature sets without compression, allowing the subsequent convolution layer to adaptively learn complementary weights. This preserves both fine-grained localization cues and global semantic guidance, which is vital for detecting tiny, low-contrast defects in complex layouts. Using the selective scanning mechanism of the state–space model, the Mamba branch extracts long-range global dependencies, thereby enhancing the model’s feature representation and recognition capability for tiny defects in complex backgrounds. Compared with traditional Transformer architectures, C2f-Mamba leverages the linear computational characteristics of SSM, obtaining a wide receptive field while keeping the complexity at the O(n) level, which supports high inference efficiency.
Experiments show that introducing this module can effectively improve the model’s sensitivity to tiny PCB defects and significantly reduce false positives and missed detections while ensuring real-time performance.

To clarify the collaborative modeling mechanism, the proposed C2f-Mamba module integrates local feature extraction and global context modeling through a parallel dual-branch structure. Specifically, the input feature map is first divided along the channel dimension into two complementary feature subsets. The local branch preserves the convolutional operation of the original C2f module to capture fine-grained spatial details, such as defect edges, textures, and local shape patterns. In parallel, the global branch reshapes the 2D feature map into a 1D token sequence and feeds it into the Mamba state–space module to model long-range spatial dependencies and global contextual relationships. After global sequence modeling, the feature sequence is reshaped back to the original spatial resolution. Finally, the local convolutional features and the global Mamba features are concatenated along the channel dimension and further fused by a convolution layer. In this way, local details and global semantic dependencies are not processed independently but are jointly integrated within the same C2f-Mamba block, enabling the network to retain tiny defect details while suppressing background interference through global contextual awareness.

3.4. FPN + PAN++ Feature Fusion Structure

In PCB defect detection, defects usually appear as small-scale, low-contrast, and irregular targets embedded in dense circuit textures. Therefore, relying only on single-level features is insufficient. Shallow features contain rich spatial details and edge information, which are important for locating tiny defects, but they lack strong semantic discrimination and are easily disturbed by background textures. Deep features contain stronger semantic information, but their spatial resolution is reduced after repeated downsampling, which may cause small defect details to be weakened or lost. Therefore, an effective feature fusion structure should not only transfer high-level semantic information to shallow layers, but also feed low-level detailed information back to deeper layers.

The Feature Pyramid Network (FPN) propagates high-level semantic information to lower layers through a top-down pathway, which is formally proposed and validated on the COCO detection benchmark [41]. COCO (Common Objects in Context) is a large-scale, universal, and widely accepted standard benchmark for object detection. It contains complex real-world scenes, large-scale variations, and a high proportion of small objects, and has become the de facto benchmark for validating multi-scale feature fusion methods such as FPN. Its core idea is to progressively upsample high-level feature maps and fuse them with corresponding low-level features via lateral connections, thereby enhancing the semantic representation capability of the lower-level features. However, this one-way information flow is insufficient for PCB defect detection because fine-grained defect details from shallow layers cannot be fully propagated to high-level semantic features. Although the conventional PAN structure introduces a bottom-up path, its feature interaction is still relatively limited when handling small defects with large-scale variation and complex backgrounds. To address this problem, this paper adopts a lightweight FPN-PAN++ structure to strengthen bidirectional cross-scale information flow. Specifically, the FPN path transfers global semantic information from deep layers to shallow layers, improving the semantic representation of small defects, while the PAN++ path further aggregates shallow spatial details and feeds them back to deeper layers, enhancing localization accuracy. Through this bidirectional feature circulation, high-level semantic information and low-level spatial details are repeatedly complemented and fused, thereby reducing the semantic gap between different feature levels and improving the detection robustness for multi-scale PCB defects.

Moreover, only the core bidirectional aggregation and feature pyramid enhancement components of PAN++ are retained, while task-specific text detection branches are removed. This lightweight adaptation avoids unnecessary computational overhead and makes the structure more suitable for real-time PCB defect detection. Therefore, the adoption of FPN-PAN++ is motivated by the need to achieve stronger multi-scale feature interaction, better small-defect localization, and improved robustness under complex PCB backgrounds.

3.4.1. FPN Feature Fusion Mechanism

In object detection tasks, there are significant differences in the feature distributions of objects at different scales. For PCB defect detection, small-scale defects (such as micro-circuit breaks or pinholes) are often more apparent in shallow features, while deep features contain richer semantic information. Therefore, how to effectively integrate multi-scale features becomes the key to improving detection performance.

The Feature Pyramid Network (FPN) propagates high-level semantic information to lower layers through a top-down pathway. Its core idea is to progressively upsample high-level feature maps and fuse them with corresponding low-level features, thereby enhancing the semantic representation capability of the lower-level features.

The basic process can be expressed as:

$$P_{i} = \mathrm{Conv}(C_{i} + \mathrm{Up}(P_{i+1})) \tag{8}$$

where $C_{i}$ represents the i-th layer feature output from the Backbone, $P_{i}$ is the fused feature map, and $\mathrm{Up}(\cdot)$ denotes the upsampling operation.

The top-down propagation works as follows:

(1) The deepest feature map (e.g., C5) carries strong global semantics;

(2) It is upsampled to match the spatial size of the shallower layer (C4);

(3) The upsampled high-level features are fused with C4 via convolution or concatenation;

(4) This process repeats from deeper to shallower layers (C5 → C4 → C3).

In this way, low-level features obtain high-level semantic guidance, improving the discrimination of small defects while retaining high-resolution spatial details.
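The top-down steps above can be traced on toy data. The following is a minimal sketch under several assumptions: feature maps are single-channel nested lists, upsampling is nearest-neighbour by a factor of two, fusion is the element-wise addition of Equation (8), and the convolution is omitted (treated as the identity). The function names are hypothetical.

```python
def upsample2x(fmap):
    """Nearest-neighbour upsampling by 2 in both spatial dimensions."""
    out = []
    for row in fmap:
        wide = [v for v in row for _ in range(2)]
        out.append(wide)
        out.append(list(wide))
    return out


def add_maps(a, b):
    """Element-wise sum of two feature maps with identical shape."""
    return [[x + y for x, y in zip(row_a, row_b)] for row_a, row_b in zip(a, b)]


def fpn_top_down(c_levels):
    """Top-down fusion P_i = C_i + Up(P_{i+1}), with Conv omitted.

    c_levels is ordered from shallow to deep, e.g. [C3, C4, C5].
    """
    p_levels = [None] * len(c_levels)
    p_levels[-1] = c_levels[-1]                  # deepest level starts the pathway
    for i in range(len(c_levels) - 2, -1, -1):
        p_levels[i] = add_maps(c_levels[i], upsample2x(p_levels[i + 1]))
    return p_levels


# Toy pyramid: constant maps so the contribution of each level is easy to read.
c3 = [[100] * 4 for _ in range(4)]
c4 = [[10] * 2 for _ in range(2)]
c5 = [[1]]
p3, p4, p5 = fpn_top_down([c3, c4, c5])
print(p5, p4, p3[0])
```

With constant inputs, every entry of the shallowest output contains a contribution from all three levels, which makes the direction of the semantic flow (deep to shallow) directly visible.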

Through this architecture, FPN can introduce high-level semantic information while retaining high-resolution features, thereby improving the model’s ability to detect small objects. However, FPN only contains a top-down information flow and lacks a feature enhancement pathway from low to high layers, which still results in insufficient utilization of information in complex scenarios.

3.4.2. Advantages and Structure of PAN++

To further enhance feature fusion capability, this paper introduces the PAN++ structure based on FPN to achieve bidirectional enhancement of multi-scale features. The Path Aggregation Network (PAN) adds a bottom-up path to feed low-level detailed information back to high-level features, thereby compensating for the unidirectional information flow of FPN. The original PAN++ framework includes multiple text-specific components:

(1) Mask prediction branch: outputs pixel-level text segmentation masks to locate arbitrary-shaped text regions;

(2) Text kernel segmentation: predicts a compact text core to separate adjacent text instances and suppress background;

(3) Boundary regression branch: predicts text contour offsets to refine irregular text boundaries.

These branches are designed for end-to-end text detection but introduce unnecessary computation for PCB defect detection, which only requires bounding-box prediction. This paper therefore does not directly employ the complete PAN++ framework, but carries out a lightweight adaptation for the PCB defect detection task:

(1) Only the core bidirectional cross-scale feature aggregation and the Feature Pyramid Enhancement Module (FPEM) are retained to strengthen multi-scale feature fusion and information flow;

(2) The mask branch, text kernel segmentation, and boundary regression modules dedicated to text detection are completely discarded, so as to avoid extra computation and task inconsistency;

(3) The modified lightweight PAN++ is connected with FPN to construct a bidirectional feature fusion architecture, which only outputs multi-scale feature maps for object detection and sends them to the decoupled detection head (classification + box regression) of YOLOv8.

Its bottom-up feature fusion process can be expressed as:

$$N_{i} = \mathrm{Conv}(P_{i} + \mathrm{Down}(N_{i-1})) \tag{9}$$

where $N_{i}$ represents the fused feature, and $\mathrm{Down}(\cdot)$ denotes the downsampling operation.
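The bottom-up recursion of Equation (9) can be sketched in the same toy setting. The assumptions are again loud ones: single-channel nested lists, 2 × 2 average pooling as a stand-in for the learned downsampling operation, element-wise addition for fusion, and no convolution. The function names are hypothetical.

```python
def downsample2x(fmap):
    """2 x 2 average pooling, a stand-in for a learned stride-2 downsampling."""
    out = []
    for r in range(0, len(fmap), 2):
        row = []
        for c in range(0, len(fmap[0]), 2):
            block = (fmap[r][c] + fmap[r][c + 1]
                     + fmap[r + 1][c] + fmap[r + 1][c + 1])
            row.append(block / 4.0)
        out.append(row)
    return out


def add_maps(a, b):
    """Element-wise sum of two feature maps with identical shape."""
    return [[x + y for x, y in zip(row_a, row_b)] for row_a, row_b in zip(a, b)]


def pan_bottom_up(p_levels):
    """Bottom-up fusion N_i = P_i + Down(N_{i-1}), with Conv omitted.

    p_levels is ordered from shallow to deep, e.g. [P3, P4, P5].
    """
    n_levels = [p_levels[0]]                     # shallowest level starts the pathway
    for p in p_levels[1:]:
        n_levels.append(add_maps(p, downsample2x(n_levels[-1])))
    return n_levels


# Toy pyramid with constant maps.
p3 = [[4.0] * 4 for _ in range(4)]
p4 = [[2.0] * 2 for _ in range(2)]
p5 = [[1.0]]
n_levels = pan_bottom_up([p3, p4, p5])
print(n_levels[1], n_levels[2])
```

Here the deepest output accumulates contributions from both shallower levels, the mirror image of the top-down pathway, so each level sees information from adjacent scales in both directions once the two pathways are chained.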

By combining FPN and PAN++, this paper constructs a bidirectional feature fusion architecture as shown in Figure 4. Subfigure (a) shows the top-down feature pyramid (FPN) path, which focuses on enhancing the transmission of high-level semantic information; subfigure (b) illustrates the bottom-up path aggregation (PAN++) process, effectively feeding back detailed information through the introduction of additional paths.

This structure demonstrates significant advantages in PCB defect detection tasks: It strengthens the feature representation of small targets, with high-resolution low-level details (such as the edges of fine wires) fully retained and fed back to higher levels via the enhancement path in Figure 4b; meanwhile, it enhances multi-scale information interaction, with features from different layers deeply aggregated and fused in subfigure (c), enabling the model to handle defects of vastly varying sizes in PCB production; finally, it improves robustness in complex scenarios, reducing interference from background noise in the recognition of minor defects.

Furthermore, the fused features are directed to different prediction branches: subfigure (d) shows the decoupled prediction heads, handling class classification (Class) and bounding box regression (Box) separately, ensuring accurate localization. By optimizing this bidirectional path structure, PAN++ significantly enhances feature representation capability while successfully avoiding excessive redundant computations, effectively maintaining the overall lightweight nature of the model.

The reason why the adapted PAN++ can enhance feature representation without introducing excessive redundant computation lies in its lightweight and task-oriented redesign. The original PAN++ framework contains several text-detection-specific components, such as mask prediction, text kernel segmentation, and boundary regression branches. These components are useful for arbitrary-shaped text detection but are not necessary for bounding-box-based PCB defect detection. Therefore, this paper does not directly adopt the complete PAN++ framework. Instead, only its core bidirectional cross-scale aggregation and feature pyramid enhancement paths are retained, while the task-irrelevant prediction branches are removed.

In terms of feature enhancement, the retained PAN++ path strengthens the bottom-up information flow by propagating shallow high-resolution spatial details to deeper semantic layers. This complements the top-down semantic transmission of FPN and enables the detection head to receive features that contain both fine-grained localization information and high-level semantic discrimination. As a result, tiny defect boundaries, local textures, and global semantic cues can be more effectively integrated across different scales.

In terms of computational efficiency, the proposed FPN-PAN++ does not perform dense all-to-all feature fusion among all pyramid levels. Instead, it mainly conducts adjacent-level feature aggregation through lightweight upsampling, downsampling, concatenation, and convolution operations. This design reuses the existing multi-scale feature maps generated by the YOLOv8 backbone and avoids adding heavy attention modules or extra prediction branches. Therefore, the feature capability is improved by optimizing the information flow rather than simply increasing network depth or width. Thus, the proposed structure effectively improves feature fusion performance with only a small amount of additional computational overhead, while maintaining high inference efficiency.

The enhanced bidirectional feature fusion of FPN-PAN++ is achieved through two complementary information propagation paths. In the top-down FPN pathway, high-level semantic features are progressively upsampled and fused with low-level high-resolution features, so that shallow feature maps obtain stronger semantic guidance for distinguishing true defects from complex circuit textures. In the bottom-up PAN++ pathway, low-level spatial details are further aggregated and propagated back to deeper layers through downsampling and path enhancement, allowing high-level features to recover fine-grained localization cues that may be weakened during repeated downsampling. Therefore, the feature flow is no longer a single semantic transmission from deep to shallow layers, but a bidirectional circulation between semantic-rich deep features and detail-rich shallow features.

Compared with conventional FPN-PAN fusion, the adopted lightweight PAN++ further strengthens cross-scale interaction by enhancing the aggregation paths between adjacent feature levels. This design reduces the semantic gap among multi-scale feature maps and enables the detection head to receive features that contain both global semantic discrimination and local spatial details. For PCB defect detection, this is particularly important because tiny defects usually occupy only a small number of pixels and are easily confused with dense background textures. Through the FPN-PAN++ structure, shallow defect boundaries and textures can be preserved, while deep semantic information can suppress background interference, resulting in more robust multi-scale defect representation.

Previous YOLO-Mamba-based detection methods mainly focus on introducing Mamba modules into the backbone or neck to improve long-range dependency modeling, but they retain the original FPN-PAN structure without improving cross-scale fusion. This paper further redesigns the feature fusion stage by combining C2f-Mamba with a lightweight FPN-PAN++ structure. In other words, previous YOLO-Mamba methods primarily enhance feature extraction, whereas the proposed method simultaneously enhances feature extraction and cross-scale feature interaction. The C2f-Mamba module provides local–global feature representation, while FPN-PAN++ constructs a bidirectional multi-scale information flow. The two modules are complementary: Mamba improves global contextual modeling, and FPN-PAN++ ensures that global context and local defect details are effectively propagated across different scales. To the best of our knowledge, such a combination has not been specifically adopted in previous YOLO-Mamba methods for PCB defect detection.

3.5. Detection Head and Loss Function

This paper continues to use the decoupled detection head structure of YOLOv8, separating the classification task from the regression task in order to improve detection accuracy and training stability.

The detection head mainly consists of two branches: the first is the classification branch, which is used to predict the probability of object categories; the second is the regression branch, which is used to predict the positions of the object bounding boxes. Compared to traditional coupled structures, the decoupled design can reduce interference between tasks, allowing the model to achieve better convergence performance in complex scenarios. Meanwhile, YOLOv8 adopts an anchor-free mechanism, directly predicting the center points of objects, avoiding the hyperparameter dependency issues brought by anchor design, thereby enhancing the model’s generalization capability.

During training, this paper follows the original YOLOv8 design to ensure experimental fairness and stability. The overall loss function consists of classification loss, bounding box regression loss, and distribution focal loss (DFL):

$$L = L_{\mathrm{cls}} + L_{\mathrm{box}} + L_{\mathrm{DFL}} \tag{10}$$

where $L_{\mathrm{cls}}$ represents the classification loss, which measures the error in category prediction; $L_{\mathrm{box}}$ represents the regression loss, which constrains the position deviation between the predicted boxes and the ground-truth boxes; and $L_{\mathrm{DFL}}$ is the distribution focal loss, which refines bounding box prediction by learning a discrete probability distribution over each box boundary offset.

In bounding box regression, IoU-based loss functions (such as CIoU or SIoU) are commonly used:

$$L_{\mathrm{box}} = 1 - \mathrm{CIoU} \tag{11}$$

Through joint optimization of multiple tasks, the model can simultaneously maintain classification accuracy and localization precision.
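For reference, the CIoU-based regression loss of Equation (11) can be computed for a single pair of boxes as sketched below. The code follows the commonly used CIoU definition (IoU minus a normalized center-distance term and an aspect-ratio consistency term), which is not restated in this paper; the `(x1, y1, x2, y2)` corner format, the `eps` stabilizer, and the function name `ciou_loss` are assumptions.

```python
import math


def ciou_loss(box_p, box_g, eps=1e-9):
    """Return 1 - CIoU for two axis-aligned boxes given as (x1, y1, x2, y2)."""
    px1, py1, px2, py2 = box_p
    gx1, gy1, gx2, gy2 = box_g

    # Intersection over union.
    inter_w = max(0.0, min(px2, gx2) - max(px1, gx1))
    inter_h = max(0.0, min(py2, gy2) - max(py1, gy1))
    inter = inter_w * inter_h
    area_p = (px2 - px1) * (py2 - py1)
    area_g = (gx2 - gx1) * (gy2 - gy1)
    iou = inter / (area_p + area_g - inter + eps)

    # Squared distance between box centers.
    rho2 = ((px1 + px2) / 2 - (gx1 + gx2) / 2) ** 2 \
         + ((py1 + py2) / 2 - (gy1 + gy2) / 2) ** 2

    # Squared diagonal of the smallest box enclosing both.
    cw = max(px2, gx2) - min(px1, gx1)
    ch = max(py2, gy2) - min(py1, gy1)
    c2 = cw ** 2 + ch ** 2 + eps

    # Aspect-ratio consistency term and its trade-off weight.
    v = (4 / math.pi ** 2) * (
        math.atan((gx2 - gx1) / (gy2 - gy1 + eps))
        - math.atan((px2 - px1) / (py2 - py1 + eps))
    ) ** 2
    alpha = v / (1 - iou + v + eps)

    ciou = iou - rho2 / c2 - alpha * v
    return 1 - ciou


print(ciou_loss((0, 0, 2, 2), (0, 0, 2, 2)))   # identical boxes: loss close to 0
print(ciou_loss((0, 0, 1, 1), (2, 2, 3, 3)))   # disjoint boxes: loss above 1
```

Unlike a plain IoU loss, the center-distance term keeps the loss informative for non-overlapping boxes, which is relevant when small predicted boxes miss a tiny defect entirely.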

3.6. Model Dynamic Training and Inference Process

During the model training phase, this study adopts a unified data augmentation and optimization strategy to improve the model’s generalization ability and training stability. The training process mainly includes steps such as data preprocessing, forward propagation, loss calculation, and backpropagation. The detailed architecture is illustrated in Figure 5.

First, the input images are normalized in size and augmented (such as random flipping, cropping, etc.), and then fed into the improved network model. The model extracts multi-scale features through the Backbone, enhances global modeling capabilities via the C2f-Mamba module, and then performs multi-scale feature fusion through the FPN-PAN++ structure, with the detection head finally outputting the prediction results.

During training, the loss function is used to compute the error between the prediction results and the ground truth labels, and the model parameters are updated using the backpropagation algorithm. At the same time, automatic mixed precision (AMP) and learning rate scheduling strategies are introduced to improve training efficiency and accelerate convergence.

During inference, the input image only needs to go through a single forward pass to obtain detection results. Subsequently, redundant detection boxes are removed through Non-Maximum Suppression (NMS) to obtain the final detection results.
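The post-processing step can be sketched as a greedy, class-agnostic NMS. This is a minimal illustration rather than the routine used in the experiments: the IoU threshold of 0.5 and the function names are assumptions, and practical pipelines usually apply NMS per class after confidence filtering.

```python
def iou(a, b):
    """IoU of two boxes in (x1, y1, x2, y2) format."""
    iw = max(0.0, min(a[2], b[2]) - max(a[0], b[0]))
    ih = max(0.0, min(a[3], b[3]) - max(a[1], b[1]))
    inter = iw * ih
    union = (a[2] - a[0]) * (a[3] - a[1]) + (b[2] - b[0]) * (b[3] - b[1]) - inter
    return inter / union if union > 0 else 0.0


def nms(boxes, scores, iou_threshold=0.5):
    """Return indices of kept boxes, highest score first."""
    order = sorted(range(len(boxes)), key=lambda i: scores[i], reverse=True)
    keep = []
    while order:
        best = order.pop(0)
        keep.append(best)
        # Drop every remaining box that overlaps the kept box too strongly.
        order = [i for i in order if iou(boxes[best], boxes[i]) <= iou_threshold]
    return keep


if __name__ == "__main__":
    boxes = [(0, 0, 10, 10), (1, 1, 11, 11), (50, 50, 60, 60)]
    scores = [0.9, 0.8, 0.7]
    print(nms(boxes, scores))  # the second box overlaps the first and is suppressed
```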

Thanks to the efficient design of the C2f-Mamba module and FPN-PAN++ structure, the model can maintain high operational speed during inference while ensuring detection accuracy, meeting the real-time requirements of industrial scenarios.

The efficient inference performance is achieved by controlling the computational overhead of both improved modules. In C2f-Mamba, the Mamba branch is introduced through channel splitting, so only part of the features is processed by the global state–space modeling path, while the other branch retains lightweight convolutional extraction. Moreover, Mamba models long-range dependencies with complexity that is linear in the sequence length, avoiding the quadratic computational cost of Transformer self-attention. In FPN-PAN++, only the core bidirectional feature aggregation paths are retained, and task-irrelevant branches from the original PAN++ framework are removed. The fusion process mainly uses lightweight adjacent-scale operations such as upsampling, downsampling, concatenation, and convolution. Therefore, the proposed model enhances local–global feature representation and multi-scale fusion with only modest additional computation and preserves efficient inference performance.
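The difference between linear and quadratic scaling can be made tangible with a rough count. The sketch below assumes a 640 × 640 input and the feature strides 8, 16, and 32 that are standard for YOLOv8-style detectors, flattens each feature map into a token sequence, and compares the number of pairwise interactions in full self-attention with the number of steps in a linear scan. It deliberately ignores channel width and constant factors, so it shows only how the cost grows with sequence length, not measured FLOPs.

```python
def token_counts(input_size=640, strides=(8, 16, 32)):
    """Number of spatial positions (tokens) per feature level."""
    return {s: (input_size // s) ** 2 for s in strides}


def interaction_costs(n_tokens):
    """Sequence-length scaling only; channel width and constants are ignored."""
    return {
        "self_attention_pairs": n_tokens ** 2,  # every token attends to every token
        "linear_scan_steps": n_tokens,          # one state update per token
    }


if __name__ == "__main__":
    for stride, n in token_counts().items():
        print(stride, n, interaction_costs(n))
```

At the highest-resolution level (stride 8), which matters most for small defects, the sequence has 6400 tokens, so the pairwise count is 6400 times larger than the linear one.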

4. Experimental Results and Analysis

4.1. Experimental Environment

The experimental platform consists of an NVIDIA RTX 4090 GPU and an Intel Xeon Platinum 8470Q CPU running Ubuntu 22.04. The software environment is PyTorch 2.1.0 with Python 3.10 and CUDA 12.1. The parameters of the experimental environment are shown in Table 1. All baseline methods are retrained from scratch with the same 100 epochs, input size, optimizer, and data augmentation to ensure a fair comparison.

4.2. Dataset

This study conducts experimental verification using the PCB defect dataset publicly released by Peking University and Kaggle. This dataset contains a total of 1386 PCB defect images, with an average resolution of 2777 × 2138. The dataset originates from industrial visual inspection scenarios and is widely applied in research on printed circuit board defect detection, effectively reflecting the detection requirements in actual production environments. As shown in Figure 6, the dataset includes six typical defect types, such as short, mouse bite, spur, and missing hole, characterized by a high proportion of small targets, uneven class distribution, and complex backgrounds. Among them, small-sized defects occupy only a small pixel area in the images, posing higher demands on the model’s feature extraction capability, while complex circuit texture backgrounds can easily interfere with defect recognition.

In terms of data preprocessing, the original images are first uniformly resized and normalized to ensure consistency of model input; subsequently, data augmentation strategies such as random flipping, random cropping, and color perturbation are introduced to enhance the model’s adaptability to different scenarios; finally, the data are divided into training, validation, and test sets at a fixed ratio (8:1:1) to ensure the reliability and fairness of the experimental results.
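For detection data, geometric augmentation must transform the box labels together with the image. The sketch below illustrates this for random horizontal flipping, assuming YOLO-format labels (class, x_center, y_center, width, height) normalized to [0, 1] and an image represented as a list of pixel rows. The function names and the flip probability are illustrative assumptions, not settings reported in this paper.

```python
import random


def hflip_yolo_labels(labels):
    """Mirror normalized YOLO labels horizontally: only x_center changes."""
    return [(c, 1.0 - x, y, w, h) for (c, x, y, w, h) in labels]


def maybe_hflip(image_rows, labels, p=0.5, rng=random):
    """Flip image and labels together with probability p."""
    if rng.random() < p:
        flipped = [row[::-1] for row in image_rows]  # reverse each pixel row
        return flipped, hflip_yolo_labels(labels)
    return image_rows, labels


if __name__ == "__main__":
    image = [[1, 2, 3], [4, 5, 6]]
    labels = [(0, 0.25, 0.5, 0.1, 0.2)]
    print(maybe_hflip(image, labels, p=1.0))
```

Box width and height are unchanged by a flip, so small defects keep their size; cropping, by contrast, can truncate or remove boxes and needs extra filtering.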

For dataset division, this paper follows the common strategy in industrial defect detection and randomly divides the PCB defect dataset into a training set, a validation set, and a test set at a ratio of 8:1:1. The three sets maintain the same category distribution to avoid the impact of data skew on the experimental results. All experiments are evaluated on the same test set to ensure a fair comparison.
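One way to obtain an 8:1:1 split that preserves the category distribution is to split each class separately. The sketch below is a hypothetical illustration that assumes every image carries a single defect class; the function name, the fixed seed, and the rounding rule are assumptions, and the paper does not describe its exact splitting procedure.

```python
import random
from collections import defaultdict


def stratified_split(items, ratios=(0.8, 0.1, 0.1), seed=0):
    """Split (image_id, class_name) pairs into train/val/test lists of image ids.

    Each class is shuffled and split independently so that all three subsets
    keep roughly the same class proportions.
    """
    by_class = defaultdict(list)
    for image_id, cls in items:
        by_class[cls].append(image_id)

    rng = random.Random(seed)  # fixed seed for a reproducible split
    train, val, test = [], [], []
    for cls in sorted(by_class):
        ids = by_class[cls]
        rng.shuffle(ids)
        n_train = round(len(ids) * ratios[0])
        n_val = round(len(ids) * ratios[1])
        train += ids[:n_train]
        val += ids[n_train:n_train + n_val]
        test += ids[n_train + n_val:]  # remainder goes to the test set
    return train, val, test


if __name__ == "__main__":
    # Synthetic ids only; the real per-class image counts are not assumed here.
    classes = ["missing hole", "mouse bite", "open circuit", "short", "spur", "spurious copper"]
    items = [(f"{c}_{i}", c) for c in classes for i in range(231)]
    train, val, test = stratified_split(items)
    print(len(train), len(val), len(test))
```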

4.3. Evaluation Metric

In order to comprehensively evaluate the performance of the model in the PCB defect detection task, this paper selects commonly used evaluation metrics in the field of object detection, including mean average precision (mAP), precision, and recall.

The mean average precision is used to measure the overall detection capability of the model across all categories and is defined as the mean of the average precision for each category:

mAP = \frac{1}{N} \sum_{i=1}^{N} AP_{i}    (12)

Here, N represents the number of categories, and AP_{i} represents the average precision of the i-th category. This metric can comprehensively reflect the model’s detection performance across different categories.

Precision is used to measure the proportion of true positives among the samples predicted as positive by the model, and its calculation formula is:

Precision = \frac{TP}{TP + FP}    (13)

Here, TP represents true positives, and FP represents false positives. The higher the precision, the fewer incorrect detections the model produces.

Recall is used to measure the proportion of actual targets that are correctly detected, and it is defined as:

Recall = \frac{TP}{TP + FN}    (14)

where FN represents false negatives. A higher recall indicates fewer missed detections by the model.

In addition, this paper also uses mAP@0.5 and mAP@0.5:0.95 as supplementary metrics. Specifically, mAP@0.5 is calculated at an IoU threshold of 0.5, while mAP@0.5:0.95 is averaged over ten IoU thresholds from 0.5 to 0.95 in steps of 0.05, allowing for a more rigorous evaluation of localization quality.
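Equations (12) to (14) can be illustrated with a small, self-contained computation. The sketch assumes that detections have already been matched to ground-truth boxes at a given IoU threshold, so each detection is a (confidence, is_true_positive) pair, and it integrates the precision-recall curve with all-point interpolation. Evaluation toolkits may use a different interpolation scheme, so the values produced here are illustrative rather than a reproduction of the reported numbers; all function names are hypothetical.

```python
def precision_recall(tp, fp, fn):
    """Equations (13) and (14) from raw counts."""
    precision = tp / (tp + fp) if tp + fp else 0.0
    recall = tp / (tp + fn) if tp + fn else 0.0
    return precision, recall


def average_precision(scored_hits, n_gt):
    """AP for one class at one IoU threshold (n_gt must be positive).

    scored_hits: list of (confidence, is_true_positive) pairs.
    """
    ranked = sorted(scored_hits, key=lambda d: d[0], reverse=True)
    tp = fp = 0
    points = []  # (recall, precision) after each detection
    for _, hit in ranked:
        if hit:
            tp += 1
        else:
            fp += 1
        points.append((tp / n_gt, tp / (tp + fp)))

    ap, prev_recall = 0.0, 0.0
    for i, (r, _) in enumerate(points):
        # Precision envelope: best precision achievable at this recall or higher.
        p_max = max(p for _, p in points[i:])
        ap += (r - prev_recall) * p_max
        prev_recall = r
    return ap


def mean_average_precision(ap_per_class):
    """Equation (12): mean of the per-class AP values."""
    return sum(ap_per_class) / len(ap_per_class)


def iou_thresholds():
    """The ten thresholds averaged by mAP@0.5:0.95."""
    return [round(0.5 + 0.05 * i, 2) for i in range(10)]


if __name__ == "__main__":
    print(precision_recall(tp=8, fp=2, fn=2))
    print(average_precision([(0.9, True), (0.8, False), (0.7, True)], n_gt=2))
    print(iou_thresholds())
```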

4.4. Experimental Results

The YOLOv8n+C2f-Mamba+FPN+PAN++ model proposed in this paper outperforms the baseline model and other comparative methods on all evaluation metrics. Compared with the original YOLOv8n model (mAP@0.5 of 97.2% and mAP@0.5:0.95 of 57.5%), the improved model reaches 98.8% and 62.5%, respectively. The mAP@0.5:0.95 gain of 5.0 percentage points indicates that the model has a clear advantage under more stringent localization criteria.

To verify the reliability of the experimental results, the proposed model and all baseline models were tested under the same experimental settings and evaluation criteria, using four common object detection metrics: mAP@0.5, mAP@0.5:0.95, precision, and recall. The results show that the proposed model achieves the highest overall scores among the compared methods, which supports the effectiveness and practicability of the improved strategy in PCB defect detection tasks.

From the perspective of comparative methods, as shown in Table 2, YOLOv6 [42] and YOLOv7 [43] primarily improve detection performance through structural optimization and training strategy enhancements, but they still exhibit certain limitations in detecting small targets and in complex scenarios. YOLOv9 and YOLOv11n [44] further enhance performance, achieving higher Box(P), recall, and mAP50 values, with YOLOv9 reaching 98.2% Box(P) and recall, and YOLOv11n achieving 98.0% Box(P) and 98.4% recall. Moreover, their improvements are particularly notable in the mAP50-95 metric, indicating better performance in challenging detection scenarios. The combined model YOLOv8n+C2f-Mamba+FPN+PAN++ further leverages these advantages, attaining the highest overall detection metrics among the tested methods.

Meanwhile, as shown in Table 3, attention mechanism-based improvements (such as CBAM and CA) enhance detection performance to some extent. Among them, the YOLOv8n+CA model achieves a mAP@0.5 of 98.3%, outperforming the baseline model, although the overall improvement is limited. Transformer-based approaches (such as ViT, DETR, and Swin Transformer), despite possessing some global modeling capability, show inconsistent performance in this task. For example, the YOLOv8n+Swin Transformer model achieves only a mAP@0.5 of 96.9%, even lower than the baseline model, indicating an insufficient adaptability to small, densely distributed targets in complex background scenarios.

In contrast, as shown in Table 4, the method proposed in this paper achieves comprehensive detection performance improvements after introducing both the C2f-Mamba module and the FPN-PAN++ structure. With only FPN+PAN++ introduced, mAP@0.5 increases from 97.2% to 98.0%, but mAP@0.5:0.95 is 56.1%, below the baseline value of 57.5%; with only the C2f-Mamba module, mAP@0.5 rises to 98.4%, while mAP@0.5:0.95 is 55.7%. When both are combined, the model reaches its best performance (98.8% mAP@0.5 and 62.5% mAP@0.5:0.95), indicating a significant synergistic effect between global modeling and multi-scale feature fusion.

Compared with the YOLO-Mamba series and other generally improved YOLO methods, the proposed model shows clear advantages. First, previous YOLO-Mamba methods focus on natural scenes or infrared images and are not optimized for small PCB defects and complex textures. Second, general YOLO improvements rely on attention or simple feature fusion and lack global modeling with linear complexity. Third, to the best of our knowledge, this work is the first to combine C2f-Mamba and FPN-PAN++ for PCB defect detection, which not only enhances long-range dependency modeling but also strengthens multi-scale information interaction, leading to higher mAP@0.5:0.95 and better robustness.

The ablation experiment shows that adding a single module slightly lowers mAP@0.5:0.95, whereas combining both modules yields a significant improvement. This paper attributes this phenomenon to three main causes. First, YOLOv8n is structurally sensitive because it is extremely lightweight, with only 3.2 M parameters and very little model redundancy. The official implementation has been extensively optimized for the original C2f + FPN-PAN structure, so introducing a single new module can disturb the balance of the original feature flow and produce a marginal negative effect. Second, the two modules are strongly complementary. C2f-Mamba extracts defect features with global context, and FPN-PAN++ performs bidirectional cross-scale fusion of high-level and low-level features. Together they form a complete pipeline of feature extraction, fusion, and output, and only when they work together is their potential fully released, which explains the nonlinear performance gain. Third, the training hyperparameters suit the different structures to different degrees. To ensure a fair comparison, all models are trained with the same hyperparameters, which were tuned for the original YOLOv8n structure. With a single module added, training is more likely to converge to a poorer local optimum, whereas introducing both modules restructures the feature flow in a way that fits the current hyperparameters better and converges to a better solution.
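The non-additive behavior discussed above can be checked directly from the figures quoted in this section. The short sketch below recomputes each configuration's change relative to the YOLOv8n baseline and compares the combined gain with the sum of the single-module changes. The dictionary keys and function names are illustrative; the numbers are those stated in the text.

```python
BASELINE = {"mAP50": 97.2, "mAP50_95": 57.5}

VARIANTS = {
    "FPN-PAN++ only": {"mAP50": 98.0, "mAP50_95": 56.1},
    "C2f-Mamba only": {"mAP50": 98.4, "mAP50_95": 55.7},
    "C2f-Mamba + FPN-PAN++": {"mAP50": 98.8, "mAP50_95": 62.5},
}


def deltas(variant):
    """Change versus the baseline, in percentage points."""
    return {k: round(variant[k] - BASELINE[k], 1) for k in BASELINE}


def interaction(metric):
    """Combined gain minus the sum of the two single-module changes."""
    single = (deltas(VARIANTS["FPN-PAN++ only"])[metric]
              + deltas(VARIANTS["C2f-Mamba only"])[metric])
    combined = deltas(VARIANTS["C2f-Mamba + FPN-PAN++"])[metric]
    return round(combined - single, 1)


if __name__ == "__main__":
    for name, scores in VARIANTS.items():
        print(name, deltas(scores))
    print("interaction on mAP50_95:", interaction("mAP50_95"))
```

On mAP@0.5:0.95 the single-module changes are -1.4 and -1.8 points, while the combination gains 5.0 points, so the combined effect exceeds the additive expectation by 8.2 points.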

Figure 7 reflects the model’s convergence stability and accuracy performance over 50 training epochs. The six subplots on the left show that the localization, classification, and distribution losses (Loss) on both the training and validation sets exhibit a rapid and steady decline; the four subplots on the right illustrate a consistent increase in precision, recall, and mAP50/mAP50-95 metrics, demonstrating that the model maintains high robustness and detection consistency even under stringent evaluation criteria.

In addition, in terms of the precision and recall metrics, the method proposed in this paper achieved 98.5% and 98.4%, respectively, showing a significant improvement compared with the YOLOv8n model (96.4% and 94.6%), further indicating that this model has a stronger capability in reducing false positives and missed detections. Under complex background conditions, traditional models are easily disturbed by circuit textures, resulting in incomplete detection results or inaccurate localization. However, the method proposed in this paper, by integrating global contextual information with multi-scale features, significantly enhances feature representation, making the detection results more complete and the localization more precise, demonstrating stronger robustness.

Figure 8 presents the ground-truth annotations of representative samples from the PCB defect dataset, where the defect regions are marked with bounding boxes and corresponding class labels. These annotations provide the reference labels for evaluating the detection performance of the proposed model. Based on these ground-truth labels, Figure 9 shows the visual detection results of the improved algorithm on the PCB defect dataset. As depicted, the model can accurately locate various PCB defects, including missing holes, mouse bites, and spurs, with complete detection boxes and consistent class predictions. Even in scenarios with dense circuit textures, tiny defect sizes, or complex backgrounds, the model is still able to accurately identify and enclose all target defects, with high boundary conformity, intuitively demonstrating the excellent defect detection capability and robustness of the proposed method in complex industrial settings.

Meanwhile, Table 5 shows the model complexity and inference speed comparison. The proposed model only increases parameters from 3.20 M to 3.86 M, GFLOPs from 8.7 to 10.2, and maintains 159 FPS, which still meets real-time industrial requirements. All FPS values are tested under the same hardware and software environment: NVIDIA RTX 4090 GPU, Ubuntu 22.04, PyTorch 2.1.0, CUDA 12.1, fixed input resolution 640 × 640, batch size = 16, single-image inference mode without pre-processing and post-processing (NMS) time. The real-time performance is evaluated under standard GPU deployment conditions, which is consistent with common industrial visual inspection testing standards.
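The overhead figures above can be put in relative terms with a short calculation. The sketch below uses only the numbers quoted in this paragraph; the helper names are illustrative.

```python
def relative_increase(before, after):
    """Percentage increase from `before` to `after`."""
    return (after - before) / before * 100.0


def latency_ms(fps):
    """Average time per image in milliseconds for a given throughput."""
    return 1000.0 / fps


if __name__ == "__main__":
    print(f"parameters: +{relative_increase(3.20, 3.86):.1f}%")
    print(f"GFLOPs:     +{relative_increase(8.7, 10.2):.1f}%")
    print(f"latency:    {latency_ms(159):.2f} ms per image")
```

This corresponds to roughly 20.6% more parameters and 17.2% more GFLOPs than YOLOv8n, with about 6.3 ms per image at 159 FPS (forward pass only, since pre-processing and NMS time are excluded as stated above).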

4.5. Experimental Conclusion

Through experimental validation on the PCB defect dataset, it can be observed that the C2f-FPN-PAN++-Mamba Improved YOLOv8n model proposed in this paper demonstrates superior performance in PCB defect detection tasks. The experimental results indicate that introducing the C2f-Mamba module can effectively enhance the model’s global feature modeling capability, while the FPN-PAN++ structure significantly improves multi-scale feature fusion. The combination of the two further enhances the model’s detection ability for small objects and complex background scenarios.

Without significantly increasing computational complexity, the proposed method achieves noticeable improvements in detection accuracy, recall, and overall stability, verifying the feasibility and practical value of this model in industrial visual inspection.

In practical factory environments, PCB defect detection is more challenging than standard object detection because defects are usually very small, low-contrast, and easily confused with dense circuit textures. In addition, industrial images may suffer from uneven illumination, slight motion blur, imaging noise, and incomplete defect boundaries caused by high-speed production lines or camera acquisition conditions. These factors make tiny defects such as mouse bites, spurs, open circuits, and missing holes difficult to distinguish from normal PCB patterns, which may lead to missed detections or false alarms.

The proposed C2f-FPN-PAN++-Mamba framework is designed to address these difficult industrial conditions from both feature extraction and feature fusion perspectives. First, the C2f-Mamba module adopts a dual-branch structure, where the convolution branch preserves local edge, texture, and fine-grained spatial details, while the Mamba branch captures long-range contextual dependencies with linear computational complexity. This local–global collaborative representation helps the model distinguish true tiny defects from similar background textures under low-contrast or blurry conditions. Second, the FPN-PAN++ structure strengthens bidirectional multi-scale feature fusion. Low-level spatial details are effectively transmitted to deeper layers, while high-level semantic information is fed back to shallow layers, improving the representation of weak and small defect regions. Therefore, the proposed model can better maintain defect details, suppress background interference, and improve the accuracy of small-defect detection and classification while preserving real-time inference capability for industrial deployment.

The proposed C2f-FPN-PAN++-Mamba framework achieves the best overall results among the compared methods on the PCB defect dataset, with 98.5% precision, 98.4% recall, 98.8% mAP@0.5, and 62.5% mAP@0.5:0.95, surpassing YOLOv8n, attention-based models, and Transformer-based models. The novelty lies in (1) the C2f-Mamba dual-branch design for efficient local–global modeling; (2) the lightweight FPN-PAN++ for strengthened bidirectional fusion; and (3) their integration, which keeps the model lightweight (3.86 M parameters, 159 FPS). The stated objectives are met: small-defect detection is improved, false and missed detections are reduced, and real-time industrial deployment is supported.

Although the proposed model achieves satisfactory overall performance, several failure cases still exist in practical detection. First, extremely tiny and low-contrast defects may be missed due to insufficient local feature saliency. Second, strong background noise may lead to false positives in highly cluttered regions. Third, similar textures between different defect categories may cause occasional misclassification. These limitations are mainly caused by insufficient feature discrimination in challenging regions, which will be addressed in future work by enhancing fine-grained feature representation and noise robustness.

5. Conclusions

This study proposed an improved PCB defect detection model based on C2f-Mamba and FPN-PAN++ to address the challenges of small-defect localization and classification in complex circuit-board backgrounds. By introducing the C2f-Mamba module, the model enhances the joint representation of local texture details and long-range contextual information, which is important for distinguishing tiny defects from dense circuit patterns. Meanwhile, the FPN-PAN++ structure strengthens multi-scale feature fusion and improves the transmission of shallow spatial details and deep semantic information, thereby improving the detection stability of small and weak defect regions.

Experimental results on the PCB defect dataset show that the proposed method achieves better detection accuracy than the baseline model while maintaining a relatively lightweight computational cost. The ablation study further shows that C2f-Mamba and FPN-PAN++ are complementary: each module alone raises mAP@0.5 but slightly lowers mAP@0.5:0.95, whereas their combination yields the best results on both metrics. Visual comparisons between ground-truth annotations and predicted results also indicate that the improved model can accurately locate typical PCB defects, such as missing holes, mouse bites, and spurs, under dense textures and complex backgrounds. These results suggest that the proposed method has practical potential for automated PCB quality inspection in industrial production environments.

Nevertheless, this study still has several limitations. The current experiments are mainly conducted on a public PCB defect dataset, and further validation on real factory images with stronger illumination variations, motion blur, and device-dependent imaging noise is still needed. In addition, although the proposed model maintains good detection performance, real-time deployment on edge inspection devices may require further compression. Therefore, future work will focus on model pruning, quantization, and deployment optimization based on the current model structure. Few-shot or domain-adaptive learning will also be explored to improve the model’s adaptability to rare defect types and new production scenarios where labeled samples are limited.

Author Contributions

Conceptualization, methodology, validation, X.H.; formal analysis, X.H., H.W. and Y.S.; investigation, X.H.; resources, X.H. and Y.S.; data curation, H.J.; writing—original draft preparation, H.J.; writing—review and editing, X.H., H.W. and Y.S.; visualization, H.J.; supervision, project administration, X.H., H.W. and Y.S. All authors have read and agreed to the published version of the manuscript.

Funding

This research received no external funding.

Data Availability Statement

The data presented in this study are openly available in https://github.com/Ixiaohuihuihui/Tiny-Defect-Detection-for-PCB (accessed on 15 March 2026) and https://www.kaggle.com/datasets/akhatova/pcb-defects/data (accessed on 15 March 2026).

Conflicts of Interest

Authors Hao Wang and Yahui Shan were employed by Wuhan Second Ship Design and Research Institute. The remaining authors declare that the research was conducted in the absence of any commercial or financial relationships that could be construed as a potential conflict of interest.

References

1. Xiao, G.; Hou, S.; Zhou, H. PCB defect detection algorithm based on CDI-YOLO. Sci. Rep. 2024, 14, 7351.
2. Yu, S.; Pan, F.; Zhang, X.; Zhou, L.; Zhang, L.; Wang, J. A lightweight detection algorithm of PCB surface defects based on YOLO. PLoS ONE 2025, 20, e0320344.
3. Liu, Y.; Liu, Y.; Guo, X.; Ling, X.; Geng, Q. Metal surface defect detection using SLF-YOLO enhanced YOLOv8 model. Sci. Rep. 2025, 15, 11105.
4. Zhang, W.; Lin, B.; Wei, S.; Wu, J. Small target detection of surface defects on PCB boards: An improved YOLO method integrating attention mechanism and multi-scale feature focusing. AI EDAM 2026, 40, e2.
5. Sun, Z.; Ma, R.; Lei, Q. Small PCB Defect Detection Based on Convolutional Block Attention Mechanism and YOLOv8. Appl. Sci. 2026, 16, 1078.
6. Wei, Z.; Yang, F.; Zhong, K.; Yao, L. PCB-YOLO: Enhancing PCB surface defect detection with coordinate attention and multi-scale feature fusion. PLoS ONE 2025, 20, e0323684.
7. Li, Y.; Wang, Y.; Liu, J.; Wu, K.; Abdullahi, H.S.; Lv, P.; Zhang, H. Lightweight PCB defect detection method based on SCF-YOLO. PLoS ONE 2025, 20, e0318033.
8. Lang, D.; Lv, Z. SEPDNet: Simple and effective PCB surface defect detection method. Sci. Rep. 2025, 15, 10919.
9. Wang, Y.; Yin, T.; Chen, X.; Zhu, Y.; Wang, J.; Ma, Y.; Liu, L.; Wang, J. SCP-DETR: A efficient small-object-enhanced feature pyramid approach for PCB defect detection. PLoS ONE 2025, 20, e0330039.
10. Zhang, Z.; Lu, X.; Cao, G.; Yang, Y.; Jiao, L.; Liu, F. ViT-YOLO: Transformer-based YOLO for object detection. In Proceedings of the IEEE/CVF International Conference on Computer Vision, Montreal, QC, Canada, 11–17 October 2021; pp. 2799–2808.
11. Zhao, K.; Lu, R.; Wang, S.; Yang, X.; Li, Q.; Fan, J. ST-YOLOA: A Swin-transformer-based YOLO model with an attention mechanism for SAR ship detection under complex background. Front. Neurorobotics 2023, 17, 1170163.
12. An, K.; Zhang, Y. LPViT: A transformer based model for PCB image classification and defect detection. IEEE Access 2022, 10, 42542–42553.
13. Yin, X.; Zhao, Z.; Weng, L. MAS-YOLO: A lightweight detection algorithm for PCB defect detection based on improved YOLOv12. Appl. Sci. 2025, 15, 6238.
14. Hou, Y.; Zhang, X. A lightweight and high-accuracy framework for Printed Circuit Board defect detection. Eng. Appl. Artif. Intell. 2025, 148, 110375.
15. Pingzhen, L.; Sheng, X.; Jing, C.; Chengyue, S. Multi-Scale PCB Defect Detection with YOLOv8 Network Improved via Pruning and Lightweight Network. arXiv 2025, arXiv:2507.17176.
16. Cao, Y. A PCB Micro-Defect Detection Method Based on Mult-Scale Feature Enhancement and Background Suppression. Acad. J. Emerg. Technol. 2026, 2, 29–50.
17. Zhao, Z.; He, P. Yolo-mamba: Object detection method for infrared aerial images. Signal Image Video Process. 2024, 18, 8793–8803.
18. Liang, C.; Wang, Z.Z.; Liu, X.L.; Zhang, P.; Tian, Z.W.; Qian, R.L. SDD-Net: A Steel Surface Defect Detection Method Based on Contextual Enhancement and Multiscale Feature Fusion. IEEE Access 2024, 12, 185740–185756.
19. Zhang, D.; Hao, X.; Wang, D.; Qin, C.; Zhao, B.; Liang, L.; Liu, W. An efficient lightweight convolutional neural network for industrial surface defect detection. Artif. Intell. Rev. 2023, 56, 10651–10677.
20. Wang, H.; Xie, J.; Xu, X.; Zheng, Z. Few-shot PCB surface defect detection based on feature enhancement and multi-scale fusion. IEEE Access 2022, 10, 129911–129924.
21. Dang, Z.; Wang, X. FD-YOLO11: A feature-enhanced deep learning model for steel surface defect detection. IEEE Access 2025, 13, 63981–63993.
22. Zeng, N.; Wu, P.; Wang, Z.; Li, H.; Liu, W.; Liu, X. A small-sized object detection oriented multi-scale feature fusion approach with application to defect detection. IEEE Trans. Instrum. Meas. 2022, 71, 1–14.
23. Qian, X.; Wang, X.; Yang, S.; Lei, J. LFF-YOLO: A YOLO algorithm with lightweight feature fusion network for multi-scale defect detection. IEEE Access 2022, 10, 130339–130349.
24. Huang, H.; Tang, X.; Wen, F.; Jin, X. Small object detection method with shallow feature fusion network for chip surface defect detection. Sci. Rep. 2022, 12, 3914.
25. Li, Z.; Wei, X.; Hassaballah, M.; Li, Y.; Jiang, X. A deep learning model for steel surface defect detection. Complex Intell. Syst. 2024, 10, 885–897.
26. Zhang, M.; Hu, Y.; Xu, B.; Luo, L.; Wang, S. DSF-YOLO for weld defect detection in X-ray images with dynamic staged fusion. Sci. Rep. 2025, 15, 23305.
27. Wu, Y.; Cao, H.; Yang, G.; Lu, T.; Wan, S. Digital twin of intelligent small surface defect detection with cyber-manufacturing systems. ACM Trans. Internet Technol. 2023, 23, 1–20.
28. Li, D.; Lu, Y.; Gao, Q.; Li, X.; Yu, X.; Song, Y. LiteYOLO-ID: A lightweight object detection network for insulator defect detection. IEEE Trans. Instrum. Meas. 2024, 73, 1–12.
29. Mo, C.; Hu, Z.; Wang, J.; Xiao, X. SGT-YOLO: A lightweight method for PCB defect detection. IEEE Trans. Instrum. Meas. 2025, 74, 1–11.
30. Shen, J.; Liu, N.; Sun, H. Defect detection of printed circuit board based on lightweight deep convolution network. IET Image Process. 2020, 14, 3932–3940.
31. Wang, G.; Chen, J.; Li, C.; Lu, S. Edge-YOLO: Lightweight multi-scale feature extraction for industrial surface inspection. IEEE Access 2025, 13, 48188–48201.
32. Wan, D.; Lu, R.; Hu, B.; Yin, J.; Shen, S.; Lang, X. YOLO-MIF: Improved YOLOv8 with Multi-Information fusion for object detection in Gray-Scale images. Adv. Eng. Inform. 2024, 62, 102709.
33. Shen, K.; Zhou, X.; Liu, Z. MINet: Multiscale interactive network for real-time salient object detection of strip steel surface defects. IEEE Trans. Ind. Inform. 2024, 20, 7842–7852.
34. Xu, X.; Jiang, Y.; Chen, W.; Huang, Y.; Zhang, Y.; Sun, X. Damo-yolo: A report on real-time object detection design. arXiv 2022, arXiv:2211.15444.
35. Wang, J.; Dai, H.; Chen, T.; Liu, H.; Zhang, X.; Zhong, Q.; Lu, R. Toward surface defect detection in electronics manufacturing by an accurate and lightweight YOLO-style object detector. Sci. Rep. 2023, 13, 7062.
36. Yu, X.; Li, H.-X.; Yang, H. Collaborative learning classification model for PCBs defect detection against image and label uncertainty. IEEE Trans. Instrum. Meas. 2023, 72, 1–8.
37. Wang, C.Y.; Yeh, I.H.; Mark Liao, H.Y. Yolov9: Learning what you want to learn using programmable gradient information. In European Conference on Computer Vision; Springer Nature: Cham, Switzerland, 2024; pp. 1–21.
38. Jocher, G.; Stoken, A.; Chaurasia, A.; Borovec, J.; NanoCode012; TaoXie; Kwon, Y.; Michael, K.; Liu, C.; Fang, G.; et al. ultralytics/yolov5: v6.0 - YOLOv5n 'Nano' Models, Roboflow Integration, TensorFlow Export, OpenCV DNN Support, version 6.0; Zenodo: Geneva, Switzerland, 2021.
39. Liu, W.; Anguelov, D.; Erhan, D.; Szegedy, C.; Reed, S.; Fu, C.Y.; Berg, A.C. Ssd: Single shot multibox detector. In European Conference on Computer Vision; Springer International Publishing: Cham, Switzerland, 2016; pp. 21–37.
40. Tan, M.; Pang, R.; Le, Q.V. Efficientdet: Scalable and efficient object detection. In Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition, Seattle, WA, USA, 13–19 June 2020; pp. 10781–10790.
41. Lin, T.Y.; Dollár, P.; Girshick, R.; He, K.; Hariharan, B.; Belongie, S. Feature pyramid networks for object detection. In Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition, Honolulu, HI, USA, 21–26 July 2017; pp. 2117–2125.
42. Li, C.; Li, L.; Jiang, H.; Weng, K.; Geng, Y.; Li, L.; Ke, Z.; Li, Q.; Chen, M.; Nie, W.; et al. YOLOv6: A single-stage object detection framework for industrial applications. arXiv 2022, arXiv:2209.02976.
43. Wang, C.Y.; Bochkovskiy, A.; Liao, H.Y.M. YOLOv7: Trainable bag-of-freebies sets new state-of-the-art for real-time object detectors. In Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition, Vancouver, BC, Canada, 17–24 June 2023; pp. 7464–7475.
44. Khanam, R.; Hussain, M. Yolov11: An overview of the key architectural enhancements. arXiv 2024, arXiv:2410.17725.

Figure 1. Overall architecture of the improved YOLOv8n model based on C2f-Mamba feature extraction and FPN-PAN++ multi-scale fusion. Black arrows represent the sequential data flow throughout the entire network. The backbone and neck incorporate embedded C2f-Mamba modules for enhanced feature extraction. The neck implements a lightweight FPN-PAN++ structure for bidirectional multi-scale feature fusion. The decoupled detection heads generate final classification and bounding box regression outputs.

Figure 2. Schematic diagram of the C2f-Mamba local–global dual-branch parallel structure. Input features are split along the channel dimension into two complementary parallel processing paths. The convolution branch captures local fine-grained spatial details such as defect edges and textures. The Mamba block branch performs efficient long-range global dependency modeling. The outputs of both branches are concatenated to produce the final fused feature representation.

Figure 3. Complete structure and information processing flowchart of the C2f-Mamba module. The input feature map undergoes channel splitting into a direct access branch and a global modeling branch. The global modeling branch consists of a Mamba layer group composed of multiple selective state–space model units. Features from both branches are concatenated and processed through a final convolution layer to achieve local–global collaborative modeling.

Figure 4. Overall architecture of the proposed PAN++-based PCB defect detection network (adapted from the classic PAN structure). (a) The backbone extracts hierarchical feature maps from the input PCB image, including low-level high-resolution features and high-level semantic features. (b) The PAN++ neck introduces bidirectional feature enhancement paths, in which top-down semantic information and bottom-up fine-grained details are progressively aggregated. (c) Multi-scale features are aligned and fused to strengthen the representation of PCB defects of different sizes. (d) The decoupled detection heads perform defect classification and bounding box regression separately, improving localization accuracy for small and complex defects.

Figure 5. Logical flowchart of the full training and inference process for the improved YOLOv8n model. During training, input images undergo preprocessing and data augmentation before being fed into the network. The model extracts features, performs multi-scale fusion, and generates prediction results. Losses are calculated and backpropagated to update the model parameters. During inference, each input image requires a single forward pass, and non-maximum suppression is applied to remove redundant detection boxes.
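
The non-maximum suppression step named in the inference path can be illustrated with a minimal sketch. This is a generic greedy, class-agnostic NMS in plain Python, not the implementation used in the paper; the IoU threshold of 0.5 and the example boxes are assumptions chosen only for illustration.

```python
from typing import List, Tuple

Box = Tuple[float, float, float, float]  # (x1, y1, x2, y2), corner format


def iou(a: Box, b: Box) -> float:
    """Intersection over union of two axis-aligned boxes."""
    ix1, iy1 = max(a[0], b[0]), max(a[1], b[1])
    ix2, iy2 = min(a[2], b[2]), min(a[3], b[3])
    inter = max(0.0, ix2 - ix1) * max(0.0, iy2 - iy1)
    area_a = (a[2] - a[0]) * (a[3] - a[1])
    area_b = (b[2] - b[0]) * (b[3] - b[1])
    union = area_a + area_b - inter
    return inter / union if union > 0 else 0.0


def nms(boxes: List[Box], scores: List[float], iou_thresh: float = 0.5) -> List[int]:
    """Greedy NMS: visit boxes by descending score and keep a box only if
    it does not overlap an already kept box by more than iou_thresh.
    Returns the indices of the kept boxes (assumed threshold: 0.5)."""
    order = sorted(range(len(boxes)), key=lambda i: scores[i], reverse=True)
    keep: List[int] = []
    for i in order:
        if all(iou(boxes[i], boxes[k]) <= iou_thresh for k in keep):
            keep.append(i)
    return keep


# Hypothetical detections: two near-duplicate boxes and one separate box.
boxes = [(10, 10, 50, 50), (12, 12, 52, 52), (100, 100, 140, 140)]
scores = [0.9, 0.8, 0.7]
print(nms(boxes, scores))  # the lower-scoring duplicate (index 1) is dropped
```

In a multi-class detector such as this one, the same procedure would typically be run per defect category so that overlapping boxes of different classes do not suppress each other.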

Figure 6. Visualization of six typical defect samples from the PCB defect dataset. The samples include (a) mouse bite, (b) open circuit, (c) short circuit, (d) spur, (e) spurious copper, and (f) missing hole defects. These samples demonstrate the key characteristics of the dataset, including a high proportion of small targets, uneven class distribution, and complex circuit texture backgrounds.

Figure 7. Convergence curves of loss functions and core performance metrics during model training. The curves show the changing trends of bounding box regression loss, classification loss, and distribution focal loss on both training and validation sets over 100 training epochs. They also illustrate the improvement process of precision, recall, mAP@0.5, and mAP@0.5:0.95, verifying the training stability and convergence performance of the model.

Figure 8. Visualization of ground-truth annotations for representative samples from the PCB defect dataset. The annotations include bounding boxes marking the exact locations of defect regions and corresponding category labels. These ground-truth annotations serve as the standard benchmark reference for both quantitative and qualitative evaluation of the model’s detection performance.

Figure 9. Visual detection results of the improved algorithm on PCB defects. The results demonstrate the model’s detection performance in challenging scenarios such as dense circuit textures, tiny-sized defects, and complex background interference. Each detection box is labeled with its confidence score and corresponding defect category, intuitively verifying the model’s recognition capability and localization accuracy.

Table 1. Experimental setup.

Parameter	Value
epochs	100
image size	640 × 640
batch	16
device	0
workers	8
learning rate	0.001
optimizer	AdamW
momentum	0.937
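
For readers who want to reproduce the setup, the settings in Table 1 can be written as Ultralytics-style training arguments. The sketch below is an assumption about how the table maps onto that interface: the learning-rate row is taken to be the initial learning rate `lr0`, the image-size row is taken to be `imgsz`, and the file names `yolov8n.yaml` and `pcb.yaml` are hypothetical placeholders, since the paper does not name its configuration files.

```python
# Table 1 values expressed as Ultralytics-style training arguments.
# Assumption: the learning rate is the initial rate (lr0) and the
# image size is the square input side (imgsz).
train_args = {
    "epochs": 100,
    "imgsz": 640,
    "batch": 16,
    "device": 0,
    "workers": 8,
    "lr0": 0.001,
    "optimizer": "AdamW",
    "momentum": 0.937,
}


def to_cli(args: dict, model: str = "yolov8n.yaml", data: str = "pcb.yaml") -> str:
    """Render the arguments as a `yolo detect train` command line.
    The model and data file names are hypothetical placeholders."""
    parts = [f"{key}={value}" for key, value in args.items()]
    return " ".join(["yolo", "detect", "train", f"model={model}", f"data={data}", *parts])


print(to_cli(train_args))
```

The sketch only builds the command string and does not start training; the modified architecture itself would require the authors' model definition, which is not reproduced here.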

Table 2. Comparison with baseline models.

Model	Box(P)/%	Recall/%	mAP50/%	mAP50-95/%
YOLOv6	97.8	97	97.1	57
YOLOv7	96.5	95.8	96.9	54.2
YOLOv8n	96.4	94.6	97.2	57.5
YOLOv9	98.2	98.2	98.8	61.3
YOLOv11n	98.0	98.4	98.7	56.8
YOLOv8n+C2f-Mamba+FPN+PAN++	98.5	98.4	98.8	62.5

Table 3. Comparison of YOLOv8n variants with alternative attention and Transformer modules.

Model	Box(P)/%	Recall/%	mAP50/%	mAP50-95/%
YOLOv8n+CBAM	97.5	97	98.2	55.7
YOLOv8n+CA	96.8	97.1	98.3	55.8
YOLOv8n+ViT	96.4	94.5	97.7	52.8
YOLOv8n+DETR	97.3	96.6	98.1	55.2
YOLOv8n+SwinTransformer	94.6	94.8	96.9	51.9

Table 4. Ablation experiments.

Model	Box(P)/%	Recall/%	mAP50/%	mAP50-95/%
YOLOv8n+FPN+PAN++	97	96.7	98	56.1
YOLOv8n+C2f-Mamba	97	96.8	98.4	55.7
YOLOv8n+C2f-Mamba+FPN+PAN++	98.5	98.4	98.8	62.5
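
The contribution of each component is easier to read as a difference from the YOLOv8n baseline. The short sketch below computes those differences in percentage points, taking the baseline row from Table 2 and the remaining rows from Table 4; the variable and function names are illustrative only.

```python
METRICS = ("P", "R", "mAP50", "mAP50-95")

# Baseline row from Table 2; the other rows are from Table 4 (values in %).
baseline = (96.4, 94.6, 97.2, 57.5)
ablation = {
    "YOLOv8n+FPN+PAN++": (97.0, 96.7, 98.0, 56.1),
    "YOLOv8n+C2f-Mamba": (97.0, 96.8, 98.4, 55.7),
    "YOLOv8n+C2f-Mamba+FPN+PAN++": (98.5, 98.4, 98.8, 62.5),
}


def gains(row):
    """Difference from the baseline in percentage points, per metric."""
    return {m: round(v - b, 1) for m, v, b in zip(METRICS, row, baseline)}


for name, row in ablation.items():
    print(name, gains(row))
```

With these numbers, each module on its own improves precision, recall, and mAP50 but lowers mAP50-95 relative to the baseline (by 1.4 and 1.8 points), whereas the combined model raises mAP50-95 by 5.0 points.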

Table 5. Model complexity and inference speed comparison.

Model	Params (M)	GFLOPs	FPS
YOLOv8n	3.2	8.7	186
YOLOv8n+CBAM	3.42	9.1	172
YOLOv8n+CA	3.38	9	175
YOLOv8n+ViT	4.15	12.4	118
YOLOv8n+C2f-Mamba+FPN+PAN++	3.86	10.2	159
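
The cost of the added modules can be expressed relative to the baseline. The sketch below derives the percentage change in parameters, GFLOPs, and FPS from Table 5, plus a per-image latency computed as 1000/FPS; the latency figure assumes that FPS was measured one image at a time, which the table does not state.

```python
# Rows from Table 5: (parameters in millions, GFLOPs, FPS).
rows = {
    "YOLOv8n": (3.2, 8.7, 186),
    "YOLOv8n+CBAM": (3.42, 9.1, 172),
    "YOLOv8n+CA": (3.38, 9.0, 175),
    "YOLOv8n+ViT": (4.15, 12.4, 118),
    "YOLOv8n+C2f-Mamba+FPN+PAN++": (3.86, 10.2, 159),
}


def overhead(name: str, base: str = "YOLOv8n") -> dict:
    """Relative change versus the baseline, in percent, and the
    per-image latency in milliseconds (assumes single-image FPS)."""
    params, gflops, fps = rows[name]
    b_params, b_gflops, b_fps = rows[base]
    return {
        "params_pct": round(100 * (params / b_params - 1), 1),
        "gflops_pct": round(100 * (gflops / b_gflops - 1), 1),
        "fps_pct": round(100 * (fps / b_fps - 1), 1),
        "latency_ms": round(1000 / fps, 2),
    }


print(overhead("YOLOv8n+C2f-Mamba+FPN+PAN++"))
```

Under these assumptions the proposed model uses about 20.6% more parameters and 17.2% more GFLOPs than YOLOv8n and runs about 14.5% slower, which corresponds to roughly 6.29 ms per image versus 5.38 ms for the baseline.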

Disclaimer/Publisher’s Note: The statements, opinions and data contained in all publications are solely those of the individual author(s) and contributor(s) and not of MDPI and/or the editor(s). MDPI and/or the editor(s) disclaim responsibility for any injury to people or property resulting from any ideas, methods, instructions or products referred to in the content.

Share and Cite

MDPI and ACS Style

Hua, X.; Jiang, H.; Wang, H.; Shan, Y. An Improved YOLOv8n Framework for PCB Defect Detection via C2f-Mamba Feature Extraction and FPN-PAN++ Multi-Scale Fusion. Symmetry 2026, 18, 969. https://doi.org/10.3390/sym18060969

