Article

MEE-DETR: Multi-Scale Edge-Aware Enhanced Transformer for PCB Defect Detection

1 School of Computer Science and Engineering, Guilin University of Technology, Guilin 541006, China
2 Guangxi Key Laboratory of Embedded Technology and Intelligent System, Guilin University of Technology, Guilin 541006, China
3 School of Environmental Science and Engineering, Guilin University of Technology, Guilin 541006, China
* Author to whom correspondence should be addressed.
Electronics 2026, 15(3), 504; https://doi.org/10.3390/electronics15030504
Submission received: 10 December 2025 / Revised: 15 January 2026 / Accepted: 20 January 2026 / Published: 23 January 2026

Abstract

Defect inspection of Printed Circuit Boards (PCBs) is essential for maintaining the safety and reliability of electronic products. With the continuing trend toward smaller components and higher integration levels, identifying tiny imperfections on densely packed PCB structures has become increasingly difficult and remains a major challenge for current inspection systems. To tackle this problem, this study proposes the Multi-Scale Edge-Aware Enhanced Detection Transformer (MEE-DETR), a deep learning-based object detection method. Building upon the Transformer-based RT-DETR framework, the proposed approach systematically introduces enhancements at three levels: backbone feature extraction, feature interaction, and multi-scale feature fusion. First, the proposed Edge-Strengthened Backbone Network (ESBN) constructs multi-scale edge extraction and semantic fusion pathways, effectively strengthening the structural representation of shallow defect edges. Second, the Entanglement Transformer Block (ETB) synergistically integrates frequency self-attention, spatial self-attention, and a frequency–spatial entangled feed-forward network, enabling deep cross-domain information interaction and consistent feature representation. Finally, the proposed Adaptive Enhancement Feature Pyramid Network (AEFPN), incorporating the Adaptive Cross-scale Fusion Module (ACFM) for cross-scale adaptive weighting and the Enhanced Feature Extraction C3 Module (EFEC3) for local nonlinear enhancement, substantially improves detail preservation and semantic balance during feature fusion. Experiments on the PKU-Market-PCB dataset show that MEE-DETR delivers notable performance gains: Precision, Recall, and mAP50–95 improve by 2.5%, 9.4%, and 4.2%, respectively, while the model's parameter count is reduced by 40.7%. These results collectively indicate that MEE-DETR achieves excellent detection performance with a lightweight network architecture.

1. Introduction

The Printed Circuit Board (PCB) is found in virtually every type of electronic equipment, from small devices such as electronic watches and calculators to large equipment such as computers, communication systems, and medical devices. Any hardware that contains integrated circuits or electronic components relies on a PCB to provide the wiring connections. Consequently, the manufacturing quality of the PCB directly affects the performance and reliability of the electronic systems in which it is incorporated [1]. During mass production of PCBs, six defect types are common: open circuits (unintended breaks in conductive traces), missing holes (absent or improperly drilled vias or pads), short circuits (undesired electrical connections between conductors), spurious copper (residual copper fragments remaining after etching), mouse bites (irregular notches or gaps along trace edges caused by over-etching), and spurs (thin protruding copper branches extending from traces). These PCB defects are often subtle and difficult to detect, yet they can lead to operational failures or long-term reliability issues in electronic products [2]. Therefore, it is essential to develop effective and efficient automated PCB defect detection technologies to increase production quality and improve overall output. In parallel, recent studies have highlighted the importance of addressing signal-integrity challenges in high-density multilayer PCBs, for example, through optimization frameworks for high-speed and differential via design that aim to reduce impedance discontinuities, signal attenuation, and reflection effects [3,4]. Although these approaches focus on electrical performance rather than visual defect detection, they share the common objective of enhancing the overall reliability and robustness of modern electronic systems. Together, these efforts underscore the growing demand for advanced methodologies throughout the PCB design and manufacturing pipeline.
In the earliest days of PCB manufacturing, defects were detected by manual inspection. Workers examined solder joints, traces, and component placements on the PCB surface with the naked eye or a loupe. Manual inspection depends heavily on the experience and concentration of the operator, both of which are highly sensitive to external influences. The work is laborious and time-consuming, and prolonged inspection can cause visual fatigue and even permanent eye damage [5]. Although manual inspection is feasible for simple printed circuit boards, its shortcomings become more evident as circuit layouts increase in complexity and manufacturing volumes rise. Slow inspection speed, high labor cost, and a high likelihood of missed defects make manual inspection unsuitable for the efficiency and precision required in modern manufacturing. In recent years, Automated Optical Inspection (AOI) has been widely adopted for PCB defect detection and can generally be categorized into reference-based, non-reference, and hybrid approaches [6]. AOI systems capture either 2D or 3D images of PCB surfaces and use image processing algorithms to identify possible defects. Compared with manual inspection, AOI offers higher detection efficiency and requires less human involvement. However, AOI systems are sensitive to environmental changes and are limited to relatively simple defect types. Moreover, AOI equipment is expensive, which increases production costs, and its performance can degrade when inspecting more complex PCB designs.
Traditional image processing techniques and machine learning methods have also been used for PCB defect inspection. Classical image processing methods such as template matching and wavelet transforms typically rely on explicit visual features—such as edges, geometric shapes, and colors—followed by separate classification stages. Their heavy dependence on local features brings downsides such as poor noise tolerance, lack of stability, and slow processing speed. Machine learning techniques such as SVMs [7], Random Forests [8], and Decision Trees [9] have improved detection accuracy to some extent. Nevertheless, they depend on manually crafted feature representations and involve relatively complex model structures, resulting in low inference efficiency. These limitations make such methods difficult to deploy in industrial scenarios that demand high performance with real-time capability.
As deep learning technology advances rapidly, object detection models built upon convolutional neural networks are extensively utilized in PCB defect inspection owing to their robust capability for feature extraction. Representative approaches include two-stage detectors such as Faster R-CNN [10] and DetectoRS [11], as well as one-stage models like SSD [12] and the YOLO series [13]. These methods have achieved remarkable performance on large-scale datasets and have greatly advanced automated inspection technology. However, CNN-based detectors rely on non-maximum suppression (NMS) for post-processing [14], which introduces additional inference latency and prevents a fully end-to-end detection pipeline. Moreover, their inherently local receptive fields limit feature representation, making it difficult to effectively capture the multi-scale structures and weak-texture characteristics commonly found in PCB defects [15]. In comparison, detection models leveraging Transformers render NMS unnecessary. Among them, RT-DETR [16] has surpassed classical YOLO models in accuracy and achieved high-precision, end-to-end real-time detection. Despite these advantages, RT-DETR remains challenged by several constraints, such as inadequate capability to extract fine-grained details of small defects, reduced discrimination of weak-texture regions under complex backgrounds, and relatively high model complexity. To address these challenges, we propose MEE-DETR, a high-accuracy and lightweight PCB defect detection framework. To comprehensively validate the effectiveness of the proposed MEE-DETR under realistic industrial conditions, we conduct experiments on the PKU-Market-PCB dataset, also known as HRIPCB [17]. This dataset is collected from real PCB production lines using high-resolution industrial imaging systems and is specifically designed to reflect practical inspection scenarios.
Compared with commonly used synthetic or laboratory-constructed PCB datasets, HRIPCB contains defects that are extremely small in size, sparsely distributed, and often embedded in complex backgrounds with weak texture contrast. Moreover, different defect categories in HRIPCB exhibit subtle visual differences and significant class imbalance, which pose substantial challenges to fine-grained feature extraction, multi-scale representation, and robust discrimination. These characteristics make PKU-Market-PCB a highly representative benchmark for evaluating the capability of advanced detection models to handle small, low-contrast PCB defects in real-world manufacturing environments. This work makes the following key contributions:
(1)
We design the Edge-Strengthened Backbone Network (ESBN). By leveraging a multi-scale edge information extraction mechanism and an edge-semantic fusion module, ESBN enhances the model’s ability to capture fine-grained structural features and improves its representation of weak-texture regions, small objects, and complex contours. Moreover, it significantly reduces the number of parameters.
(2)
We introduce the Entanglement Transformer Block (ETB), a frequency–spatial feature interaction module. Through joint modeling in both the frequency and spatial domains, ETB strengthens energy distribution contrasts and local texture representation in defect regions, thereby improving the robustness and discriminative ability of the model when detecting small PCB defects under complex backgrounds.
(3)
We propose AEFPN, an Adaptive Enhancement Feature Pyramid Network. Using cross-scale weighted fusion and local texture enhancement strategies, AEFPN alleviates the attenuation of shallow-layer details and enables more stable small-object feature propagation, thereby enhancing the model’s ability to detect minute defects in complex PCB layouts.
The remainder of this paper is organized as follows. Section 2 reviews related work on PCB defect detection and relevant deep learning-based approaches. Section 3 presents the motivation and detailed design of the proposed MEE-DETR, including the key architectural improvements. Section 4 reports comprehensive experimental results and visualization analyses to evaluate the effectiveness of the proposed method. Finally, Section 5 concludes the paper and outlines potential directions for future research.

2. Related Work

2.1. CNN-Based Defect Detection Methods

CNN-based techniques have been widely applied to PCB defect detection in real manufacturing environments, and existing approaches can generally be categorized into two groups: two-stage detectors (e.g., Faster R-CNN) and one-stage detectors (e.g., the YOLO series) [18]. One-stage detectors perform classification and localization within a unified architecture and eliminate the region proposal stage, thus allowing direct prediction of both bounding boxes and their categories based on an input image. As a result, they achieve significantly higher inference speed than two-stage methods [19], which has made YOLO-style one-stage models the mainstream solutions for PCB defect detection. For instance, Zhou et al. introduced MSD-YOLO [20], which combines MobileNet-V3 with CSPDarknet-53 to compress the network and applies attention modules to strengthen feature learning. In addition, the model adopts a redesigned head in which the branches for categorization and spatial localization are optimized through independent learning pathways. Tang et al. introduced PCB-YOLO [21], a real-time PCB defect detection algorithm that leverages K-means++ to generate anchor boxes better suited for PCB datasets. Li et al. developed HSD-YOLO [22], a lightweight and high-precision detector based on an improved YOLOv8 architecture. Their approach employs HGNetv2 as the backbone network and substitutes original convolutions with GSConv within feature fusion layers to reduce computational cost. Additionally, the DyHead module is integrated to strengthen the representational capability of the detection head. Zhang et al. introduced LDD-Net [23], a computationally efficient PCB defect inspection algorithm that integrates a novel lightweight feature extraction module with a multi-scale aggregation network to extract defect features and promote information sharing across feature maps of different scales. 
Although these CNN-based methods achieve strong detection performance, they typically rely on non-maximum suppression (NMS) during the post-processing stage. Consequently, when dealing with dense overlapping targets or complex distributions of small defects, NMS may introduce redundant suppression errors and increase inference latency, thereby affecting the consistency and stability of detection outputs. In addition, the inherent limitations of convolutional structures in cross-feature information modeling make CNNs prone to missing tiny defects [24]. The emergence of Transformer-based detection models effectively addresses these issues.

2.2. Transformer-Based Defect Detection Methods

In 2017, Google introduced the Transformer model, an efficient approach for natural language processing [25]. By incorporating a self-attention mechanism and removing the sequential structure of traditional RNN-based models, Transformers support parallel training and capture global contextual information more effectively. Researchers subsequently extended Transformer architectures to object detection and proposed Detection Transformer (DETR) [26]. Experiments demonstrated that DETR could maintain competitive detection accuracy while substantially reducing the need for handcrafted components compared with traditional CNN-based methods.
Driven by this breakthrough, a growing number of studies have explored applying Transformers to object detection. Being the pioneering end-to-end detector utilizing Transformers for both encoding and decoding components, DETR effectively eliminated the dependency on NMS post-processing while representing a significant milestone within this domain [27]. The innovation inspired numerous follow-up improvements. For example, Deformable DETR, proposed by Zhu et al. [28], integrates deformable convolution with attention mechanisms, thus improving adaptability across scenarios while accelerating convergence. Yao et al. introduced Efficient DETR [29], which adopts an optimized attention mechanism to reduce model complexity while integrating a modified feature pyramid network for improved multi-scale feature fusion. Meng et al. introduced Conditional DETR [30], which incorporates conditional decoder attention to optimize query generation and improve localization accuracy. Despite these advances, DETR faces several challenges in real-world applications, including lengthy training processes, limited detection accuracy for small targets, and considerable computational cost. These challenges have motivated subsequent research to explore various optimization directions.
Zhao et al. proposed RT-DETR [31], a transformer-based model that achieves real-time performance without requiring non-maximum suppression (as needed in conventional YOLO detectors) and with significantly lower computational overhead than other DETR variants. RT-DETR redesigns the transformer backbone by processing features within each scale independently before merging them across scales, resulting in markedly faster inference. It selects queries based on uncertainty estimates obtained during denoising training, which substantially improves detection precision. Furthermore, the model supports changing the number of decoder layers at inference time without any additional training. Following RT-DETR, many improved variants have been proposed. Ji et al. introduced MS-DETR [32], which enhances feature extraction using multi-stage convolutional modules while reducing model complexity. Liu et al. introduced Bearing-DETR [33] upon the RT-DETR framework. It incorporates learnable upsampling stages, parameter-efficient meta-mobile blocks, and large-kernel deformable attention, thereby improving detection effectiveness while lowering computational overhead. Although these Transformer-based improvements have made notable progress, several challenges remain. First, shallow edge and texture features lack explicit modeling, causing fine structures to be excessively smoothed during deeper semantic extraction. Second, repeated downsampling produces overly dominant high-level semantics while compressing local details, leading to the disappearance of small objects in deeper layers. Third, Transformer feature interaction relies solely on the spatial domain and cannot capture discriminative energy and texture irregularities that exist in the frequency domain. Fourth, the association between local textures and global PCB layouts remains insufficient, limiting robustness in complex backgrounds. 
Finally, traditional FPN structures often weaken low-level information during multi-scale fusion, creating a “strong semantics but weak details” problem that ultimately compromises the representation of small defects. To address these issues, this study proposes MEE-DETR, a high-accuracy and lightweight framework built upon RT-DETR.

3. Methodology

3.1. The Proposed MEE-DETR Model

This study proposes a high-accuracy and lightweight framework, MEE-DETR. Figure 1 illustrates the overall structure of the proposed MEE-DETR framework, which comprises the Edge Strengthened Backbone Network (ESBN), Entanglement Transformer Block (ETB), Adaptive Enhancement Feature Pyramid Network (AEFPN), and a decoding-based detection head. In practical PCB inspection scenarios, defect detection faces several inherent challenges, including extremely small defect sizes, high visual similarity among different defect categories, and strong background interference caused by complex circuit layouts. These factors make defects difficult to distinguish from surrounding structures and easily lead to missed or false detections. To explicitly address these challenges, the proposed MEE-DETR adopts a targeted and modular design: ESBN focuses on enhancing fine-grained edge representations to improve the visibility of small defects; ETB leverages frequency–spatial collaborative modeling to strengthen discriminative feature learning for visually similar and low-contrast defects; and AEFPN is designed to suppress background interference and alleviate semantic imbalance during multi-scale feature fusion. These components operate synergistically to improve the detection of small defects, and their respective contributions are outlined below.
  • ESBN: To improve the ability to model defect edge characteristics, we design a new backbone network, ESBN. It includes a Multi-Scale Edge Extraction Module (MSEEM), which produces a set of multi-scale edge response maps from shallow features, and an Edge Semantic Fusion Module (ESFM), which injects the edge response information into the semantic feature at the same scale, achieving deep fusion between shallow edge features and deeper semantic representations. Furthermore, as illustrated in Figure 2A, considering that PCB defects are typically small and that high-level features tend to lose crucial details, ESBN removes the original P5 scale and retains only P3 and P4 for detection, thereby substantially improving the representation of small defects and their edge details.
  • ETB: To further enhance the model’s representational capacity during feature interaction, we introduce the Entanglement Transformer Block (ETB) [34] to construct more discriminative cross-domain feature representations. Since PCB defects typically exhibit slight grayscale fluctuations and local texture disturbances that are often inconspicuous in the spatial domain but become more prominent in the frequency domain, ETB integrates frequency self-attention (FSA) and spatial self-attention (SSA) to enable effective information exchange between global spectral cues and local spatial structures. Additionally, a frequency–spatial entangled feed-forward network (EFFN) is employed to deeply fuse these two domains. By leveraging this dual-domain collaborative modeling strategy, ETB substantially enhances the network’s capability to identify minute and low-contrast targets.
  • AEFPN: To address the attenuation of shallow-level information and semantic imbalance in the feature fusion stage, we propose the Adaptive Enhancement Feature Pyramid Network (AEFPN). AEFPN incorporates two key modules: an Adaptive Cross-scale Fusion Module (ACFM) and an Enhanced Feature Extraction C3 Module (EFEC3). ACFM employs a dynamic channel recalibration mechanism to adaptively assign weights across different feature levels, ensuring balanced contributions from high-level semantics and shallow structural details. EFEC3 embeds an Intensity Enhancement Fusion Unit (IEFU), which enhances edge and texture responses within local regions, effectively mitigating feature interference from complex circuit backgrounds. Through their collaborative effects, AEFPN achieves robust cross-scale feature interactions while enhancing the model’s overall ability to detect small PCB defects.
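The cross-scale weighting idea behind ACFM can be illustrated with a minimal NumPy sketch. The paper does not specify the exact recalibration operator, so the pooling-based scoring, the projection `w_proj`, and the softmax across levels below are illustrative assumptions, not the authors' implementation:

```python
import numpy as np

def softmax(x, axis=-1):
    e = np.exp(x - x.max(axis=axis, keepdims=True))
    return e / e.sum(axis=axis, keepdims=True)

def acfm_fuse(feats, w_proj):
    """Fuse feature maps from several scales (already resized to a common
    resolution) with per-channel weights derived from pooled descriptors."""
    # Global average pooling gives one descriptor per level: (L, C)
    desc = np.stack([f.mean(axis=(1, 2)) for f in feats])
    # Score each level per channel, then softmax across levels so the
    # contributions of the scales sum to 1 for every channel.
    scores = desc @ w_proj                      # (L, C)
    weights = softmax(scores, axis=0)           # (L, C), convex over levels
    # Weighted sum over levels, broadcasting channel weights spatially.
    return sum(w[:, None, None] * f for w, f in zip(weights, feats))

rng = np.random.default_rng(0)
C, H, W = 8, 16, 16
feats = [rng.standard_normal((C, H, W)) for _ in range(2)]
w_proj = rng.standard_normal((C, C)) * 0.1      # hypothetical scoring projection
out = acfm_fuse(feats, w_proj)
```

Because the per-channel weights are convex over levels, the fused map always lies between the per-level responses, which is the "balanced contribution" behavior the module targets.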

3.2. ESBN

RT-DETR adopts ResNet-18 as its backbone for feature extraction. Although this network is capable of capturing multi-scale semantic information, it still exhibits two evident limitations for PCB defect detection. First, the edge information in shallow features is not effectively propagated to deeper layers, thereby constraining the localization accuracy of defect bounding boxes. Second, repeated downsampling in the backbone network drastically reduces the spatial resolution of deep features, leading to severe loss of fine-grained details such as edges and small structures. Since PCB defects often manifest as subtle edge discontinuities or localized structural anomalies, this resolution degradation—combined with an overreliance on large receptive fields—makes it difficult to preserve critical spatial detail while maintaining strong semantic representation. Consequently, effectively retaining and leveraging edge information throughout the feature hierarchy remains a key challenge for accurate PCB defect detection. To address these issues, we remove the P5 layer—which not only fails to preserve meaningful edge information but also introduces redundant computation that degrades detection efficiency—and redesign the backbone using the proposed Multi-Scale Edge Extraction Module (MSEEM) and Edge Semantic Fusion Module (ESFM). By explicitly modeling edge information and embedding it into the backbone’s multi-scale feature hierarchy, ESBN markedly improves the model’s sensitivity to the boundary characteristics of PCB defects.

3.2.1. MSEEM

The architecture of MSEEM is illustrated in Figure 1B. It first uses the Sobel-based Edge Extraction Unit (SEEU) to extract the initial edge map E0 from shallow feature maps. As illustrated in Figure 1G, the SEEU incorporates Sobel convolution kernels within a per-channel 2D convolution framework to compute gradient responses in both the horizontal and vertical directions, and the resulting responses are then summed to produce the final edge response representation. This explicit edge extraction mechanism strengthens the network’s sensitivity to defect contours while maintaining channel independence. Subsequently, MSEEM generates multi-resolution edge features (E1, E2) through max-pooling operations with different pooling depths, followed by 1 × 1 convolutions at each scale to align the channel dimensions with those of the backbone features. Max-pooling is chosen instead of average pooling because it better preserves the strongest local responses, thereby highlighting edge structures while avoiding excessive smoothing. Built upon shallow features, MSEEM provides enhanced multi-scale edge representations, enabling the network to maintain robust boundary perception across defects of varying sizes.
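The SEEU computation described above—per-channel Sobel convolutions in the horizontal and vertical directions combined into an edge map E0, followed by max-pooling at different depths for E1 and E2—can be sketched in NumPy. Combining the two gradient responses by absolute-value summation and the 2×2 pooling stride are illustrative assumptions:

```python
import numpy as np

SOBEL_X = np.array([[-1., 0., 1.], [-2., 0., 2.], [-1., 0., 1.]])
SOBEL_Y = SOBEL_X.T

def conv2d_same(x, k):
    """Naive 'same' 3x3 convolution of one channel (zero padding)."""
    H, W = x.shape
    pad = np.pad(x, 1)
    out = np.zeros_like(x)
    for i in range(H):
        for j in range(W):
            out[i, j] = np.sum(pad[i:i + 3, j:j + 3] * k)
    return out

def seeu(feat):
    """Per-channel Sobel responses in x and y, combined into edge map E0."""
    return np.stack([np.abs(conv2d_same(c, SOBEL_X)) +
                     np.abs(conv2d_same(c, SOBEL_Y)) for c in feat])

def max_pool2(x):
    """2x2 max pooling per channel (assumes even H and W)."""
    C, H, W = x.shape
    return x.reshape(C, H // 2, 2, W // 2, 2).max(axis=(2, 4))

feat = np.zeros((1, 8, 8))
feat[0, :, 4:] = 1.0             # vertical step edge between columns 3 and 4
E0 = seeu(feat)                  # strongest response along the boundary
E1, E2 = max_pool2(E0), max_pool2(max_pool2(E0))
```

On the step-edge example, E0 peaks at the boundary columns and is zero in flat regions, while the pooled maps E1 and E2 retain the strongest local responses at coarser resolutions, matching the rationale for preferring max-pooling over average pooling.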

3.2.2. ESFM

The Edge Semantic Fusion Module (ESFM) is designed to achieve deep integration of edge information and semantic features, as shown in Figure 1C. Specifically, ESFM first concatenates the semantic feature F extracted from the backbone with the corresponding scale edge feature E_i along the channel dimension to obtain a joint feature representation:
Z = \mathrm{Concat}(F, E_i)
Subsequently, the ESFM applies a series of cascaded convolutional operations to compress and reconstruct the fused features, enabling efficient integration of multi-source information. It first adopts 1 × 1 convolution for compressing channel dimensionality, which lowers computational cost. This is followed by a 3 × 3 convolution to extract local contextual information from the fused representation, thereby strengthening the interaction between edge cues and semantic features. Finally, another 1 × 1 convolution restores the representation to the target channel dimension, producing the fused output feature map:
F' = \mathrm{Conv}_{1 \times 1}\big(\mathrm{Conv}_{3 \times 3}\big(\mathrm{Conv}_{1 \times 1}(Z)\big)\big)
Through this staged convolutional fusion design, the ESFM achieves effective coupling between edge and semantic information. It preserves a broad semantic receptive field while substantially enhancing the network’s sensitivity to edge regions, thereby providing stronger structural support for accurate defect localization.
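The staged convolutional fusion described above (concatenation followed by 1×1 compression, 3×3 context extraction, and 1×1 restoration) can be sketched as follows. The intermediate channel width and the ReLU activations between stages are illustrative assumptions not fixed by the text:

```python
import numpy as np

def conv1x1(x, w):
    """1x1 convolution = pure channel mixing. x: (Cin,H,W), w: (Cout,Cin)."""
    return np.einsum('oc,chw->ohw', w, x)

def conv3x3_same(x, w):
    """3x3 'same' convolution via im2col. x: (Cin,H,W), w: (Cout,Cin,3,3)."""
    Cin, H, W = x.shape
    pad = np.pad(x, ((0, 0), (1, 1), (1, 1)))
    cols = np.stack([[pad[:, i:i + H, j:j + W] for j in range(3)]
                     for i in range(3)])        # (3, 3, Cin, H, W)
    return np.einsum('ocij,ijchw->ohw', w, cols)

def esfm(F, E, w1, w3, w2):
    """Concat semantic F and edge E_i, then 1x1 -> 3x3 -> 1x1."""
    Z = np.concatenate([F, E], axis=0)          # Z = Concat(F, E_i)
    h = np.maximum(conv1x1(Z, w1), 0)           # compress channels (ReLU assumed)
    h = np.maximum(conv3x3_same(h, w3), 0)      # local context over fused features
    return conv1x1(h, w2)                       # restore target channel dim

rng = np.random.default_rng(1)
C, H, W = 4, 8, 8
F = rng.standard_normal((C, H, W))
E = rng.standard_normal((C, H, W))
w1 = rng.standard_normal((2, 2 * C)) * 0.1      # 2C -> 2 (compression)
w3 = rng.standard_normal((2, 2, 3, 3)) * 0.1
w2 = rng.standard_normal((C, 2)) * 0.1          # 2 -> C (restoration)
out = esfm(F, E, w1, w3, w2)
```

The 1×1 stages change only channel dimensionality, so the computational cost of the 3×3 stage is paid at the compressed width, which is the efficiency argument made in the text.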
The Edge Strengthened Backbone network effectively compensates for the limitations of RT-DETR in exploiting edge information. Specifically, ESBN extracts explicit edge information from shallow features and propagates them forward, thereby reducing the model’s reliance on high-level features with large receptive fields and preventing fine-grained edge details of small defects from being excessively diluted during downsampling. The multi-scale edge features generated by MSEEM enable the network to attend to richer boundary information, enhancing its robustness in detecting defect targets of varying sizes. Under the combined effects of channel compression and local context modeling, ESFM achieves deep fusion of edge and semantic representations, substantially improving the model’s ability to perceive defect boundaries, especially in cases of blurred contours. Overall, the proposed Edge Strengthened Backbone Network enhances localization accuracy for minute defects and markedly lowers the model’s parameter complexity.

3.3. ETB

In the original RT-DETR, the Attention-based Intra-scale Feature Interaction (AIFI) module is designed to perform feature interaction and dynamic modeling through an attention mechanism. However, AIFI models features solely in the spatial domain, making it insufficient to capture energy distributions and high-frequency texture cues present in the frequency domain. This single-domain interaction scheme becomes limiting when dealing with PCB images that exhibit complex texture structures. In PCB defect detection, the weak spectral responsiveness of AIFI often leads to the loss of fine-grained details and attenuation of cross-scale information. To overcome these limitations, we introduce the Entanglement Transformer Block (ETB), which enables joint frequency–spatial dual-domain feature fusion and dynamic entanglement learning. The core idea of ETB is to model feature dependencies simultaneously in both the frequency domain and the spatial domain. This is achieved through the coordinated operation of three components—Frequency Self-Attention (FSA), Spatial Self-Attention (SSA), and the Entanglement Feed-Forward Network (EFFN)—which jointly enable cross-domain feature fusion and reconstruction. The overall structure of ETB is illustrated in Figure 1F.
ETB first applies layer normalization to obtain the normalized representation \hat{X}, after which the frequency-domain and spatial-domain attention responses are computed in parallel. In the frequency-domain branch, ETB employs the Fast Fourier Transform (FFT) to project the features into the spectral space, where a complex-valued self-attention mechanism is used to learn the correlations and energy distribution among different frequency bands. A compact formulation of the frequency-domain attention is given as
Y_f = \mathcal{F}^{-1}\Big(\Theta\big(\mathrm{Softmax}(\mathfrak{R}(Q_f K_f^H)),\ \mathrm{Softmax}(\mathfrak{I}(Q_f K_f^H))\big)\, V_f\Big)
where \mathcal{F}(\cdot) and \mathcal{F}^{-1}(\cdot) denote the FFT and IFFT, respectively; Q_f, K_f, and V_f represent the query, key, and value matrices in the frequency domain; \mathfrak{R}(\cdot) and \mathfrak{I}(\cdot) extract the real and imaginary components of a complex matrix, enabling the separation of magnitude and phase information in the complex domain; and \Theta(\cdot,\cdot) denotes the fusion function that recombines the activated real and imaginary parts into complex-valued attention weights. This complex-domain attention mechanism performs separate normalization on the real and imaginary components, thereby explicitly modeling the global distribution of spectral energy across different frequency bands and capturing long-range dependencies across scales and channels in the frequency dimension. Once the inverse Fourier transform projects the representation back into the spatial domain, the resulting features not only preserve globally consistent frequency characteristics but also capture stronger structural and textural details, thus achieving adaptive enhancement of frequency-domain features.
To further clarify the practical implementation of the proposed frequency-domain self-attention, the following operations are performed in the spectral space. Given an input feature map X \in \mathbb{R}^{C \times H \times W}, we first transform it into the frequency domain using the 2D Fast Fourier Transform (FFT):
X_f = \mathcal{F}(X),
where X_f \in \mathbb{C}^{C \times H \times W} is a complex-valued representation containing both magnitude and phase information.
To model global correlations among different frequency components, FSA adopts a multi-head self-attention mechanism directly in the frequency domain. Specifically, the query, key, and value tensors are obtained as Q_f = K_f = V_f = X_f, which are then reshaped into N_h heads and normalized along the frequency dimension. The complex-valued attention matrix is computed as
A_f = \tau \cdot \frac{Q_f K_f^H}{\sqrt{d}}
where (\cdot)^H denotes the Hermitian (conjugate) transpose, d is the channel dimension per head, and \tau is a learnable temperature parameter. To ensure numerical stability and preserve complex-domain characteristics, we perform separate softmax normalization on the real and imaginary parts of the attention matrix:
A_f^{\mathrm{norm}} = \mathcal{C}\big(\mathrm{Softmax}(\mathfrak{R}(A_f)),\ \mathrm{Softmax}(\mathfrak{I}(A_f))\big)
where \mathfrak{R}(\cdot) and \mathfrak{I}(\cdot) denote the real and imaginary components, respectively, and \mathcal{C}(\cdot,\cdot) recombines them into a complex-valued attention weight. The attended frequency features are then obtained by
Y_f = \mathcal{F}^{-1}\big(A_f^{\mathrm{norm}} V_f\big)
and the magnitude of the inverse FFT result is taken to produce real-valued features. In addition to the global frequency self-attention, we introduce a local frequency enhancement branch, where the real part of the frequency coefficients is used to generate adaptive channel-wise weights that modulate the spectral response. The outputs of the global and local frequency branches are concatenated and projected through a 1 × 1 convolution to produce the final frequency-attended feature map. Through this design, FSA explicitly captures long-range dependencies and energy redistribution across frequency bands, while retaining local structural cues, thereby providing a complementary representation to spatial-domain attention.
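For illustration, the global frequency self-attention described above can be sketched in NumPy. This is a simplified, single-head sketch of our own, not the full implementation: head splitting, the learnable temperature, the local frequency enhancement branch, and the final 1 × 1 projection are omitted.

```python
import numpy as np

def softmax(x, axis=-1):
    e = np.exp(x - x.max(axis=axis, keepdims=True))
    return e / e.sum(axis=axis, keepdims=True)

def frequency_self_attention(x, tau=1.0):
    """Single-head sketch of FSA on a feature map x of shape (C, H, W)."""
    C, H, W = x.shape
    Xf = np.fft.fft2(x)                           # 2D FFT -> complex spectrum
    Q = K = V = Xf.reshape(C, H * W)              # Q_f = K_f = V_f = X_f
    d = Q.shape[-1]
    A = Q @ K.conj().T / (np.sqrt(d) * tau)       # attention via Hermitian transpose
    A = softmax(A.real) + 1j * softmax(A.imag)    # separate real/imag normalization
    Yf = (A @ V).reshape(C, H, W)
    return np.abs(np.fft.ifft2(Yf))               # magnitude of IFFT -> real features
```

The separate softmax on real and imaginary parts keeps the attention weights complex-valued while still normalizing the energy distribution within each component.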
From a signal-processing perspective, PCB defect patterns such as spur, spurious copper, and mouse bites often manifest as localized structural irregularities superimposed on relatively regular circuit layouts. In the frequency domain, these irregularities correspond to distinctive high-frequency components and abnormal energy distributions across specific spectral bands, while the repetitive background structures of PCB traces are mainly characterized by more regular and low-frequency responses. By introducing frequency-domain self-attention, the proposed ETB explicitly models the global distribution of spectral energy and the correlations among different frequency bands. This enables the network to selectively emphasize frequency components that are more sensitive to defect-induced structural disruptions, while suppressing redundant or repetitive background patterns. In contrast to purely spatial attention, which operates on local neighborhoods, frequency-domain attention provides a global receptive field in the spectral space, allowing long-range dependencies and cross-scale interactions to be captured more effectively. Consequently, the integration of frequency-domain attention is particularly beneficial for enhancing small, low-contrast PCB defects whose visual signatures may be weak in the spatial domain but become more distinguishable when analyzed through their spectral characteristics.
The spatial branch, SSA, focuses on local context modeling. Given the input $\hat{X}$, it first applies a 1 × 1 convolution to encode positional information, followed by 3 × 3 and 5 × 5 depthwise separable convolutions to generate multi-scale query, key, and value representations. The spatial-domain attention is then computed as
$$Y_s = \mathrm{Conv}_{1\times1}\Big(\mathrm{Cat}\big(\mathrm{Softmax}\big(Q_s K_s^{T}\big)V_s,\ \mathrm{DWConv}_{3\times3}(X)\big)\Big),$$
where $\mathrm{Cat}(\cdot)$ represents channel concatenation. This mechanism leverages convolution kernels of different receptive fields to capture multi-scale contextual information in the spatial domain, thereby enhancing boundary and texture details. Consequently, it compensates for the limited local representational capacity of frequency-domain modeling.
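The spatial branch can likewise be sketched in NumPy. As simplifying assumptions of this sketch, the learned 3 × 3 and 5 × 5 depthwise separable convolutions that produce $Q_s$, $K_s$, and $V_s$ are replaced by the identity, the DWConv branch is a plain box filter, and the final 1 × 1 projection is omitted.

```python
import numpy as np

def softmax(x, axis=-1):
    e = np.exp(x - x.max(axis=axis, keepdims=True))
    return e / e.sum(axis=axis, keepdims=True)

def dwconv3x3(x):
    """Depthwise 3x3 box filter standing in for a learned DWConv."""
    C, H, W = x.shape
    p = np.pad(x, ((0, 0), (1, 1), (1, 1)))
    out = np.zeros_like(x)
    for dy in (0, 1, 2):
        for dx in (0, 1, 2):
            out += p[:, dy:dy + H, dx:dx + W]
    return out / 9.0

def spatial_self_attention(x):
    """Channel-wise attention branch concatenated with a local DWConv branch."""
    C, H, W = x.shape
    q = k = v = x.reshape(C, H * W)
    A = softmax(q @ k.T / np.sqrt(H * W))                # Softmax(Q_s K_s^T)
    attn = (A @ v).reshape(C, H, W)
    return np.concatenate([attn, dwconv3x3(x)], axis=0)  # Cat(...); 1x1 conv omitted
```

Because the concatenation doubles the channel count, the omitted 1 × 1 convolution in the actual module restores the original channel dimension.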
To enable deep interaction between frequency-domain and spatial-domain representations, ETB incorporates the Entanglement Feed-Forward Network (EFFN). EFFN first fuses the frequency-domain output $Y_f$ with the spatial-domain output $Y_s$ to form the initial coupled representation $X_c$. It then performs nonlinear gating and convolutional operations separately in the frequency and spatial domains to facilitate complementary learning across the two domains.
$$X_c = \mathrm{LN}(Y_f + Y_s),$$
$$\hat{X}_f = \mathrm{GELU}\Big(\Phi\big(\sigma(\mathcal{F}(X_c)) \odot \mathcal{F}(X_c)\big)\Big) \odot \Phi\big(\sigma(\mathcal{F}(X_c)) \odot \mathcal{F}(X_c)\big),$$
$$\hat{X}_s = \mathrm{GELU}\big(\mathrm{DWConv}_{3\times3}(X_c)\big) \odot \mathrm{DWConv}_{3\times3}(X_c),$$
where $\mathrm{LN}$ denotes layer normalization, $\sigma(\cdot)$ is the Sigmoid gating function, and $\Phi(\cdot) = |\cdot|$ represents the magnitude operator for complex-valued inputs. Finally, the output is obtained through an additional FFT–IFFT interaction followed by channel compression:
$$Y_{\mathrm{ETB}} = \mathrm{Conv}_{1\times1}\Big(\mathrm{Cat}\big(\Phi\big(\mathcal{F}^{-1}(\mathcal{F}(\hat{X}_f) \odot \hat{X}_s)\big),\ X\big)\Big) + X,$$
EFFN achieves entangled learning by reweighting features in the frequency spectrum and integrating them with spatial-domain convolutions, thus allowing the network to maintain sensitivity to fine-grained structural details and preserve global semantic consistency.
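A toy NumPy sketch of the EFFN data flow follows. Several details are simplified as assumptions of the sketch: layer normalization is a global mean/variance normalization, the sigmoid gate is applied to the real part of the spectrum, the depthwise convolutions of the spatial branch are dropped, and the final concatenation with $X$ plus 1 × 1 compression is omitted.

```python
import numpy as np

def gelu(x):
    """Tanh-approximated GELU activation."""
    return 0.5 * x * (1.0 + np.tanh(np.sqrt(2.0 / np.pi) * (x + 0.044715 * x**3)))

def sigmoid(x):
    return 1.0 / (1.0 + np.exp(-x))

def effn(yf, ys):
    """Toy EFFN on (C, H, W) branch outputs: GLU-style gating in both domains."""
    xc = yf + ys
    xc = (xc - xc.mean()) / (xc.std() + 1e-6)          # simplified LayerNorm
    Fc = np.fft.fft2(xc)
    g = np.abs(sigmoid(Fc.real) * Fc)                  # magnitude of gated spectrum
    xf_hat = gelu(g) * g                               # frequency-branch gating
    xs_hat = gelu(xc) * xc                             # spatial branch (DWConv omitted)
    # FFT-IFFT interaction coupling the two enhanced branches
    return np.abs(np.fft.ifft2(np.fft.fft2(xf_hat) * xs_hat))
```

The multiplicative coupling means a location is emphasized only when both the frequency-gated and spatially-gated responses agree, which is the entanglement idea in miniature.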
In summary, the ETB module offers three key advantages over the original AIFI: (1) Dual-domain collaborative modeling: ETB introduces, for the first time within the RT-DETR architecture, a joint modeling mechanism that simultaneously captures feature dependencies in both the frequency and spatial domains. (2) Complex-domain attention mechanism: through complex-valued normalization and Fourier-based attention, ETB endows the model with explicit frequency selectivity, substantially improving its sensitivity to high-frequency defect patterns. (3) Efficient entangled fusion: by employing lightweight gated convolutions, ETB achieves dynamic coupling between the frequency and spatial domains, effectively balancing inference efficiency and detection accuracy. Experimental results demonstrate that incorporating ETB significantly enhances the robustness and discriminative capability of the improved model in detecting small-scale defects on complex PCB surfaces.

3.4. AEFPN

RT-DETR leverages the Transformer to perform global feature modeling and object query matching, resulting in strong representation power and excellent spatial modeling capability. However, its neck still adopts the conventional FPN architecture for multi-scale feature integration, which tends to be less effective when dealing with small and low-contrast defects. First, the top-down hierarchical fusion scheme in FPN causes shallow features to undergo progressive semantic dilution as they propagate toward deeper layers, making it difficult to preserve fine-grained details. This issue is particularly pronounced in small-object detection, where the loss of shallow spatial information leads to suboptimal localization of subtle PCB defects. Second, FPN relies on fixed channel-wise weighting or simple concatenation during feature fusion, lacking an adaptive mechanism for assessing the relative importance of features at different stages. When features from heterogeneous resolutions are combined with equal weights, the semantic energy distribution becomes imbalanced, causing gradient updates to concentrate primarily on high-level features while low-level cues are suppressed.

3.4.1. ACFM

To address these issues, we propose the Adaptive Cross-scale Fusion Module (ACFM) and the Enhanced Feature Extraction C3 Module (EFEC3), and redesign the RT-DETR neck accordingly, forming the Adaptive Enhancement Feature Pyramid Network (AEFPN). The ACFM introduces a multi-input adaptive weighting strategy that dynamically balances contributions from different stages, thereby enabling semantic re-calibration across scales. The EFEC3 employs a dual-branch nonlinear enhancement mechanism within each fusion node to strengthen detail representation and preserve texture information. Working in concert, these two modules enable AEFPN to substantially improve cross-scale feature interaction and intra-level representation balance, while maintaining the global modeling advantages of RT-DETR. This enhanced neck design provides a more robust structural foundation for PCB defect detection tasks.
The ACFM is designed to perform adaptive fusion and weighted re-calibration across multi-level features. Its core idea is to introduce a cross-level channel attention mechanism among the input multi-scale features, thereby achieving dynamic semantic balancing across scales. As illustrated in Figure 1D, the module receives a set of input features $\{F_1, F_2, \dots, F_h\}$, first normalizes their channel dimensions using a 1 × 1 convolution, and then aggregates them into a unified representation through feature concatenation. The concatenated representation is then processed sequentially by average pooling, an MLP, and a Softmax layer to generate a dynamic weight matrix $A$ for each feature level. Finally, the input features are adaptively modulated by $A$ to produce the fused output $F_{\mathrm{out}}$, as shown in Equations (9) and (10), where $\odot$ denotes channel-wise weighting. The ACFM preserves the independence of features from different hierarchy levels during fusion, while its learned weighting mechanism adaptively modulates their contributions, effectively preventing high-level representations from disproportionately dominating the fused output.
$$A = \mathrm{Softmax}\Big(\mathrm{MLP}\Big(\mathrm{AvgPool}\big(\mathrm{Cat}_{i=1}^{h}(F_i)\big)\Big)\Big)$$
$$F_{\mathrm{out}} = \sum_{i=1}^{h} A_i \odot F_i$$
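The two equations above can be sketched in a few lines of NumPy. As assumptions of this sketch, the per-level 1 × 1 normalization convolutions are skipped (the inputs are taken as already channel-aligned) and the MLP is reduced to a single toy weight matrix.

```python
import numpy as np

def softmax(x):
    e = np.exp(x - x.max())
    return e / e.sum()

def acfm(features, mlp_w):
    """Sketch of ACFM on h channel-aligned feature maps of shape (C, H, W);
    mlp_w is a toy (h*C, h) matrix standing in for the MLP."""
    cat = np.concatenate(features, axis=0)           # Cat over levels -> (h*C, H, W)
    pooled = cat.mean(axis=(1, 2))                   # global average pooling
    A = softmax(pooled @ mlp_w)                      # one adaptive weight per level
    fused = sum(a * f for a, f in zip(A, features))  # F_out = sum_i A_i * F_i
    return fused, A
```

Because the weights pass through a Softmax, they sum to one, so the fusion is a convex combination of levels and no single resolution can dominate unboundedly.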

3.4.2. EFEC3

In PCB defect detection, target instances are often extremely small, and the conventional C3 module produces insufficient activation on such fine structures, thereby weakening the model's ability to discriminate defect boundaries. The Enhanced Feature Extraction C3 (EFEC3) module addresses this issue by introducing a dual-branch intensity enhancement mechanism that establishes a dynamic balance between "salient-region response amplification" and "background noise suppression" in the feature space. This design enables the network to preserve global semantic consistency while substantially strengthening its sensitivity to local details. As illustrated in Figure 1E, the EFEC3 module adopts a two-branch structure. The main branch stacks three Intensity Enhancement Fusion Units (IEFU) to progressively amplify the input features, whereas the bypass branch performs a linear projection to ensure stable propagation of low-frequency semantics. Let the input feature be $X \in \mathbb{R}^{B \times C_1 \times H \times W}$, with $B$ representing the batch size, $C_1$ indicating the channel dimension, and $H$, $W$ denoting the spatial height and width. The EFEC3 operation is expressed as follows:
$$P_m = \mathrm{Conv}_{1\times1}^{(1)}(X), \qquad P_s = \mathrm{Conv}_{1\times1}^{(2)}(X)$$
$$Y_m = \mathrm{IEFU}^{(3)}\Big(\mathrm{IEFU}^{(2)}\big(\mathrm{IEFU}^{(1)}(P_m)\big)\Big)$$
$$Y = \mathrm{Conv}_{1\times1}^{(3)}(Y_m + P_s)$$
where $\mathrm{Conv}_{1\times1}^{(i)}(\cdot)$ denotes the 1 × 1 convolution used for channel alignment and linear projection; $Y_m$ represents the features enhanced through multiple stacked IEFU stages, and $P_s$ denotes the bypass residual information. After fusing $Y_m$ with $P_s$, the fused feature representation is refined by a convolutional operation that restores the channels to the target dimensionality, yielding the final output $Y$. This design follows the Cross-Stage Partial paradigm, where residual compensation mitigates the representational drift potentially introduced by deep enhancement operations, thereby stabilizing the refinement process. Moreover, stacking multiple IEFUs establishes a progressive intra-stage feature-focusing mechanism, effectively amplifying the response intensity of subtle defect regions.
The Intensity Enhancement Fusion Unit (IEFU) serves as the core submodule of EFEC3, responsible for performing nonlinear feature-intensity amplification and salient-region enhancement within the local spatial domain. As illustrated in Figure 1H, its computational process is formulated as follows:
$$Z = \mathrm{Conv}_{1\times1}(X) \in \mathbb{R}^{B \times 2C' \times H \times W}$$
$$(U_1, U_2) = \mathrm{split}\big(\mathrm{DWConv}_{3\times3}(Z)\big), \qquad U_j \in \mathbb{R}^{B \times C' \times H \times W}$$
$$\tilde{U}_1 = U_1 + \tanh\big(\mathrm{DWConv}_{3\times3}^{(1)}(U_1)\big)$$
$$\tilde{U}_2 = U_2 + \tanh\big(\mathrm{DWConv}_{3\times3}^{(2)}(U_2)\big)$$
$$F = \tilde{U}_1 \odot \tilde{U}_2$$
$$Y_{\mathrm{IEFU}} = \mathrm{Conv}_{1\times1}(F) \in \mathbb{R}^{B \times C_1 \times H \times W}$$
where $\mathrm{Conv}_{1\times1}$ denotes a 1 × 1 convolution used for channel expansion or compression, $\mathrm{DWConv}_{3\times3}$ represents a depthwise separable convolution for extracting local spatial information, $\tanh(\cdot)$ is the hyperbolic tangent activation function that introduces nonlinear expressiveness, $\odot$ denotes element-wise multiplication, $C' = \alpha C_1$ is the expanded channel dimension, and $\alpha$ is the expansion ratio. In terms of computational logic, the IEFU first increases the feature capacity through channel expansion, after which depthwise convolutions generate two complementary feature branches $(U_1, U_2)$. Each branch performs an independent convolution-and-activation enhancement and incorporates a residual connection to preserve the stability of the original semantic content. Finally, combining the outputs of the dual branches through a pointwise product yields a gating mechanism that amplifies responses only when both branches are simultaneously activated, thus achieving adaptive enhancement of salient regions.
In implementation, the channel expansion ratio α is fixed to 2.66 throughout all experiments, following common practice in lightweight feed-forward enhancement modules. Given an input feature map X R B × C 1 × H × W , the expanded hidden dimension is computed as C = α C 1 . A 1 × 1 pointwise convolution first projects the input features into a 2C′-channel space, which is then processed by a depthwise 3 × 3 convolution and evenly split into two branches (U1, U2). Each branch applies an independent depthwise convolution followed by a hyperbolic tangent activation to perform nonlinear intensity modulation, and a residual connection is introduced by directly adding the activated response back to the original branch feature. This design enhances local feature responses while preserving the original semantic distribution and stabilizing gradient propagation. The two enhanced branches are subsequently fused through element-wise multiplication, forming an implicit gating mechanism that selectively amplifies salient responses only when both branches are jointly activated. Finally, a 1 × 1 convolution projects the fused features back to the original channel dimension. All convolutions in IEFU are implemented without bias terms, and the same hyperparameter configuration is shared across different network stages to ensure consistent behavior and computational efficiency.
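The IEFU and EFEC3 computation can be sketched in NumPy as follows. This sketch makes several simplifying assumptions: the 1 × 1 convolutions are random channel-mixing matrices rather than learned weights, the depthwise 3 × 3 convolutions are dropped, and the bypass projection $P_s$ is treated as identity.

```python
import numpy as np

def iefu(x, alpha=2.66, seed=0):
    """Toy IEFU on (C1, H, W) features: expand, split, enhance, gate, compress."""
    rng = np.random.default_rng(seed)
    C1, H, W = x.shape
    C = int(alpha * C1)                              # expanded channels C' = alpha*C1
    W_up = rng.standard_normal((2 * C, C1)) * 0.1    # 1x1 conv: channel expansion
    W_down = rng.standard_normal((C1, C)) * 0.1      # 1x1 conv: channel compression
    z = np.einsum('oc,chw->ohw', W_up, x)
    u1, u2 = z[:C], z[C:]                            # split into two branches
    u1 = u1 + np.tanh(u1)                            # residual tanh enhancement
    u2 = u2 + np.tanh(u2)                            # (DWConvs omitted in sketch)
    return np.einsum('oc,chw->ohw', W_down, u1 * u2) # gate, project back to C1

def efec3(x):
    """EFEC3 skeleton: three stacked IEFUs plus an identity bypass branch."""
    y_m = iefu(iefu(iefu(x, seed=1), seed=2), seed=3)
    return y_m + x                                   # bypass P_s ~ identity; final 1x1 omitted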
Based on the proposed ACFM and EFEC3 modules, we redesign the feature fusion network of RT-DETR and propose a novel Adaptive Enhancement Feature Pyramid Network (AEFPN). The reconstructed architecture incorporates multi-level dynamic weighting via ACFM together with localized feature enhancement provided by EFEC3, enabling bidirectional compensation of cross-scale features in terms of both semantic abstraction and fine-grained detail. In particular, the ACFM performs adaptive weight allocation across features of different resolutions, ensuring a balanced contribution between shallow-layer texture cues and high-level semantic information. Meanwhile, the EFEC3 module employs a dual-path structure that integrates “main-branch enhancement” with “bypass compensation,” achieving nonlinear feature strengthening without incurring additional parameter overhead. The stacked IEFUs establish a progressive feature amplification mechanism, endowing deeper layers with improved sensitivity to subtle structural variations and enhanced noise suppression. In practical defect detection scenarios, this design effectively mitigates the common issue in traditional fusion networks where excessive dominance of high-level semantics leads to the attenuation of low-level texture information. Through this progressive integration strategy—characterized by dynamic fusion, localized enhancement, and global coordination—the restructured feature fusion network alleviates shallow-layer semantic weakening and channel imbalance inherent in the original design, while maintaining computational efficiency and substantially improving PCB defect detection performance.

4. Experiments and Result Analysis

4.1. Experimental Environment

We perform all experiments on Ubuntu 22.04 with an AMD EPYC 7K62 48-core CPU and an NVIDIA GeForce RTX 3090 GPU (24 GB memory). The software stack includes Python 3.10.14, PyTorch 2.2.2, Torchvision 0.17.2, and CUDA 12.1. For model training, we resize input images to 640 × 640 pixels and use a batch size of 4. The training process runs for 250 epochs with an initial learning rate of 0.0001, which helps achieve stable convergence during early iterations. We adopt the default hyperparameter settings from the official RT-DETR implementation for all other configurations. Table 1 summarizes the key training parameters. To maintain experimental consistency, we apply identical hardware specifications, software versions, and training protocols across all ablation studies and comparative evaluations. During training, standard data augmentation strategies provided by the official RT-DETR/Ultralytics implementation are adopted to improve model generalization and robustness. Specifically, random horizontal flipping and random scaling are applied to the input images, while geometric transformations such as rotation and perspective distortion are disabled to preserve the structural integrity of PCB patterns. Color-based augmentations, including random adjustments of brightness, contrast, and saturation, are applied with moderate intensity. Mosaic and MixUp augmentations are not enabled, considering that PCB defect images exhibit strict spatial correspondence and local structural characteristics. All data augmentation settings follow the default configuration of the RT-DETR training pipeline and are consistently applied across all comparative experiments and ablation studies to ensure fair evaluation and reproducibility.

4.2. Dataset and Evaluation Metrics

We adopt the PKU-Market-PCB dataset, which was constructed by Peking University and comprises 1386 annotated images covering six types of PCB surface defects. Figure 2B illustrates representative samples from each category. As can be visually observed, PCB defects typically appear at small spatial scales. The detailed statistics of the PKU-Market-PCB dataset are summarized in Table 2. For all experiments, the dataset is partitioned into training, validation, and test sets with a ratio of 8:1:1 to support model training and performance evaluation.
We assess the proposed MEE-DETR on the PCB defect detection task using standard object-detection metrics: Precision, Recall, AP, mAP, FLOPs, and model size. These indicators provide a comprehensive view of both detection accuracy and model efficiency.
Precision is defined as the proportion of predicted positive instances that are actually positive, as formulated in Equation (24). Here, TP (True Positive) denotes the number of defect samples correctly identified by the model, whereas FP (False Positive) represents the number of non-defect samples that are incorrectly classified as defects.
$$\mathrm{Precision} = \frac{TP}{TP + FP}$$
Recall measures the proportion of true defects successfully identified by the model relative to all ground-truth defects, as expressed in Equation (25). The term FN (False Negative) represents defect instances that are missed during detection.
$$\mathrm{Recall} = \frac{TP}{TP + FN}$$
Average Precision (AP) is a key metric for evaluating the detection performance of a single category in object detection tasks. It is defined as the area under the Precision–Recall (PR) curve, where the PR curve is constructed from multiple pairs of precision and recall values computed at different confidence thresholds. A higher AP indicates stronger recognition capability for the corresponding class. The metric is formally computed as follows:
$$AP = \int_{0}^{1} P(R)\,\mathrm{d}R$$
Mean Average Precision (mAP) is an essential metric for assessing the overall detection accuracy in multi-class object detection tasks. It is typically computed by averaging the AP values of all categories under different Intersection over Union (IoU) thresholds, as expressed in Equation (27). Here, N denotes the total number of classes in the dataset.
$$mAP = \frac{1}{N} \sum_{i=1}^{N} AP_i$$
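For reference, the four metrics above reduce to a few lines of Python. In this sketch, trapezoidal integration stands in for the interpolated AP computation used by standard COCO/VOC evaluation toolkits.

```python
def precision_recall(tp, fp, fn):
    """Precision and Recall from detection counts."""
    return tp / (tp + fp), tp / (tp + fn)

def average_precision(recalls, precisions):
    """Trapezoidal area under the PR curve; points sorted by increasing recall."""
    return sum((recalls[i] - recalls[i - 1]) * (precisions[i] + precisions[i - 1]) / 2
               for i in range(1, len(recalls)))

def mean_ap(ap_per_class):
    """mAP as the mean of per-class AP values."""
    return sum(ap_per_class) / len(ap_per_class)
```

For example, 9 true positives with 1 false positive and 3 false negatives give a Precision of 0.9 and a Recall of 0.75.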

4.3. Ablation Studies

To evaluate the contribution of each proposed component to the overall detection performance, a series of ablation experiments was conducted. The results obtained by progressively integrating the improved modules are summarized in Table 3. Model A denotes the RT-DETR baseline; Model B replaces the baseline backbone with ESBN; Model C replaces AIFI with ETB; Model D replaces the original neck with AEFPN; Model E integrates ESBN and ETB; Model F integrates ESBN and AEFPN; Model G integrates ETB and AEFPN; and Model H represents the final model that incorporates all proposed improvements into the baseline.
The experimental results show that each improved module yields a clear performance gain when integrated individually into the baseline model, and that incorporating them in different combinations also leads to additional improvements. Replacing the original backbone of the baseline model with ESBN results in increases in Precision, Recall, and mAP, while significantly reducing the number of parameters. This demonstrates that the enhanced backbone strengthens the representation of edge and structural features while maintaining a lightweight architecture. Substituting AIFI with ETB further improves all evaluation metrics, indicating that ETB effectively enhances the model’s capability in target representation. Replacing the original neck feature fusion network with AEFPN yields the most substantial single-module performance gain across all metrics, validating the effectiveness of its cross-scale fusion strategy. When different improved modules are combined pairwise, the model achieves additional gains, and the improvements exceed those obtained by each module individually, demonstrating the absence of conflicts among modules and strong synergistic interactions. Finally, when all three improvements are incorporated simultaneously, the model achieves a Precision of 99.1% (2.5% higher than RT-DETR), a Recall of 98.9% (9.4% higher than RT-DETR), mAP50 and mAP50–95 scores of 98.6% and 57.9%, corresponding to improvements of 2.3% and 4.2%, respectively, over RT-DETR. Compared to RT-DETR, our model achieves a 40.7% reduction in parameters while maintaining superior overall performance with lower computational overhead. These results clearly indicate that the proposed improvements function cooperatively within a unified framework and collectively enable substantial performance breakthroughs in PCB defect inspection.

4.4. Comparative Experiments

4.4.1. Comparison of Different Backbone Networks

Multiple comparative studies were performed to evaluate the effect of each design refinement incorporated into MEE-DETR. First, to assess the capability of the proposed ESBN in modeling edge features and representing small defect instances, we designed a set of backbone replacement experiments based on the RT-DETR baseline framework. In these experiments, only the backbone network was substituted, while all other architectural components and training settings were kept identical to ensure a fair comparison. Table 4 reports the outcomes produced by RT-DETR when employing various backbone architectures. The results show that ESBN consistently outperforms the original ResNet18 backbone used in RT-DETR across multiple key metrics. Specifically, ESBN yields a 1.3% gain in Precision and a 4.7% improvement in Recall, indicating its superior capability in identifying small-sized defects and complex boundary structures. Moreover, replacing the backbone with ESBN reduces the model size to only 9.71 M parameters—a 51.1% decrease compared with ResNet18—further confirming that ESBN enhances representational capacity while maintaining a lightweight architecture. Overall, ESBN serves as an effective backbone for MEE-DETR, substantially improving the model’s detection performance and efficiency when dealing with small and subtle PCB surface defects under complex backgrounds.

4.4.2. Comparison of Different Attention Mechanisms

To assess the contribution of the ETB module to PCB defect detection, a comparison was conducted against multiple commonly adopted attention mechanisms, including HiLo Attention, Dynamic-range Histogram Self-Attention (DHSA), Cascaded Group Attention (CGA), DAttention, and PolaAttention. All methods were trained and tested under identical experimental settings to ensure a fair comparison. The quantitative results are reported in Table 5. ETB achieves the best performance across Precision, Recall, and mAP. Specifically, compared to the original AIFI module, ETB improves precision by 1.5%, recall by 4.2%, and mAP by 1.5%. Compared to other attention mechanisms, ETB has superior robustness for complex backgrounds and small-size defects, since its joint frequency–spatial modeling allows the FSA branch to model cross-scale spectral energy dependencies, and the SSA branch improves local detail discrimination. Moreover, EFFN increases the coherence and discriminability of the fused representations through frequency–spatial coupling. Overall, the introduction of ETB enables a more comprehensive extraction of multi-domain features—spanning energy distribution, texture disturbances, and boundary details—within PCB defect regions, thereby delivering consistently strong detection performance.

4.4.3. Comparison of Different Neck Networks

To evaluate the effectiveness of the proposed AEFPN, we conducted a series of neck-network replacement experiments based on the RT-DETR framework, and the results are summarized in Table 6. Under identical training configurations, AEFPN achieves the best performance in terms of Precision, Recall, and mAP, with particularly notable improvements over the original FPN structure. Specifically, AEFPN increases Precision by 1.7%, Recall by 6.6%, and mAP by 1.7%. In scenarios involving small-sized and low-texture PCB defects, AEFPN demonstrates significantly higher recall and localization accuracy, confirming its advantages in fine-grained feature modeling and multi-scale semantic coordination. These performance gains stem from the ACFM, whose cross-scale adaptive weighting mechanism dynamically balances the contributions of hierarchical features, as well as the EFEC3 module, whose local intensity-enhancement design strengthens detail, edge, and texture cues, thereby improving the model’s discriminability and robustness.

4.4.4. Comparison of Different Detection Models

To verify the overall effectiveness of the proposed improvements in MEE-DETR, we conducted comparative experiments against the baseline RT-DETR, several mainstream detectors (YOLOv5, YOLOv8, YOLOv9, YOLO11), and a number of recently proposed enhanced models. All models were evaluated under identical experimental settings and parameter configurations, and the quantitative results are summarized in Table 7, while additional visual comparisons in terms of the Precision–Epoch curves on the PKU-Market-PCB dataset and the corresponding confusion matrices are provided in Figure 3 and Figure 4, respectively. The experiments demonstrate that MEE-DETR surpasses all comparison models across multiple evaluation metrics. In particular, compared with the YOLO family and its improved variants, MEE-DETR consistently exhibits superior detection performance. Relative to the baseline RT-DETR, MEE-DETR achieves a 2.3% improvement in mAP50, a 4.2% improvement in mAP50–95, and a substantial 9.4% increase in Recall. For PCB defect detection, reducing missed detections is especially critical, as undetected defects may lead to latent quality risks or even cause an entire PCB to be scrapped. MEE-DETR significantly lowers the missed-detection rate while maintaining high detection precision, reflecting its strong robustness and reliability. Furthermore, compared with the other three improved RT-DETR variants, MEE-DETR also demonstrates clear overall advantages, confirming its superior accuracy–efficiency trade-off and its suitability for high-precision PCB defect inspection.

4.5. Visualization

As illustrated in Figure 5, we present a comprehensive visual comparison among different detection models on representative PCB defect categories, including short, spur, and spurious copper. For each example, the first three columns present the detection results produced by different models, illustrating their performance in identifying PCB defects, while the last three columns visualize the corresponding attention or activation heatmaps, highlighting how different models focus on defect regions. The first row shows the original images annotated with ground-truth defect labels, which serve as a reference for both detection accuracy and attention localization. From the detection results in the first three columns, it can be observed that several comparison models still suffer from missed detections or false positives, particularly for small or low-contrast defects. In contrast, MEE-DETR consistently detects the defect regions more completely and accurately, exhibiting fewer missed and false detections. Moreover, the predicted bounding boxes and class labels generated by MEE-DETR are associated with higher confidence scores, indicating more reliable and stable detection performance. The last three columns of Figure 5 further visualize the corresponding attention or activation heatmaps of different models. Compared with other methods, the heatmaps produced by MEE-DETR demonstrate a more concentrated and precise response on the true defect regions, while suppressing irrelevant background areas. This indicates that MEE-DETR is able to capture more discriminative features and achieve more effective defect localization, especially under complex backgrounds and subtle texture variations.

5. Conclusions

In this paper, we introduce MEE-DETR, a refined variant of RT-DETR tailored for PCB defect detection. The proposed method incorporates comprehensive improvements across three dimensions: backbone feature extraction, dual-domain feature interaction, and multi-scale fusion. Specifically, the ESBN module constructs a multi-scale edge extraction and semantic fusion pathway, significantly strengthening the representation of defect-edge structures within shallow features. The ETB integrates frequency-domain self-attention, spatial self-attention, and a frequency–spatial entanglement feed-forward network, enabling deep cross-domain interaction and unified feature modeling. Furthermore, the AEFPN combines a cross-scale adaptive weighting mechanism with a local nonlinear enhancement structure, substantially improving detail preservation and semantic balance during feature fusion. Experimental results demonstrate that MEE-DETR achieves strong detection performance with a streamlined model architecture, outperforming the baseline while maintaining computational efficiency. From an industrial perspective, although the current evaluation is conducted on publicly available datasets, the PKU-Market-PCB dataset is collected from real PCB production lines and reflects practical industrial inspection scenarios. This indicates that the proposed MEE-DETR has strong potential for deployment in real-world manufacturing environments. In future work, we plan to further validate the proposed method through collaborations with PCB manufacturers and industrial partners, enabling on-site evaluation under real production conditions. Such industrial-level validation will be essential for assessing robustness, efficiency, and long-term reliability, thereby facilitating the practical adoption of advanced PCB defect detection systems.

Author Contributions

Conceptualization, X.M. and X.X.; methodology, X.M.; software, X.M.; validation, X.M., X.X. and Y.S.; formal analysis, Y.S.; investigation, X.X.; resources, X.M.; data curation, X.X.; writing—original draft preparation, X.M.; writing—review and editing, X.M.; visualization, X.M.; supervision, X.X.; project administration, Y.S.; funding acquisition, X.X. All authors have read and agreed to the published version of the manuscript.

Funding

This research was funded by the Guangxi Key Technologies R&D Program, grant numbers GuikeAB23026036 and GuikeAB23026004; and by the National Natural Science Foundation of China, grant number 62262011. The APC was funded by these projects.

Data Availability Statement

The raw data supporting the conclusions of this article will be made available by the authors on request.

Acknowledgments

We sincerely thank the editor and reviewers for their insightful and constructive comments.

Conflicts of Interest

The authors declare no conflicts of interest.

References

  1. Ural, D.I.; Sezen, A. Research on PCB defect detection using artificial intelligence: A systematic mapping study. Evol. Intell. 2024, 17, 3101–3111. [Google Scholar] [CrossRef]
  2. Ling, Q.; Isa, N.A.M. Printed Circuit Board Defect Detection Methods Based on Image Processing, Machine Learning and Deep Learning: A Survey. IEEE Access 2023, 11, 15921–15944. [Google Scholar] [CrossRef]
  3. Avitabile, G.; Florio, A.; Gallo, V.L.; Pali, A.; Forni, L. An Optimization Framework for the Design of High-Speed PCB VIAs. Electronics 2022, 11, 475. [Google Scholar] [CrossRef]
  4. Xu, W.-J.; Xin, D.-J.; Yang, L.; Zhou, Y.-K.; Wang, D.; Li, W.-X. High-Speed Signal Optimization at Differential VIAs in Multilayer Printed Circuit Boards. Electronics 2024, 13, 3377. [Google Scholar] [CrossRef]
  5. Li, C.; Xue, C.; Jia, G.; Zhang, H.; Jiang, G. Research on improvement of defect detection system for printed circuit board based on deep learning. J. Comput. Methods Sci. Eng. 2025, 25, 2253–2262. [Google Scholar] [CrossRef]
  6. Singh, K.; Kharche, S.; Chauhan, A.; Salvi, P. PCB Defect Detection Methods: A Review of Existing Methods and Potential Enhancements. J. Eng. Sci. Technol. Rev. 2024, 17, 156–167. [Google Scholar] [CrossRef]
  7. Saberironaghi, A.; Ren, J.; El-Gindy, M. Defect Detection Methods for Industrial Products Using Deep Learning Techniques: A Review. Algorithms 2023, 16, 95. [Google Scholar] [CrossRef]
  8. Thakfan, A.; Bin Salamah, Y. Artificial-Intelligence-Based Detection of Defects and Faults in Photovoltaic Systems: A Survey. Energies 2024, 17, 4807. [Google Scholar] [CrossRef]
  9. Eisentraut, L.; Hosch, J.; Roytenberg, M.; Benecke, A.; Penava, P.; Buettner, R. Defect Detection in Industrial Soldering Processes Using Machine Learning: A Critical Literature Review. IEEE Access 2025, 13, 41533–41558. [Google Scholar] [CrossRef]
  10. Chen, M.Q.; Yu, L.J.; Zhi, C.; Sun, R.J.; Zhu, S.W.; Gao, Z.Y.; Ke, Z.X.; Zhu, M.Q.; Zhang, Y.M. Improved faster R-CNN for fabric defect detection based on Gabor filter with Genetic Algorithm optimization. Comput. Ind. 2022, 134, 103551. [Google Scholar] [CrossRef]
  11. Qiao, S.; Chen, L.-C.; Yuille, A. DetectoRS: Detecting objects with recursive feature pyramid and switchable atrous convolution. In Proceedings of the 2021 IEEE/CVF Conference on Computer Vision and Pattern Recognition (CVPR), Nashville, TN, USA, 20–25 June 2021. [Google Scholar]
  12. Chen, X.; Wu, Y.L.; He, X.Y.; Ming, W.Y. A Comprehensive Review of Deep Learning-Based PCB Defect Detection. IEEE Access 2023, 11, 139017–139038. [Google Scholar] [CrossRef]
  13. Hussain, M. YOLO-v1 to YOLO-v8, the Rise of YOLO and Its Complementary Nature toward Digital Manufacturing and Industrial Defect Detection. Machines 2023, 11, 677. [Google Scholar] [CrossRef]
  14. Guo, C.S.; Cai, M.; Ying, N.; Chen, H.H.; Zhang, J.W.; Zhou, D. ANMS: Attention-based non-maximum suppression. Multimed. Tools Appl. 2022, 81, 11205–11219. [Google Scholar] [CrossRef]
  15. Zhou, Y.B.; Yuan, M.H.; Zhang, J.; Ding, G.F.; Qin, S.F. Review of vision-based defect detection research and its perspectives for printed circuit board. J. Manuf. Syst. 2023, 70, 557–578. [Google Scholar] [CrossRef]
  16. Shah, S.; Tembhurne, J. Object detection using convolutional neural networks and transformer-based models: A review. J. Electr. Syst. Inf. Technol. 2023, 10, 54. [Google Scholar] [CrossRef]
  17. Huang, W.; Wei, P.; Zhang, M.; Liu, H. HRIPCB: A challenging dataset for PCB defects detection and classification. J. Eng. 2020, 13, 303–309. [Google Scholar] [CrossRef]
  18. Khanam, R.; Hussain, M.; Hill, R.; Allen, P. A Comprehensive Review of Convolutional Neural Networks for Defect Detection in Industrial Applications. IEEE Access 2024, 12, 94250–94295. [Google Scholar] [CrossRef]
  19. Dai, Q.; Xiao, Y.; Lv, S.; Song, S.; Xue, X.; Liang, S.; Huang, Y.; Li, Z. YOLOv8-GABNet: An Enhanced Lightweight Network for the High-Precision Recognition of Citrus Diseases and Nutrient Deficiencies. Agriculture 2024, 14, 1964. [Google Scholar] [CrossRef]
  20. Zhou, G.A.; Yu, L.J.; Su, Y.X.; Xu, B.R.; Zhou, G.Y. Lightweight PCB defect detection algorithm based on MSD-YOLO. Clust. Comput.-J. Netw. Softw. Tools Appl. 2024, 27, 3559–3573. [Google Scholar] [CrossRef]
  21. Tang, J.; Liu, S.; Zhao, D.; Tang, L.; Zou, W.; Zheng, B. PCB-YOLO: An Improved Detection Algorithm of PCB Surface Defects Based on YOLOv5. Sustainability 2023, 15, 5963. [Google Scholar] [CrossRef]
  22. Li, Z.; Li, A.; Li, W.; Kong, X.; Zhang, Y. HSD-YOLO: A Lightweight and Accurate Method for PCB Defect Detection. In Proceedings of the 2024 International Joint Conference on Neural Networks (IJCNN), Yokohama, Japan, 30 June–5 July 2024. [Google Scholar]
  23. Zhang, L.X.; Chen, J.S.; Chen, J.G.; Wen, Z.C.; Zhou, X.S. LDD-Net: Lightweight printed circuit board defect detection network fusing multi-scale features. Eng. Appl. Artif. Intell. 2024, 129, 107628. [Google Scholar] [CrossRef]
  24. Cumbajin, E.; Rodrigues, N.; Costa, P.; Miragaia, R.; Frazão, L.; Costa, N.; Fernández-Caballero, A.; Carneiro, J.; Buruberri, L.H.; Pereira, A. A Systematic Review on Deep Learning with CNNs Applied to Surface Defect Detection. J. Imaging 2023, 9, 193. [Google Scholar] [CrossRef]
  25. Vaswani, A.; Shazeer, N.; Parmar, N.; Uszkoreit, J.; Jones, L.; Gomez, A.N.; Kaiser, Ł.; Polosukhin, I. Attention is all you need. In Advances in Neural Information Processing Systems; Curran Associates Inc.: Red Hook, NY, USA, 2017; Volume 30. [Google Scholar]
  26. Carion, N.; Massa, F.; Synnaeve, G.; Usunier, N.; Kirillov, A.; Zagoruyko, S. End-to-End Object Detection with Transformers. In Proceedings of the Computer Vision—ECCV 2020, Glasgow, UK, 23–28 August 2020. [Google Scholar]
  27. Shehzadi, T.; Hashmi, K.A.; Liwicki, M.; Stricker, D.; Afzal, M.Z. Object Detection with Transformers: A Review. Sensors 2025, 25, 6025. [Google Scholar] [CrossRef]
  28. Zhu, X.; Su, W.; Lu, L.; Li, B.; Wang, X.; Dai, J. Deformable DETR: Deformable Transformers for End-to-End Object Detection. arXiv 2020, arXiv:2010.04159. [Google Scholar] [CrossRef]
  29. Yao, Z.; Ai, J.; Li, B.; Zhang, C. Efficient DETR: Improving End-to-End Object Detector with Dense Prior. arXiv 2021, arXiv:2104.01318. [Google Scholar] [CrossRef]
  30. Meng, D.; Chen, X.; Fan, Z.; Zeng, G.; Li, H.; Yuan, Y.; Sun, L.; Wang, J. Conditional detr for fast training convergence. In Proceedings of the 2021 IEEE/CVF International Conference on Computer Vision (ICCV), Montreal, QC, Canada, 10–17 October 2021. [Google Scholar]
  31. Zhao, Y.; Lv, W.; Xu, S.; Wei, J.; Wang, G.; Dang, Q.; Liu, Y.; Chen, J. Detrs beat yolos on real-time object detection. In Proceedings of the 2024 IEEE/CVF Conference on Computer Vision and Pattern Recognition (CVPR), Seattle, WA, USA, 17–21 June 2024. [Google Scholar]
  32. Ji, L.; Huang, C.H.; Li, H.W.; Han, W.J.; Yi, L.Y. MS-DETR: A real-time multi-scale detection transformer for PCB defect detection. Signal Image Video Process. 2025, 19, 203. [Google Scholar] [CrossRef]
  33. Liu, M.; Wang, H.; Du, L.; Ji, F.; Zhang, M. Bearing-DETR: A Lightweight Deep Learning Model for Bearing Defect Detection Based on RT-DETR. Sensors 2024, 24, 4262. [Google Scholar] [CrossRef] [PubMed]
  34. Sun, Y.; Xu, C.; Yang, J.; Xuan, H.; Luo, L. Frequency-Spatial Entanglement Learning for Camouflaged Object Detection. In Proceedings of the Computer Vision—ECCV 2024, Milan, Italy, 29 September–4 October 2024. [Google Scholar]
  35. Chen, J.; Kao, S.-h.; He, H.; Zhuo, W.; Wen, S.; Lee, C.-H.; Chan, S.-H.G. Run, don’t walk: Chasing higher FLOPS for faster neural networks. In Proceedings of the 2023 IEEE/CVF Conference on Computer Vision and Pattern Recognition (CVPR), Vancouver, BC, Canada, 17–24 June 2023. [Google Scholar]
  36. Zhang, Y.; Ding, X.; Yue, X. Scaling up your kernels: Large kernel design in convnets towards universal representations. IEEE Trans. Pattern Anal. Mach. Intell. 2025, 47, 11692–11707. [Google Scholar] [CrossRef]
  37. Wang, A.; Chen, H.; Lin, Z.; Han, J.; Ding, G. Repvit: Revisiting mobile cnn from vit perspective. In Proceedings of the 2024 IEEE/CVF Conference on Computer Vision and Pattern Recognition (CVPR), Seattle, WA, USA, 17–21 June 2024. [Google Scholar]
  38. Liu, Z.; Lin, Y.; Cao, Y.; Hu, H.; Wei, Y.; Zhang, Z.; Lin, S.; Guo, B. Swin transformer: Hierarchical vision transformer using shifted windows. In Proceedings of the 2021 IEEE/CVF International Conference on Computer Vision (ICCV), Montreal, QC, Canada, 10–17 October 2021. [Google Scholar]
  39. Li, Y.X.; Li, X.; Dai, Y.M.; Hou, Q.B.; Liu, L.; Liu, Y.X.; Cheng, M.M.; Yang, J. LSKNet: A Foundation Lightweight Backbone for Remote Sensing. Int. J. Comput. Vis. 2025, 133, 1410–1431. [Google Scholar] [CrossRef]
  40. Pan, Z.; Cai, J.; Zhuang, B. Fast vision transformers with hilo attention. Adv. Neural Inf. Process. Syst. 2022, 35, 14541–14554. [Google Scholar]
  41. Sun, S.; Ren, W.; Gao, X.; Wang, R.; Cao, X. Restoring Images in Adverse Weather Conditions via Histogram Transformer. In Proceedings of the Computer Vision—ECCV 2024, Milan, Italy, 29 September–4 October 2024. [Google Scholar]
  42. Liu, X.; Peng, H.; Zheng, N.; Yang, Y.; Hu, H.; Yuan, Y. Efficientvit: Memory efficient vision transformer with cascaded group attention. In Proceedings of the 2023 IEEE/CVF Conference on Computer Vision and Pattern Recognition (CVPR), Vancouver, BC, Canada, 17–24 June 2023. [Google Scholar]
  43. Xia, Z.; Pan, X.; Song, S.; Li, L.E.; Huang, G. Vision transformer with deformable attention. In Proceedings of the 2022 IEEE/CVF Conference on Computer Vision and Pattern Recognition (CVPR), New Orleans, LA, USA, 18–24 June 2022. [Google Scholar]
  44. Meng, W.; Luo, Y.; Li, X.; Jiang, D.; Zhang, Z. PolaFormer: Polarity-aware Linear Attention for Vision Transformers. In Proceedings of the Thirteenth International Conference on Learning Representations, Singapore, 24–28 April 2025. [Google Scholar]
  45. Li, H.L.; Li, J.; Wei, H.B.; Liu, Z.; Zhan, Z.F.; Ren, Q.L. Slim-neck by GSConv: A lightweight-design for real-time detector architectures. J. Real-Time Image Process. 2024, 21, 62. [Google Scholar] [CrossRef]
  46. Tan, M.; Pang, R.; Le, Q.V. Efficientdet: Scalable and efficient object detection. In Proceedings of the 2020 IEEE/CVF Conference on Computer Vision and Pattern Recognition (CVPR), Seattle, WA, USA, 13–19 June 2020. [Google Scholar]
  47. Tang, F.; Xu, Z.; Huang, Q.; Wang, J.; Hou, X.; Su, J.; Liu, J. DuAT: Dual-aggregation transformer network for medical image segmentation. In Proceedings of the Chinese Conference on Pattern Recognition and Computer Vision (PRCV), Xiamen, China, 13–15 October 2023. [Google Scholar]
  48. Chen, Y.F.; Zhang, C.Y.; Chen, B.; Huang, Y.Y.; Sun, Y.F.; Wang, C.M.; Fu, X.J.; Dai, Y.X.; Qin, F.W.; Peng, Y.; et al. Accurate leukocyte detection based on deformable-DETR and multi-level feature fusion for aiding diagnosis of blood diseases. Comput. Biol. Med. 2024, 170, 107917. [Google Scholar] [CrossRef]
  49. Wang, J.; Xie, X.; Liu, G.; Wu, L. A Lightweight PCB Defect Detection Algorithm Based on Improved YOLOv8-PCB. Symmetry 2025, 17, 309. [Google Scholar] [CrossRef]
  50. Xiao, G.S.; Hou, S.L.; Zhou, H.Y. PCB defect detection algorithm based on CDI-YOLO. Sci. Rep. 2024, 14, 7351. [Google Scholar] [CrossRef]
  51. Yuan, M.H.; Zhou, Y.B.; Ren, X.Y.; Zhi, H.; Zhang, J.; Chen, H.J. YOLO-HMC: An Improved Method for PCB Surface Defect Detection. IEEE Trans. Instrum. Meas. 2024, 73, 2001611. [Google Scholar] [CrossRef]
  52. Xia, K.W.; Lv, Z.L.; Liu, K.; Lu, Z.Y.; Zhou, C.D.; Zhu, H.; Chen, X.L. Global contextual attention augmented YOLO with ConvMixer prediction heads for PCB surface defect detection. Sci. Rep. 2023, 13, 9805. [Google Scholar] [CrossRef]
  53. Wang, S.; Jiang, H.; Li, Z.; Yang, J.; Ma, X.; Chen, J.; Tang, X. PHSI-RTDETR: A Lightweight Infrared Small Target Detection Algorithm Based on UAV Aerial Photography. Drones 2024, 8, 240. [Google Scholar] [CrossRef]
  54. Chi, J.; Zhang, M.K.; Zhang, P.H.; Niu, G.W.; Zheng, Z.H. An improved EAE-DETR model for defect detection of server motherboard. Sci. Rep. 2025, 15, 29063. [Google Scholar] [CrossRef] [PubMed]
  55. Xie, Z.; Zou, X. MFAD-RTDETR: A Multi-Frequency Aggregate Diffusion Feature Flow Composite Model for Printed Circuit Board Defect Detection. Electronics 2024, 13, 3557. [Google Scholar] [CrossRef]
Figure 1. Architecture of the proposed MEE-DETR. (A) Overall architecture. (B) Architecture of MSEEM. (C) Architecture of ESFN. (D) Architecture of ACFM. (E) Architecture of EFEC3. (F) Architecture of ETB. (G) Architecture of SEEU. (H) Architecture of IEFU. (I) Architecture of SSA. (J) Architecture of FSA. (K) Architecture of EFFN.
Figure 2. Sample of PCB images. (A) A full PCB image. (B) Six defect patch images.
Figure 3. Comparison of precision–epoch curves of different models evaluated on PKU-Market-PCB.
Figure 4. Comparison of confusion matrices of different models evaluated on PKU-Market-PCB. (A) YOLOv5m. (B) YOLOv8m. (C) YOLOv9m. (D) YOLOv11. (E) YOLOv8-PCB. (F) CDI-YOLO. (G) YOLO-HMC. (H) GCC-YOLO. (I) RT-DETR. (J) PHSI-RTDETR. (K) EAE-DETR. (L) MFAD-RTDETR. (M) MEE-DETR.
Figure 5. Comparison of detection results and corresponding attention heatmaps produced by different models on the PKU-Market-PCB dataset.
Table 1. Training configuration for MEE-DETR.
| Parameters | Value |
|---|---|
| Batch Size | 4 |
| Epoch | 250 |
| Image Size | 640 × 640 |
| Lr | 0.0001 |
| Momentum | 0.8 |
| Weight decay | 0.0001 |
| Optimizer | AdamW |
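For readers reimplementing this setup, the AdamW values in Table 1 (lr = 0.0001, momentum β1 = 0.8, weight decay = 0.0001) map onto the standard decoupled-weight-decay update rule. The single-step NumPy sketch below shows where each value enters; note that β2 and ε are the common defaults and are assumptions here, since the table does not list them.

```python
import numpy as np

def adamw_step(theta, grad, m, v, t, lr=1e-4, beta1=0.8,
               beta2=0.999, eps=1e-8, weight_decay=1e-4):
    """One AdamW update. lr, beta1 (momentum), and weight_decay follow
    Table 1; beta2 and eps are the usual defaults (assumed, not stated)."""
    m = beta1 * m + (1 - beta1) * grad          # first-moment estimate
    v = beta2 * v + (1 - beta2) * grad ** 2     # second-moment estimate
    m_hat = m / (1 - beta1 ** t)                # bias correction
    v_hat = v / (1 - beta2 ** t)
    # Decoupled weight decay: applied to theta directly, not via the gradient.
    theta = theta - lr * (m_hat / (np.sqrt(v_hat) + eps) + weight_decay * theta)
    return theta, m, v

theta = np.array([1.0])
theta, m, v = adamw_step(theta, grad=np.array([0.5]),
                         m=np.zeros(1), v=np.zeros(1), t=1)
# A positive gradient moves the (positive) parameter slightly downward.
```

In a framework such as PyTorch, the same configuration corresponds to constructing the built-in AdamW optimizer with these hyperparameters rather than hand-rolling the update.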
Table 2. The number of images and the distribution of defect instances across the training, validation, and test sets.
| Defect Type | Number of Images | Number of Defects | Training Set | Validation Set | Testing Set |
|---|---|---|---|---|---|
| Open circuit | 232 | 964 | 769 | 95 | 100 |
| Short | 232 | 982 | 787 | 98 | 97 |
| Missing hole | 230 | 994 | 798 | 99 | 97 |
| Mouse bite | 230 | 984 | 787 | 98 | 99 |
| Spur | 230 | 976 | 779 | 97 | 100 |
| Spurious copper | 232 | 1006 | 805 | 99 | 102 |
| Sum | 1386 | 5906 | 4725 | 586 | 595 |
Table 3. Ablation study results.
| Model | ESBN | ETB | AEFPN | Precision (%) | Recall (%) | mAP50 (%) | mAP50–95 (%) | Parameters (M) |
|---|---|---|---|---|---|---|---|---|
| Model A | | | | 96.6 | 89.5 | 96.3 | 53.7 | 19.87 |
| Model B | ✓ | | | 97.9 | 94.2 | 97.5 | 53.9 | 9.71 |
| Model C | | ✓ | | 98.1 | 93.7 | 97.8 | 54.7 | 20.12 |
| Model D | | | ✓ | 98.3 | 96.1 | 98.0 | 56.2 | 10.03 |
| Model E | ✓ | ✓ | | 98.6 | 95.8 | 98.2 | 54.9 | 9.95 |
| Model F | ✓ | | ✓ | 98.4 | 96.5 | 98.1 | 55.1 | 11.63 |
| Model G | | ✓ | ✓ | 98.8 | 97.5 | 98.4 | 56.8 | 10.17 |
| Model H | ✓ | ✓ | ✓ | 99.1 | 98.9 | 98.6 | 57.9 | 11.77 |
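The headline improvements quoted in the abstract follow directly from the first (Model A, baseline RT-DETR) and last (Model H, full MEE-DETR) rows of the ablation table; the short check below reproduces that arithmetic. Note that the parameter reduction comes out at roughly 40.8%, matching the approximately 40.7% reported in the abstract up to rounding.

```python
# Baseline (Model A) vs. full model (Model H) metrics from Table 3.
base = {"precision": 96.6, "recall": 89.5, "map5095": 53.7, "params": 19.87}
full = {"precision": 99.1, "recall": 98.9, "map5095": 57.9, "params": 11.77}

delta_p = round(full["precision"] - base["precision"], 1)  # +2.5 points
delta_r = round(full["recall"] - base["recall"], 1)        # +9.4 points
delta_m = round(full["map5095"] - base["map5095"], 1)      # +4.2 points
# Relative parameter reduction, as a percentage of the baseline size.
param_cut = round(100 * (1 - full["params"] / base["params"]), 1)
```

This kind of sanity check is a useful habit when transcribing ablation results: the reported deltas should always be recoverable from the table itself.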
Table 4. Comparative results of RT-DETR using different backbone networks.
Table 4. Comparative results of RT-DETR using different backbone networks.
| Backbone | Precision (%) | Recall (%) | mAP50 (%) | Parameters (M) | FLOPs (G) |
|---|---|---|---|---|---|
| ResNet18 | 96.6 | 89.5 | 96.3 | 19.87 | 57.0 |
| FasterNet [35] | 97.9 | 90.1 | 97.3 | 10.91 | 28.8 |
| UniRepLKNet [36] | 97.5 | 89.3 | 96.7 | 12.83 | 33.7 |
| RepViT [37] | 97.3 | 92.3 | 97.1 | 13.47 | 36.7 |
| SwinTransformer [38] | 97.2 | 93.1 | 96.3 | 36.42 | 97.3 |
| LSKNet [39] | 96.9 | 89.9 | 95.8 | 12.66 | 37.9 |
| ESBN | 97.9 | 94.2 | 97.5 | 9.71 | 50.7 |
Table 5. Comparative results of RT-DETR using different attention mechanisms.
Table 5. Comparative results of RT-DETR using different attention mechanisms.
| Attention | Precision (%) | Recall (%) | mAP50 (%) |
|---|---|---|---|
| AIFI | 96.6 | 89.5 | 96.3 |
| HiLo Attention [40] | 97.8 | 93.4 | 96.9 |
| DHSA [41] | 97.3 | 91.8 | 97.1 |
| CGA [42] | 97.6 | 94.1 | 97.6 |
| DAttention [43] | 96.8 | 92.7 | 96.3 |
| PolaAttention [44] | 97.9 | 92.5 | 97.5 |
| ETB | 98.1 | 93.7 | 97.8 |
Table 6. Comparative results of RT-DETR using different neck feature-fusion networks.
Table 6. Comparative results of RT-DETR using different neck feature-fusion networks.
| Neck | Precision (%) | Recall (%) | mAP50 (%) |
|---|---|---|---|
| Original | 96.6 | 89.5 | 96.3 |
| SlimNeck [45] | 92.1 | 89.4 | 91.2 |
| BiFPN [46] | 93.4 | 90.5 | 92.3 |
| GLSA [47] | 93.7 | 91.2 | 92.8 |
| HS-FPN [48] | 94.0 | 91.8 | 93.1 |
| AEFPN | 98.3 | 96.1 | 98.0 |
Table 7. Results of comparisons with other models.
| Model | Precision (%) | Recall (%) | mAP50 (%) | mAP50–95 (%) | Parameters (M) | FLOPs (G) | FPS (F/S) | Weight Size (MB) |
|---|---|---|---|---|---|---|---|---|
| YOLOv5m | 94.1 | 91.8 | 92.7 | 49.7 | 20.89 | 49.8 | 61.4 | 40.7 |
| YOLOv8m | 94.7 | 92.2 | 93.9 | 50.3 | 25.84 | 70.3 | 62.2 | 49.5 |
| YOLOv9m | 95.7 | 93.7 | 96.2 | 52.9 | 20.01 | 76.7 | 36.8 | 39.1 |
| YOLO11 | 96.4 | 92.8 | 95.7 | 53.5 | 20.33 | 67.6 | 44.4 | 38.8 |
| YOLOv8-PCB [49] | 94.7 | 94.0 | 96.1 | 50.7 | 2.46 | 7.1 | 89.4 | 5.2 |
| CDI-YOLO [50] | 97.1 | 96.4 | 96.8 | 51.1 | 5.76 | 12.6 | 128.0 | 5.6 |
| YOLO-HMC [51] | 97.9 | 93.1 | 97.7 | 54.2 | 5.94 | 17.8 | 65.8 | 37.5 |
| GCC-YOLO [52] | 96.8 | 92.7 | 96.4 | 51.7 | 8.24 | 28.1 | 62.6 | 14.5 |
| RT-DETR | 96.6 | 89.5 | 96.3 | 53.7 | 19.87 | 57.0 | 75.9 | 77.0 |
| PHSI-RTDETR [53] | 97.3 | 93.9 | 97.1 | 54.8 | 13.77 | 47.0 | 30.4 | 63.7 |
| EAE-DETR [54] | 96.7 | 92.6 | 96.1 | 54.7 | 15.56 | 50.2 | 74 | 51.9 |
| MFAD-RTDETR [55] | 96.5 | 94.5 | 97.0 | 51.0 | 16.27 | 176.5 | 72.5 | 58.2 |
| Ours | 99.1 | 98.9 | 98.6 | 57.9 | 11.77 | 61.5 | 76.3 | 43.7 |
Disclaimer/Publisher’s Note: The statements, opinions and data contained in all publications are solely those of the individual author(s) and contributor(s) and not of MDPI and/or the editor(s). MDPI and/or the editor(s) disclaim responsibility for any injury to people or property resulting from any ideas, methods, instructions or products referred to in the content.

Share and Cite

MDPI and ACS Style

Ma, X.; Xie, X.; Song, Y. MEE-DETR: Multi-Scale Edge-Aware Enhanced Transformer for PCB Defect Detection. Electronics 2026, 15, 504. https://doi.org/10.3390/electronics15030504
