EAS-DETR: An Enhanced Real-Time Transformer with Sparse Attention and Global Context for PCB Defect Inspection

Yan, Yuxin; Wu, Ruize; Ren, Jia

doi:10.3390/electronics15081662

Open Access Article


by Yuxin Yan, Ruize Wu and Jia Ren *

School of Information Science and Engineering, Zhejiang Sci-Tech University, Hangzhou 310018, China

* Author to whom correspondence should be addressed.

Electronics 2026, 15(8), 1662; https://doi.org/10.3390/electronics15081662

Submission received: 11 March 2026 / Revised: 11 April 2026 / Accepted: 12 April 2026 / Published: 15 April 2026


Abstract

Printed circuit board (PCB) defect inspection is critical for ensuring product reliability, yet it remains challenging due to the microscopic scale of defects and complex background patterns. To improve the localization of fine anomalies, this paper proposes EAS-DETR, an efficient and highly sensitive real-time end-to-end detector. First, we reconstruct the feature extraction backbone by introducing a novel C2f-EC module, which jointly models local textures and global structural dependencies. Second, an Adaptive Sparse Attention-based Intra-scale Feature Interaction (ASAFI) module is proposed to suppress background noise and focus the network’s attention on sparse defect regions. Finally, an optimized feature pyramid network, SGO-FPN, is designed to mitigate cross-scale feature misalignment and preserve high-resolution spatial details for small object localization. Experiments demonstrate that EAS-DETR achieves an mAP@0.5 of 93.0% and a 91.9% recall on a multi-source PCB dataset. The model outperforms mainstream YOLO variants and baseline RT-DETR models while maintaining a moderate parameter count of 14.6M and achieving a real-time inference speed of over 70 FPS. Furthermore, cross-domain validations on public benchmarks confirm its robust generalization capability for complex tiny object detection tasks.

Keywords: PCB; defect inspection; RT-DETR; sparse attention; feature fusion; tiny object detection

1. Introduction

Driven by the ever-growing demand for miniaturization and performance in electronic devices, printed circuit board (PCB) designs have evolved toward multi-layer stacking and high-density interconnect (HDI) architectures with ultra-fine features [1,2]. As trace widths and spacing continue to shrink, the process windows for critical manufacturing steps narrow accordingly, and even minor parameter deviations can induce microscopic defects [3]. Meanwhile, tightly packed traces and components tend to clutter the visual field, making it harder to spot subtle anomalies [4]. High-precision automated defect inspection at production endpoints has therefore become indispensable for ensuring product reliability.

Early PCB quality control relied on manual visual inspection, a process constrained by low throughput, operator fatigue, and subjective judgment variability [5]. Automated optical inspection (AOI) systems based on conventional machine vision were subsequently introduced, employing template matching and handcrafted feature extractors to reduce human involvement [6]. These approaches, however, generalize poorly under varying illumination conditions and struggle with the complex textures of modern PCB layouts, resulting in high false alarm rates [7].

Since the early 2010s, convolutional neural networks (CNNs) have reshaped industrial defect detection through deep representation learning. Two-stage detectors such as R-CNN [8], Fast R-CNN [9], and Faster R-CNN [10] first generate candidate regions and then classify each one, offering high localization accuracy at the cost of inference speed. In the PCB domain, Li et al. [11] trained a VGG16-based detector on augmented industrial data covering five defect categories but only reached an overall mean Average Precision (mAP) of 60%, with acceptable precision limited to certain defect types. To improve fine-grained defect localization, Hu et al. [12] incorporated ResNet50 with Feature Pyramid Networks and a guided anchor-based region proposal network into a modified Faster R-CNN framework, which improved the detection of small-scale PCB defects. Despite these gains, the region proposal stage introduces considerable latency that is incompatible with the throughput requirements of high-volume manufacturing lines.

One-stage detectors such as the You Only Look Once (YOLO) series [13,14,15,16] and SSD [17] bypass region proposals by directly regressing bounding boxes from feature maps, making them more suitable for real-time deployment. Recent iterations, particularly YOLOv8 and YOLO11, have gained traction in industrial settings for their favorable balance between speed and accuracy. Wang et al. [18] integrated a channel–spatial hybrid attention mechanism and a redesigned feature fusion module into YOLOv8, reporting 96.1% mAP with a model size of only 5.2 MB. Taking a different route, Liu et al. [19] replaced the YOLOv8 backbone with FasterNet and introduced a normalization-based attention module to reduce redundant computation, achieving 94.4% mAP on the HRIPCB dataset. Huang et al. [20] adopted a data-centric perspective, leveraging GAN-based augmentation to improve YOLO11 generalization for infrequent and complex defect categories. Although these methods demonstrate strong overall accuracy, the underlying convolutional architectures remain confined to local receptive fields. In modern PCB images, where small-scale defects such as copper residue or micro-scratches closely resemble the repetitive geometric patterns of surrounding traces, a purely local view is often insufficient to distinguish genuine anomalies from normal circuit structures. This ambiguity leads to both missed detections and false positives.

Transformer-based detectors offer a potential remedy. Detection Transformer (DETR) [21] introduced self-attention mechanisms that enable global feature interactions across all spatial locations, relaxing the receptive field constraints of CNNs. Subsequent work addressed DETR’s slow training convergence and quadratic computational complexity through various strategies: Deformable DETR [22] replaced global attention with deformable attention over sparse sampling points, DN-DETR [23] stabilized the bipartite matching process through an auxiliary denoising task, and DINO [24] further integrated contrastive denoising training with a mixed query selection strategy. These advances have improved both accuracy and training efficiency for end-to-end detectors. Nevertheless, the decoder overhead and the quadratic scaling of self-attention with image resolution remain prohibitive for deployment on industrial edge devices, where real-time throughput is a hard requirement.

Real-Time Detection Transformer (RT-DETR) [25] was designed to bridge this gap by introducing a hybrid encoder that efficiently processes multi-scale features and an Intersection over Union (IoU)-aware query selection mechanism, achieving competitive accuracy at much lower latency.

However, the direct application of its standard architecture to PCB defect inspection reveals notable limitations in feature representation. First, despite its substantial parameter count, the network struggles to detect micron-scale defects—such as spurs, shorts, and mouse bites—whose fine-grained spatial details are progressively attenuated through successive downsampling operations in the backbone. To mitigate this loss of discriminative information, recent studies have introduced various improved RT-DETR variants tailored for PCB inspection. Peng et al. [26] proposed MDD-DETR, which introduces a lightweight backbone with HiLo attention to jointly capture multi-frequency features and a dedicated neck structure to enhance small-target-rich feature fusion. Its multi-scale fusion, however, still relies on interpolation-based upsampling to align features across pyramid levels, inevitably introducing spatial misalignment when reconstructing micron-scale defect boundaries. Targeting computational efficiency, Ji et al. [27] presented MS-DETR, employing a multi-stage convolution module and a slim-scale adaptive fusion architecture to improve small-object feature extraction. Yet, the fusion module primarily performs channel-wise recalibration through squeeze-and-excitation operations without explicitly modeling spatial correspondence across resolutions, leaving cross-scale geometric misalignment for compact defects largely unresolved. Madan et al. [28] took a different approach by replacing standard feed-forward components with Hebbian and randomized layers and incorporating fuzzy attention-based multi-scale fusion to refine small object representations. Effective as it is at enriching multi-resolution feature interaction, the fuzzy attention operates on globally pooled representations without pixel-level spatial alignment, compromising positional precision for micron-scale defects.

Second, the periodically repetitive patterns prevalent in PCB images, including dense copper traces, regular pad arrays, and standardized via layouts, significantly distract the standard self-attention mechanism. Although the previously discussed models enhance feature extraction, they fail to overcome this distraction because their attention mechanisms remain predominantly dense and global. Consequently, they lack the sparse adaptability required to effectively filter out high-density background interference. This frequently leads to ambiguous attention allocation, causing the model to either misinterpret normal textural variations as defects or suppress genuine defect responses amid strong background pattern activations, resulting in a persistent coexistence of false positives and missed detections.

To tackle the aforementioned challenges in high-precision PCB inspection, this paper proposes EAS-DETR, an efficient and highly sensitive end-to-end defect detector based on the RTDETR-r18 architecture. The main contributions of this work are summarized as follows:

1. We reconstruct the feature extraction backbone by developing the C2f-EC module. By embedding efficient local–global context aggregation and convolutional gating mechanisms into a Cross Stage Partial (CSP) topology, the network is empowered to jointly capture local textures and global structural dependencies while maintaining efficient gradient flow.
2. We propose the Adaptive Sparse Attention-based Intra-scale Feature Interaction (ASAFI) module for the hybrid encoder. By adaptively fusing dense and sparse self-attention outputs, ASAFI effectively suppresses irrelevant background noise and actively redirects the network’s focus toward structurally sparse and minuscule defect regions.
3. We design SGO-FPN, an advanced feature pyramid network tailored for small object localization. Incorporating soft interpolation strategies and space-to-depth transformations, SGO-FPN mitigates cross-scale spatial misalignment and leverages high-resolution, early-stage features to preserve crucial fine-grained details.

2. Method

2.1. RT-DETR Baseline Framework

RT-DETR is the first real-time detector based on the DETR framework. YOLO-series detectors generate a large number of candidate boxes and rely on non-maximum suppression (NMS) to remove duplicate predictions, where both the confidence threshold and the IoU threshold need to be manually tuned and the processing time varies with the number of objects in the image. RT-DETR takes a different approach: it treats detection as a set prediction problem. During training, Hungarian matching loss pairs each learnable query with at most one ground-truth object, so the network naturally learns to output one prediction per object. During inference, only a score threshold is needed to filter out low-confidence predictions, and no NMS is involved. This makes the inference time stable and predictable regardless of scene content.
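To make the set-prediction idea concrete, the following minimal sketch finds a one-to-one assignment between predictions and ground-truth boxes by brute force. The cost (1 − IoU), the helper names `iou` and `match`, and the toy boxes are illustrative assumptions. The matching cost used by DETR-style detectors also includes classification and box-regression terms, and it is solved with the Hungarian algorithm rather than by enumeration.

```python
from itertools import permutations

def iou(a, b):
    """IoU of two boxes given as (x1, y1, x2, y2)."""
    ix1, iy1 = max(a[0], b[0]), max(a[1], b[1])
    ix2, iy2 = min(a[2], b[2]), min(a[3], b[3])
    inter = max(0.0, ix2 - ix1) * max(0.0, iy2 - iy1)
    area_a = (a[2] - a[0]) * (a[3] - a[1])
    area_b = (b[2] - b[0]) * (b[3] - b[1])
    return inter / (area_a + area_b - inter)

def match(preds, gts):
    """Return {gt_index: pred_index} minimising the total (1 - IoU) cost.

    Brute force over all assignments, suitable for a toy example only.
    Assumes len(preds) >= len(gts).
    """
    best_cost, best = float("inf"), None
    for perm in permutations(range(len(preds)), len(gts)):
        cost = sum(1.0 - iou(preds[p], gts[g]) for g, p in enumerate(perm))
        if cost < best_cost:
            best_cost, best = cost, perm
    return {g: p for g, p in enumerate(best)}

# Four query predictions, two ground-truth objects (toy values).
preds = [(0, 0, 10, 10), (50, 50, 60, 60), (1, 1, 11, 11), (100, 100, 110, 110)]
gts = [(0, 0, 10, 10), (50, 50, 61, 61)]
assignment = match(preds, gts)
unmatched = sorted(set(range(len(preds))) - set(assignment.values()))
```

Each ground-truth box receives exactly one prediction, and the remaining queries (here the near-duplicate box and the distant box) are trained as "no object". This is why duplicates are suppressed by the training objective itself and no NMS step is needed at inference.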

RT-DETR consists of three parts: a backbone, a hybrid encoder, and a Transformer decoder. The backbone uses a ResNet network to extract feature maps at three different resolutions, which are then unified to the same number of channels before entering the encoder. In earlier DETR models, the multi-scale Transformer encoder was computationally expensive yet brought limited accuracy improvement. RT-DETR addresses this by splitting the encoder into two lightweight modules. The first module, Attention-based Intra-scale Feature Interaction (AIFI), performs self-attention only on the lowest-resolution feature map, which contains the richest semantic information while having the fewest tokens, making the computation very efficient. The second module, the CNN-based Cross-scale Feature Fusion Module (CCFM), merges features across the three scales using reparameterizable convolutions, which can be structurally reparameterized into standard convolutions during inference for faster execution. Together, these two modules achieve accuracy close to that of a full multi-scale encoder at a fraction of the computational cost.
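The efficiency argument for restricting self-attention to the lowest-resolution map can be checked with simple arithmetic. The sketch below counts tokens and query-key pairs per feature level. The 640 × 640 input size and the function name `attention_pairs` are assumptions chosen for illustration and are not stated in the text.

```python
def attention_pairs(image_size, stride):
    """Return (tokens, query-key pairs) for a square feature map at the given stride."""
    side = image_size // stride
    tokens = side * side
    return tokens, tokens * tokens

image_size = 640  # assumed input resolution
for stride in (8, 16, 32):
    tokens, pairs = attention_pairs(image_size, stride)
    print(f"stride {stride:2d}: {tokens:5d} tokens, {pairs:,} query-key pairs")
```

Under this assumption the stride-32 map has 400 tokens and 160,000 query-key pairs, which is 256 times fewer pairs than the stride-8 map with 6400 tokens.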

Before the decoder starts working, a set of initial queries must be selected from the encoder output. Previous methods select queries based only on classification confidence, which may pick features that are semantically strong but poorly localized. RT-DETR proposes an uncertainty-minimal query selection that jointly considers both classification and localization scores, so that the selected queries are not only confident about object categories but also accurately positioned. The decoder then takes these queries and refines them through multiple stacked layers. Each layer applies self-attention among queries, deformable cross-attention to the encoder features, and a feed-forward network, and each layer has its own prediction head for bounding boxes and class labels. Since the layers work in a progressive manner, reducing the number of active layers at inference can speed up the detector with only a minor accuracy drop, and this adjustment does not require retraining.

On the COCO val2017 benchmark, RT-DETR with a ResNet-50 backbone achieves higher accuracy and faster speed than previously advanced YOLO detectors, including their L and X models. For PCB defect inspection, the production line requires stable and fast inference, and the defects to be detected vary in size and morphology. We therefore choose the most lightweight variant, RTDETR-r18, as our baseline. ResNet-18 is built from BasicBlock residual units and has the lowest computational cost among the ResNet family, while still providing multi-scale features of sufficient quality for the hybrid encoder and Transformer decoder. The network structure of this baseline is shown in Figure 1. To clearly illustrate how these multi-scale features are integrated throughout the network, the circled “c” symbol is uniformly used in this and subsequent architectural diagrams to denote the channel-wise concatenation operation.

2.2. Overview of the Proposed EAS-DETR Algorithm

In this paper, we propose EAS-DETR, an enhanced variant of the RTDETR-r18 architecture, specifically designed for high-precision PCB defect inspection. In addition to the severe visual interference from complex circuit patterns, the microscopic scale of defects presents a significant challenge for effective feature extraction and localization. To address these issues, three major improvements are incorporated into the baseline model. First, a novel C2f-EC module reconstructs the feature extraction backbone to jointly model local textures and global structural dependencies. Second, an ASAFI module is introduced to suppress background noise and actively redirect the network’s focus toward structurally sparse and minuscule defect regions. Finally, SGO-FPN, an advanced feature pyramid network, is designed to mitigate cross-scale spatial misalignment and leverage high-resolution features to preserve crucial fine-grained details for small object localization. The detailed principles and functionalities of these components are elaborated in Section 2.3, Section 2.4 and Section 2.5. The overall architecture of the enhanced model is shown in Figure 2.

2.3. C2f-EC

The original RT-DETR employs a ResNet18-based backbone constructed from stacked BasicBlock residual units to produce multi-scale feature maps. Although ResNet has been widely validated in general object detection tasks, its architecture presents notable shortcomings when directly applied to PCB defect detection. The BasicBlock structure relies on standard $3 \times 3$ convolutions, whose limited receptive field restricts feature extraction to local spatial neighborhoods. This makes it challenging for the network to capture features over a wide range on the circuit board. Moreover, the standard residual shortcut merely performs element-wise addition of the input and output without explicit channel attention or gating, which hinders the effective separation of small defect features from cluttered backgrounds. The uniform use of ReLU activation and identical block layout across all stages further limits the backbone’s ability to learn increasingly expressive representations in deeper layers, which is unfavorable for differentiating defect categories with similar appearance.

We reconstruct the backbone in two steps to address these issues. The original ResNet18 backbone is first entirely replaced with CSPDarknet [29], which is built upon the CSP network design philosophy. CSPDarknet stacks multiple Conv and C2f modules in sequence, where the C2f module splits input features into two branches along the channel dimension, processes one branch through cascaded Bottleneck units, and then concatenates all branches for fusion. Compared with the BasicBlock-based ResNet18, CSPDarknet provides smoother gradient propagation and stronger multi-scale feature extraction with lower computational redundancy. Subsequently, to further enhance the feature representation of the backbone, we replace the standard Bottleneck units inside each C2f module with the proposed ECBlock, forming the proposed C2f-EC module. The ECBlock integrates the Efficient Local–Global Context Aggregation (ELGCA) mechanism from ELGC-Net [30] and the Convolutional Gated Linear Unit (CGLU) from TransNeXt [31]. Rather than a simple concatenation of modules, this specific combination is introduced to construct a targeted extract-and-filter mechanism that directly addresses the complex background interference and extreme defect scale variations unique to PCB imagery. Through this modification, each C2f-EC module inherits the efficient split-concatenation structure of CSP while equipping every internal processing unit with joint spatial-channel context modeling and adaptive gating. The improved backbone produces multi-scale feature maps at strides of $\{4, 8, 16, 32\}$ for the subsequent hybrid encoder and decoder.

The structure of C2f-EC is shown in Figure 3. The input feature map first passes through a CBS layer and is then split into two parts along the channel dimension, denoted as $Y_{0}$ and $Y_{1}$. $Y_{0}$ is kept unchanged to preserve the original detail information, while $Y_{1}$ is fed through $n$ cascaded ECBlocks sequentially. Each ECBlock takes the output of the previous one as input and produces a refined feature map. After all ECBlocks finish processing, $Y_{0}$, $Y_{1}$, and all the ECBlock outputs are concatenated together and fused by a CBS layer to produce the module output $Z$, as formulated in Equation (1):

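$$Z = \mathrm{CBS}\left(\mathrm{Concat}\left[Y_{0},\ Y_{1},\ f_{1}(Y_{1}),\ f_{2}(f_{1}(Y_{1})),\ \dots,\ f_{n}(\cdots f_{1}(Y_{1}))\right]\right) \tag{1}$$

where $f_{i}(\cdot)$ denotes the $i$-th ECBlock, so the last term is the output of the full cascade of $n$ blocks.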
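The split-cascade-concatenate data flow of Equation (1) can be sketched in a few lines of NumPy. This is only an illustration of the tensor routing: the callables in `blocks` are hypothetical stand-ins for ECBlocks, `fuse` stands in for the final CBS layer, and the first CBS layer is omitted.

```python
import numpy as np

def c2f_ec_flow(x, blocks, fuse):
    """Route a (C, H, W) array through the split-cascade-concat pattern of Equation (1).

    `blocks` are placeholder callables standing in for ECBlocks and
    `fuse` stands in for the final CBS layer (both are assumptions).
    """
    y0, y1 = np.split(x, 2, axis=0)      # channel split into Y0 and Y1
    outputs = [y0, y1]
    for block in blocks:                  # each block consumes the previous output
        outputs.append(block(outputs[-1]))
    return fuse(np.concatenate(outputs, axis=0))

x = np.random.rand(8, 4, 4)
blocks = [lambda t: t + 1.0, lambda t: t * 2.0]   # hypothetical stand-ins
z = c2f_ec_flow(x, blocks, fuse=lambda t: t)      # identity fuse to expose the layout
```

With an 8-channel input and two blocks, the concatenated tensor has 16 channels: 4 for each of $Y_{0}$, $Y_{1}$, and the two block outputs. In a real module the final CBS layer would project this back to the desired output width.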

As the core computational unit, each ECBlock uses a dual-sublayer design with pre-normalization and residual shortcuts. The first sublayer applies Layer Normalization and the ELGCA attention mechanism, adding the result back to the input. The second sublayer normalizes this intermediate output and processes it through the CGLU feed-forward network, again followed by a residual connection. In this way, each ECBlock leverages ELGCA to capture spatial context and then applies CGLU to modulate channel-wise features, enabling joint spatial and channel refinement within a single block.

Inside the ELGCA, the input is split into two branches along the channel dimension. One branch extracts local spatial features using a $3 \times 3$ depthwise convolution and GELU activation. The other branch applies a $1 \times 1$ convolution, activates via GELU, and feeds into the Pooled Transposed (PT) Attention sub-module, where the features are divided into three components: $V$, $K$, and $Q$. The $K$ and $Q$ are compressed by Max Pooling and Average Pooling respectively to reduce the spatial size, and then together with $V$ are fed into the Transposed Attention to capture global context via the operation defined in Equation (2):

$$A_{att} = V \times \sigma\left(\bar{K}^{T} \times \bar{Q}\right) \tag{2}$$

where $\bar{Q}$ and $\bar{K}$ denote the pooled query and key, and $\sigma(\cdot)$ is the Softmax function. By transposing the multiplication order, the PT attention reduces the computational complexity from $O(N^{2})$ to $O(N)$, where $N = H \times W$.
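A shape-level sketch helps show why Equation (2) scales linearly with $N$. The code below assumes a tokens-by-channels layout, so that $\bar{K}^{T}\bar{Q}$ is a $C \times C$ matrix whose size does not depend on the number of spatial positions. The 2 × 2 pooling window, the softmax axis, the omission of any scaling factor, and all function names are assumptions made for illustration and are not taken from ELGC-Net.

```python
import numpy as np

def softmax(x, axis):
    e = np.exp(x - x.max(axis=axis, keepdims=True))
    return e / e.sum(axis=axis, keepdims=True)

def pool2x2(x, reduce):
    """Non-overlapping 2x2 pooling of a (C, H, W) array; `reduce` is np.max or np.mean."""
    c, h, w = x.shape
    return reduce(x.reshape(c, h // 2, 2, w // 2, 2), axis=(2, 4))

def pt_attention(q, k, v):
    """Sketch of Equation (2) for (C, H, W) inputs with even H and W."""
    c, h, w = v.shape
    q_bar = pool2x2(q, np.mean).reshape(c, -1).T   # (n, C) with n = HW / 4
    k_bar = pool2x2(k, np.max).reshape(c, -1).T    # (n, C)
    v_mat = v.reshape(c, -1).T                     # (N, C) with N = HW
    attn = softmax(k_bar.T @ q_bar, axis=-1)       # (C, C), independent of N
    return (v_mat @ attn).T.reshape(c, h, w)       # cost grows linearly with N

rng = np.random.default_rng(0)
q, k, v = (rng.standard_normal((4, 8, 8)) for _ in range(3))
out = pt_attention(q, k, v)
```

The two matrix products cost on the order of $nC^{2}$ and $NC^{2}$ operations, so for a fixed channel count the work grows linearly with the number of positions rather than quadratically.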

Finally, the outputs of both branches are concatenated together as the ELGCA output. Through this split-and-parallel design, the ELGCA can capture both local detail features and global context information at the same time, while keeping the computational cost low.

In the CGLU feed-forward sublayer, the normalized input is projected by a $1 \times 1$ convolution to expand the channel dimension and split into two parallel branches: a value branch and a gating branch. The gating branch is processed through a $3 \times 3$ depthwise convolution followed by GELU activation, which generates spatially-aware gating signals based on local neighborhood features. The value branch and the activated gating branch are combined through element-wise multiplication, allowing the network to adaptively control the information flow for each spatial position and each channel independently. The gated features are then projected back to the original channel dimension via a $1 \times 1$ convolution, and a residual connection from the input is added following Equation (3):

$$\mathrm{ConvolutionalGLU}(X) = X + \mathrm{Conv}_{1 \times 1}\left(X_{v} \odot \mathrm{GELU}\left(\mathrm{DWConv}_{3 \times 3}(X_{g})\right)\right) \tag{3}$$

where $X_{v}$ and $X_{g}$ represent the value branch and the gating branch, respectively, and $\odot$ denotes element-wise multiplication. Through the depthwise convolution in the gating branch, the CGLU produces position-specific gating signals conditioned on local context, enabling the network to amplify discriminative defect-related activations while attenuating irrelevant responses.
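The gating computation of Equation (3) can be written out directly in NumPy. This is a minimal sketch under stated assumptions: normalization and biases are omitted, GELU uses its tanh approximation, the order in which the expanded channels are split into value and gate halves is arbitrary, and all function and variable names are hypothetical.

```python
import numpy as np

def gelu(x):
    # tanh approximation of GELU
    return 0.5 * x * (1.0 + np.tanh(np.sqrt(2.0 / np.pi) * (x + 0.044715 * x**3)))

def conv1x1(x, weight):
    """Pointwise convolution. x: (C_in, H, W), weight: (C_out, C_in)."""
    return np.einsum("oc,chw->ohw", weight, x)

def dwconv3x3(x, kernels):
    """Depthwise 3x3 convolution with zero padding. kernels: (C, 3, 3)."""
    c, h, w = x.shape
    padded = np.pad(x, ((0, 0), (1, 1), (1, 1)))
    out = np.zeros_like(x)
    for dy in range(3):
        for dx in range(3):
            out += kernels[:, dy, dx, None, None] * padded[:, dy:dy + h, dx:dx + w]
    return out

def conv_glu(x, w_in, dw_kernels, w_out):
    """Equation (3): X + Conv1x1(X_v * GELU(DWConv3x3(X_g)))."""
    x_v, x_g = np.split(conv1x1(x, w_in), 2, axis=0)   # value and gating branches
    gated = x_v * gelu(dwconv3x3(x_g, dw_kernels))     # position- and channel-wise gate
    return x + conv1x1(gated, w_out)                   # project back and add residual

rng = np.random.default_rng(0)
c, hidden = 4, 8
x = rng.standard_normal((c, 6, 6))
w_in = rng.standard_normal((2 * hidden, c))
dw_kernels = rng.standard_normal((hidden, 3, 3))
w_out = rng.standard_normal((c, hidden))
out = conv_glu(x, w_in, dw_kernels, w_out)
```

Because the gate is computed from a 3 × 3 neighborhood, two positions with identical value features can still be scaled differently depending on their local surroundings, which is the property the text relies on for separating defects from regular traces.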

Collectively, the proposed C2f-EC module reconstructs the backbone by replacing the original Bottleneck in C2f with the ECBlock. The theoretical rationale for sequentially coupling ELGCA and CGLU within this block stems from the need to balance comprehensive feature acquisition with strict noise suppression in PCB inspection. For complex circuit boards, capturing both the macroscopic repetitive wiring patterns and the microscopic defect details is crucial. The ELGCA acts as the initial spatial extractor, enriching the feature map with joint local and global spatial context. However, because this broad extraction naturally retains dense background noise, the CGLU immediately compensates by serving as a localized gate. It selectively filters these ELGCA-enriched features along the channel dimension, adaptively suppressing the responses of normal copper traces while highlighting the actual defect signatures. This spatial-then-channel refinement pipeline is repeated across multiple cascaded ECBlocks within the C2f split-concatenation structure, which further promotes feature reuse and gradient flow across blocks. The synergy of these components enables the reconstructed backbone to produce more discriminative multi-scale features for the subsequent hybrid encoder and RT-DETR decoder.

2.4. ASAFI

RT-DETR adopts an efficient hybrid encoder consisting of two components: the AIFI module and the CCFM. Among them, the AIFI module uses standard softmax-based multi-head self-attention (MHSA) to perform feature interaction within the highest-level feature map, allowing every spatial position to attend to all others for richer contextual information. However, the dense attention mechanism considers all query-key pair relationships indiscriminately, which inevitably introduces noisy interactions from spatially irrelevant regions. In PCB defect detection, defect regions typically occupy only a small fraction of the feature map, while the majority of spatial positions correspond to normal background. As a result, the attention output in AIFI is dominated by background noise, making it harder for the model to focus on the small defect areas that truly matter.

To address this issue, we propose the ASAFI module. As shown in Figure 4, ASAFI keeps the main structure of AIFI unchanged and introduces Adaptive Sparse Self-Attention (ASSA) [32] into the attention computation stage to help the model better distinguish defect regions from background.

The right side of Figure 4 illustrates the structure of ASSA. The input first passes through layer normalization and a linear projection layer to obtain three matrices: queries Q, keys K, and values V. Then, Q and K are multiplied to produce the attention score matrix S according to Equation (4):

$$S = \frac{Q K^{T}}{\sqrt{d}} + B \tag{4}$$

where $d$ is the head dimension and $B$ is a learnable relative positional bias.

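In the original MHSA, $S$ would be passed through a softmax function directly. In ASSA, however, $S$ is sent into two parallel branches, a dense self-attention branch (DSA) and a sparse self-attention branch (SSA), as given in Equation (5):

$$\mathrm{DSA} = \mathrm{Softmax}(S), \quad \mathrm{SSA} = \mathrm{ReLU}^{2}(S) \tag{5}$$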

The DSA branch applies Softmax to keep all position interactions, ensuring that no information is lost. The SSA branch uses squared ReLU instead, which sets all negative scores to zero so that only the most relevant positions contribute to the output. ASSA then adaptively fuses the two branches through learnable weights $w_{1}$ and $w_{2}$ to produce the final attention output $A$, as shown in Equation (6):

$$A = (w_{1} \cdot \mathrm{SSA} + w_{2} \cdot \mathrm{DSA}) \cdot V \tag{6}$$

where $w_{1}$ and $w_{2}$ are normalized by softmax so that $w_{1} + w_{2} = 1$. These fusion weights are not shared across the network; rather, they are defined independently for each attention layer. Both weights start at 0.5 and are adjusted by the network during training, allowing each specific layer to dynamically find its own optimal balance between the two branches. The output $A$ is then passed through a projection layer and added back to the input through a residual connection.
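Equations (4) to (6) translate almost line by line into code. The following single-head NumPy sketch is illustrative only: the layer normalization, the Q/K/V projections, the output projection, and the residual connection are left out, and the names `assa` and `fusion_logits` are assumptions rather than identifiers from the ASSA reference implementation.

```python
import numpy as np

def softmax(x, axis=-1):
    e = np.exp(x - x.max(axis=axis, keepdims=True))
    return e / e.sum(axis=axis, keepdims=True)

def assa(q, k, v, bias, fusion_logits):
    """Single-head sketch of Equations (4)-(6).

    q, k, v: (N, d); bias: (N, N); fusion_logits: (2,) learnable scalars.
    Returns the attention output and the normalized fusion weights.
    """
    d = q.shape[-1]
    s = q @ k.T / np.sqrt(d) + bias       # Equation (4): scaled scores plus bias
    dsa = softmax(s, axis=-1)             # dense branch keeps every interaction
    ssa = np.maximum(s, 0.0) ** 2         # sparse branch: squared ReLU zeroes negatives
    w1, w2 = softmax(fusion_logits)       # normalized so that w1 + w2 = 1
    return (w1 * ssa + w2 * dsa) @ v, (w1, w2)

rng = np.random.default_rng(0)
n, d = 6, 4
q, k, v = (rng.standard_normal((n, d)) for _ in range(3))
out, (w1, w2) = assa(q, k, v, bias=np.zeros((n, n)), fusion_logits=np.zeros(2))
```

Equal initial logits give $w_{1} = w_{2} = 0.5$, matching the initialization described in the text. During training each layer can shift the two logits to favor the sparse or the dense branch.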

Overall, ASAFI brings clear benefits to PCB defect detection. The SSA branch suppresses noisy interactions from irrelevant background regions, which is particularly beneficial when defects are sparse and small. Meanwhile, the adaptive fusion mechanism prevents information loss caused by excessive sparsity, with the DSA branch serving as a safety net to preserve critical defect features.

2.5. SGO-FPN

The CCFM is responsible for fusing the multi-scale features extracted by the backbone through a top-down and bottom-up bidirectional pathway. While effective for general detection tasks, this design presents several shortcomings for small PCB defect detection. During top-down fusion, the high-level feature maps need to be enlarged to match the resolution of lower-level feature maps before they can be merged. The CCFM uses nearest neighbor interpolation for this enlargement, which merely duplicates adjacent pixel values. Since the spatial information lost during backbone down-sampling cannot be truly recovered by this simple duplication, the enlarged features are inevitably misaligned with the low-level features they are being fused with. In the bottom-up pathway, the convolutions with a stride of 2 used for down-sampling reduce the spatial resolution by half at each step, discarding part of the fine-grained details that are critical for localizing small defects on PCB surfaces. In addition, the CCFM only involves features from the P3, P4, and P5 levels, without leveraging the higher-resolution P2 features from earlier backbone stages, which contain richer spatial information for small targets.

Based on the above analysis, we redesign the CCFM and propose a new feature pyramid structure named SGO-FPN, whose architecture is illustrated in Figure 5. The SGO-FPN targets the identified shortcomings by improving the up-sampling strategy, introducing higher-resolution features into the fusion process, and enhancing the representational capacity of both the fusion and down-sampling modules.

The backbone produces four levels of feature maps $\{S_{2}, S_{3}, S_{4}, S_{5}\}$ with strides of 4, 8, 16, and 32, respectively. After $S_{5}$ is processed by the improved AIFI module to obtain $F_{5}$, the SGO-FPN begins its top-down fusion from this highest-level feature. In the original CCFM, $F_{5}$ is enlarged by nearest neighbor interpolation and directly concatenated with $S_{4}$. As discussed above, this enlargement merely copies pixel values and cannot recover the spatial information that was lost during backbone down-sampling, leading to misaligned features at the fusion point. The SGO-FPN mitigates the negative impact of this misalignment by replacing the nearest neighbor interpolation with Soft Nearest Neighbor Interpolation (SNI) [33]. Beyond merely aligning cross-scale features, SNI acts as an adaptive filter. Given the densely routed and complex background traces on PCBs, this soft weighting mechanism is crucial. It effectively suppresses background noise from regular copper wiring while enhancing the network’s sensitivity to subtle defect patterns. The core idea of SNI is to scale down the enlarged high-level features by a factor inversely proportional to the square of the enlargement ratio before fusion using Equation (7):

$$Y = \frac{1}{s^{2}} \cdot f_{up}(X) \tag{7}$$

where $f_{up}(\cdot)$ denotes the nearest neighbor interpolation and $s$ is the enlargement factor. As $s$ increases, the scaling factor $1/s^{2}$ becomes smaller, meaning the high-level features are given progressively less weight. This ensures that the low-level features with accurate spatial information dominate the fused result, while the high-level semantic features serve as a soft auxiliary signal rather than an equal-weight component. Notably, SNI introduces no additional parameters and negligible computation, as it only involves a scalar multiplication after the standard interpolation. Using SNI, $F_{5}$ is enlarged and concatenated with $S_{4}$ after a $1 \times 1$ CB module for channel alignment, then refined by a RepC3 block. The RepC3 module is explicitly incorporated at these fusion nodes to address the strict real-time constraints of industrial PCB inspection. By leveraging structural reparameterization, RepC3 enriches feature representation during training and converts to a highly streamlined, hardware-friendly topology during inference without sacrificing accuracy. Its internal structure is shown in Figure 6. This fusion step produces the intermediate feature $Y_{4}$.
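Equation (7) amounts to nearest neighbor enlargement followed by a single scalar multiplication. The NumPy sketch below illustrates this for a (C, H, W) array. The function name `sni` and the toy input are assumptions for illustration.

```python
import numpy as np

def sni(x, scale=2):
    """Soft nearest neighbor interpolation of a (C, H, W) array, as in Equation (7)."""
    up = x.repeat(scale, axis=1).repeat(scale, axis=2)   # nearest neighbor enlargement
    return up / scale**2                                  # soft weighting by 1 / s^2

f5 = np.arange(4, dtype=float).reshape(1, 2, 2)   # toy high-level feature
y = sni(f5, scale=2)
```

Each input value is copied into an $s \times s$ patch and divided by $s^{2}$, so the total activation of the enlarged map equals that of the input. The enlarged high-level feature therefore enters the concatenation at a reduced magnitude relative to the unscaled low-level feature.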

Next, $Y_{4}$ is similarly enlarged by SNI and propagated down to the $P_{3}$ level. Here, the SGO-FPN makes a significant change compared to the original CCFM. In the original design, only $S_{3}$ is fused with the top-down features at this level. However, the $P_{2}$ feature map $S_{2}$ from the backbone has a resolution twice that of $S_{3}$ and contains richer spatial details of small defects. Discarding $S_{2}$ means losing valuable information for small target detection. To incorporate $S_{2}$, its resolution needs to be reduced to match $S_{3}$. Conventional methods such as strided convolution or pooling would lose pixels in this process, which is severely detrimental when detecting microscopic PCB anomalies like tiny mouse bites or spurs. To rigorously preserve these fine-grained spatial details, the SGO-FPN instead uses Space-to-Depth Convolution (SPDConv) [34] to handle this down-sampling task. The structure of SPDConv is shown in Figure 7, and it works in two steps. First, the feature map $S_{2} \in \mathbb{R}^{C \times H \times W}$ is divided into four sub-maps by sampling even and odd rows and columns separately, and these sub-maps are stacked along the channel dimension to form an intermediate feature $\hat{S}_{2} \in \mathbb{R}^{4C \times \frac{H}{2} \times \frac{W}{2}}$. This halves the spatial resolution while quadrupling the number of channels, without discarding any pixel. Then, a non-strided $3 \times 3$ convolution compresses the expanded channels as described in Equation (8):

$$S_{2}' = W_{spd} * \hat{S}_{2} \tag{8}$$

where $W_{spd}$ denotes the learnable weight of the $3 \times 3$ convolution and $*$ represents the convolution operation.
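The space-to-depth rearrangement that precedes the convolution in Equation (8) can be sketched in plain Python as follows. This is only an illustration on nested lists: the function name and the ordering of the four sub-maps along the channel axis are assumptions, and the subsequent $3 \times 3$ convolution is omitted.

```python
def space_to_depth(x):
    """Rearrange a C x H x W tensor (nested lists) into 4C x H/2 x W/2.

    The four sub-maps sample (even row, even col), (odd row, even col),
    (even row, odd col) and (odd row, odd col); this ordering is an assumption.
    """
    c, h, w = len(x), len(x[0]), len(x[0][0])
    assert h % 2 == 0 and w % 2 == 0, "H and W must be even"
    out = []
    for row_off, col_off in ((0, 0), (1, 0), (0, 1), (1, 1)):
        for ch in range(c):
            out.append([[x[ch][i][j] for j in range(col_off, w, 2)]
                        for i in range(row_off, h, 2)])
    return out


# One channel of size 4 x 4 holding the distinct values 0..15.
s2 = [[[r * 4 + col for col in range(4)] for r in range(4)]]
s2_hat = space_to_depth(s2)

print(len(s2_hat), len(s2_hat[0]), len(s2_hat[0][0]))  # channels, height, width
print(s2_hat[0])  # even rows, even columns
```

Every input value appears exactly once in the output, which is the sense in which the rearrangement halves the resolution without discarding pixels.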

In this way, $S_{2}$ is down-sampled to the same resolution as $S_{3}$ while retaining all its original pixel information. The resulting $S_{2}'$ is then concatenated with the SNI-enlarged $Y_{4}$ and the backbone feature $S_{3}$ at the $P_{3}$ level, forming a three-source fusion. Compared to the original two-source fusion in CCFM, this design additionally introduces fine-grained spatial details from the earliest backbone stage, which is beneficial for small defect detection.

After concatenation, the fused features come from three different sources and cover a wide range of scales. A simple convolution block is not sufficient to fully exploit this rich information, particularly because PCB defects exhibit extreme morphological variations. To accommodate such diverse defect shapes, the SGO-FPN introduces the CSPOmniKernel module [35] at the $P_{3}$ level. By dynamically extracting multi-scale spatial context, this module allows the network to adaptively adjust its receptive field to match the specific scale and geometry of the target defect. The detailed internal structure of this module and its feature transformation process are illustrated in Figure 8. This module first uses a $1 \times 1$ CBS module to project the input $Z \in \mathbb{R}^{C \times H \times W}$, then splits the channels into two groups: a small portion of 25% is sent to the OmniKernel module for intensive processing, while the remaining 75% bypasses it directly. This split design keeps the computational cost low, since only a quarter of the channels go through the heavy processing path.

The OmniKernel module first projects the input through a $1 \times 1$ convolution, followed by GELU activation, then processes the result through three parallel branches. The local branch applies a $1 \times 1$ depth-wise convolution to capture fine-grained point-wise features. The large-kernel branch uses three parallel depth-wise convolutions with kernel sizes of $31 \times 1$, $31 \times 31$, and $1 \times 31$ to capture vertical, square, and horizontal contextual patterns, respectively, and their outputs are summed together. The global branch sequentially applies a Dual-domain Channel Attention Module (DCAM) and a Frequency–Spatial Attention Module (FSAM) to model long-range dependencies across both frequency and spatial domains. The outputs of all three branches are added together with the input through a residual connection, and the result is projected by a $1 \times 1$ convolution to produce the final output. Denoting the input after GELU as $Z_{in}$, the outputs of the three parallel branches are defined according to Equations (9)–(11):

$$Z_{local} = Z_{in} * w_{1 \times 1} \tag{9}$$

$$Z_{large} = Z_{in} * w_{31 \times 1} + Z_{in} * w_{31 \times 31} + Z_{in} * w_{1 \times 31} \tag{10}$$

$$Z_{global} = \mathrm{FSAM}(\mathrm{DCAM}(Z_{in})) \tag{11}$$

The final OmniKernel output is obtained as formulated in Equation (12):

$$Z_{out} = W_{out} * \sigma(Z_{in} + Z_{local} + Z_{large} + Z_{global}) \tag{12}$$

where $*$ denotes convolution, $w$ denotes the corresponding depth-wise convolution kernel weights, $\sigma$ is the ReLU activation, and $W_{out}$ is the $1 \times 1$ output convolution. The resulting $Z_{out}$ from the OmniKernel module and the identity branch are then concatenated along the channel dimension and merged by a $1 \times 1$ CBS module to restore the original channel number. A RepC3 block is applied afterwards, producing the $P_{3}$-level output $P_{3}$.
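To make the cost argument behind the 25%/75% channel split concrete, the following sketch counts the depth-wise weights of the large-kernel branch in Equation (10) for all channels versus one quarter of them. The channel count `C = 256` is a hypothetical value chosen for illustration; biases, the $1 \times 1$ projections, DCAM, and FSAM are ignored.

```python
def depthwise_weights(channels, kernel_sizes):
    """Weight count of depth-wise convolutions: one k_h x k_w filter per channel."""
    return channels * sum(kh * kw for kh, kw in kernel_sizes)


LARGE_KERNELS = [(31, 1), (31, 31), (1, 31)]  # kernel sizes stated in the text
C = 256                                       # hypothetical channel count

full = depthwise_weights(C, LARGE_KERNELS)          # all channels take the heavy path
quarter = depthwise_weights(C // 4, LARGE_KERNELS)  # only 25% of channels do

print(full, quarter, full // quarter)
```

Each channel carries 31 + 961 + 31 = 1023 depth-wise weights in this branch, so routing only a quarter of the channels through it reduces that part of the weight count by a factor of four under these assumptions.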

With $P_{3}$ generated, the SGO-FPN proceeds to the bottom-up pathway to produce $P_{4}$ and $P_{5}$. The original CCFM uses standard $3 \times 3$ convolutions with stride 2 to down-sample features at each step, which only provide one type of receptive field. The SGO-FPN replaces them with GSConvE [33], which produces more diverse features in a lightweight manner; its structure is shown in Figure 9. A standard strided convolution first generates a feature map $F_{1}$ with half the output channels. Then $F_{1}$ is further processed by a $3 \times 3$ convolution followed by a $3 \times 3$ depth-wise convolution with GELU activation, producing a second feature map $F_{2}$ with richer texture information following the operations in Equation (13):

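$$F_{1} = W_{k} * F, \quad F_{2} = \sigma_{GELU}(W_{d} * (W_{a} * F_{1})) \tag{13}$$

where $F$ is the input feature map, $W_{k}$ is the strided convolution, $W_{a}$ is the $3 \times 3$ convolution, and $W_{d}$ is the $3 \times 3$ depth-wise convolution.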

The two halves are concatenated and then channel-shuffled, where the channels from $F_{1}$ and $F_{2}$ are interleaved so that adjacent channels in the output come from different branches. This interleaving encourages information exchange between the two branches and yields a more expressive representation than a single strided convolution. Using GSConvE for down-sampling, $P_{3}$ is fused with $Y_{4}$ to produce $P_{4}$, and $P_{4}$ is fused with $F_{5}$ to produce $P_{5}$. The resulting $\{P_{3}, P_{4}, P_{5}\}$ are the final multi-scale features sent to the RT-DETR decoder for detection. Through the above redesign, the SGO-FPN improves the feature pyramid’s ability to preserve spatial details, align cross-scale features, and capture multi-scale context, which are essential for detecting small and densely distributed PCB defects.
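The channel interleaving used in GSConvE after concatenation can be illustrated with a short sketch. Channels are represented here by string labels rather than feature maps, and the function name and the two-group layout are assumptions made for illustration.

```python
def channel_shuffle(channels, groups=2):
    """Interleave channels: view as (groups, n), transpose, then flatten."""
    assert len(channels) % groups == 0, "channel count must divide evenly"
    n = len(channels) // groups
    return [channels[g * n + i] for i in range(n) for g in range(groups)]


f1 = ["F1_0", "F1_1", "F1_2"]  # channels from the strided-convolution branch
f2 = ["F2_0", "F2_1", "F2_2"]  # channels from the second branch
shuffled = channel_shuffle(f1 + f2)
print(shuffled)
```

After the shuffle, every pair of adjacent channels contains one channel from each branch, which is the property the text relies on for cross-branch information exchange.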

3. Experimental Results and Analysis

3.1. Dataset

To more authentically simulate complex and variable printed circuit board defect detection scenarios and comprehensively evaluate model performance, this study constructs a multi-source fused dataset. This dataset integrates two representative open-source datasets. The first is provided by the Intelligent Robotics Open Laboratory at Peking University, available at http://robotics.pkusz.edu.cn/resources/dataset/ (accessed on 10 March 2026), containing 693 images and covering six common defects: short, open circuit, mouse bite, spur, missing hole, and spurious copper. The second is a public dataset from the Roboflow platform, available at https://universe.roboflow.com/dataset-8tmiu/normal-rotated-scratch-solder (accessed on 10 March 2026), comprising 1848 images, originally divided into 1284 for training, 377 for validation, and 187 for testing. This second dataset introduces a scratch defect category in addition to the aforementioned types. A rigorous screening process was conducted on the combined 2541 images to ensure the high quality of the training data and prevent the model from learning ambiguous or redundant features. To eliminate potential selection bias, we established strict exclusion criteria: (1) images with severe camera defocus or motion blur were discarded; (2) samples with extreme overexposure or glare that completely obscured the underlying PCB trace details were removed; and (3) highly duplicate frames capturing the exact same defect instance from identical viewpoints were filtered out to avoid data leakage. As a result of this careful selection and integration procedure, a final set of 2045 images was extracted to form the initial dataset.

To effectively enhance the generalization ability and robustness of the model while suppressing overfitting, this study implements a series of data augmentation strategies on the training data. Specific operations include horizontal and vertical flipping, random rotation, brightness adjustment, and exposure adjustment. Following augmentation, the final dataset expands to 3132 images, which are randomly divided into training, validation, and testing sets at a strict ratio of 8 to 1 to 1.
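A random 8:1:1 partition of 3132 images can be sketched as follows. The seed, the function name, and the rounding rule (flooring the training and validation sizes and assigning the remainder to the test set) are assumptions, since the exact subset sizes are not stated in the text.

```python
import random


def split_indices(n, ratios=(0.8, 0.1, 0.1), seed=0):
    """Shuffle indices 0..n-1 and cut them into train/val/test parts."""
    idx = list(range(n))
    random.Random(seed).shuffle(idx)
    n_train = int(n * ratios[0])  # floor of the training share
    n_val = int(n * ratios[1])    # floor of the validation share
    return idx[:n_train], idx[n_train:n_train + n_val], idx[n_train + n_val:]


train_idx, val_idx, test_idx = split_indices(3132)
print(len(train_idx), len(val_idx), len(test_idx))
```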

The detailed description of this dataset is shown in Figure 10. As shown in Figure 10a, the number of instances across various defect categories is relatively balanced. This equilibrium helps the model learn different defect features more smoothly during training, avoiding prediction bias caused by class imbalance. Furthermore, the normalized width and height distribution of the defect targets is illustrated in Figure 10b, where the high-density area is tightly concentrated near the coordinate origin. In other words, the bounding box dimensions of the vast majority of defect targets are less than 0.01 times the original image size, with their area proportion falling under 1%. Such a dense clustering near the origin underscores that the dataset is replete with an abundance of minuscule defect targets, posing a severe challenge to detection algorithms regarding small target feature extraction and localization.

3.2. Experimental Environment

During the training process, the input image size is set to $640 \times 640$, the batch size is set to 8, and the total number of training epochs is set to 300. In this study, the experiments are conducted based on the RT-DETR framework. The model is trained with an initial learning rate of 0.001 and a weight decay of 0.0001. The detailed hardware and software configurations for the experimental environment are summarized in Table 1.

3.3. Evaluation Metrics

To rigorously assess the detection accuracy of the proposed model, we employ Precision (P), Recall (R), F1-Score ($F_{1}$), and mAP at IoU thresholds of 0.5 and 0.5–0.95, denoted as mAP@0.5 and mAP@0.5–0.95, respectively. The mathematical formulations for these accuracy metrics are defined in Equations (14)–(18):

$$P = \frac{TP}{TP + FP} \tag{14}$$

$$R = \frac{TP}{TP + FN} \tag{15}$$

$$F_{1} = \frac{2 \cdot P \cdot R}{P + R} \tag{16}$$

$$AP = \int_{0}^{1} P(R) \, dR \tag{17}$$

$$mAP = \frac{1}{n} \sum_{i = 1}^{n} AP_{i} \tag{18}$$

where $TP$, $FP$, and $FN$ denote the counts of true positives, false positives, and false negatives, respectively. $F_{1}$ is the harmonic mean of P and R. Average Precision ($AP$) evaluates the detection efficacy for a specific category by calculating the area under the P-R curve, and $mAP$ is the arithmetic mean of $AP$ across all $n$ classes, providing a holistic measure of the model’s multi-class detection capability.
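Equations (14)–(18) can be sketched in a few lines of Python. The counts and the P-R points below are toy values chosen for illustration, and the trapezoidal rule is only one way to approximate the integral in Equation (17); evaluation toolkits commonly use interpolated precision instead, so the numerical procedure here is an assumption.

```python
def precision(tp, fp):
    """Equation (14)."""
    return tp / (tp + fp)


def recall(tp, fn):
    """Equation (15)."""
    return tp / (tp + fn)


def f1_score(p, r):
    """Equation (16): harmonic mean of precision and recall."""
    return 2 * p * r / (p + r)


def average_precision(recalls, precisions):
    """Equation (17) approximated with the trapezoidal rule.

    recalls must be sorted in ascending order.
    """
    area = 0.0
    for k in range(1, len(recalls)):
        width = recalls[k] - recalls[k - 1]
        area += width * (precisions[k] + precisions[k - 1]) / 2
    return area


def mean_average_precision(ap_per_class):
    """Equation (18)."""
    return sum(ap_per_class) / len(ap_per_class)


p = precision(tp=8, fp=2)
r = recall(tp=8, fn=2)
f1 = f1_score(p, r)
ap = average_precision([0.0, 0.5, 1.0], [1.0, 1.0, 0.5])
m_ap = mean_average_precision([ap, 0.75])
print(p, r, round(f1, 3), ap, m_ap)
```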

Furthermore, to evaluate the model’s computational complexity and lightweight characteristics, we utilize the total number of parameters (Params) and Giga Floating-point Operations (GFLOPs). Specifically, Params reflects the spatial complexity and memory footprint of the network, which directly determines its deployment feasibility on resource-constrained hardware. Concurrently, GFLOPs quantifies the theoretical computational cost and time complexity, offering a standardized measure of the processing requirements during inference. Together, these two metrics comprehensively validate the model’s efficiency and its suitability for practical applications.

3.4. Comparative Experiments

3.4.1. Comparison with Baseline Models

To comprehensively evaluate the proposed EAS-DETR in the PCB defect detection task, comparative experiments were conducted against a broad spectrum of mainstream detection algorithms. The compared methods encompass the classical two-stage detector Faster R-CNN, single-stage YOLO-series detectors spanning multiple generations and scales including YOLOv5n/s/m, YOLOv8n/s/m, YOLOv9s/m, and YOLO11n/s/m, Transformer-based end-to-end detectors RTDETR-r18 and RTDETR-r34, as well as recently proposed improved algorithms FFCA-YOLO [36] and VRF-RTDETR [37]. All models were trained and evaluated on the same PCB defect dataset. P, R, $F_{1}$, mAP@0.5, mAP@0.5–0.95, Params, and GFLOPs were adopted as evaluation metrics. The quantitative results are reported in Table 2.

As shown in Table 2, the proposed EAS-DETR achieves the highest mAP@0.5 of 93.0% and mAP@0.5–0.95 of 44.1% among all compared models. Its R reaches 91.9%, also ranking first, which reflects the lowest miss rate and is critical for industrial quality assurance. Although the P of EAS-DETR is 91.4%, slightly lower than YOLOv8m and FFCA-YOLO, this marginal gap is well compensated by significant advantages in R and mAP, confirming that EAS-DETR strikes a superior balance between detecting more defects and maintaining localization accuracy.

Faster R-CNN, as a representative two-stage detector, achieves acceptable mAP@0.5 but falls notably behind EAS-DETR in R, with a gap of 7.0 percentage points (pp). Meanwhile, its Params and GFLOPs are approximately 2.8 and 2.3 times those of EAS-DETR, respectively. The heavy computational overhead, combined with a higher miss rate, limits its suitability for real-time industrial PCB inspection, where both efficiency and sensitivity to subtle defects are essential.

Within the YOLO family, YOLOv8m and YOLOv9m represent the strongest competitors in terms of mAP@0.5, reaching 92.5% and 92.6%, respectively. Yet, both demand considerably more Params and GFLOPs, making them less efficient under equivalent accuracy requirements. On the other end of the spectrum, lightweight variants such as YOLOv5n, YOLOv8n, and YOLO11n exhibit mAP@0.5–0.95 values consistently below 42% and R values below 80%. YOLO11n, in particular, achieves only 76.8% in R, the lowest among all compared models, pointing to a severe missed-detection problem. Simply scaling down model architectures proves insufficient for capturing the diverse and fine-grained defect morphologies encountered in PCB inspection.

RTDETR-r18 delivers a moderate baseline across most metrics but lags behind EAS-DETR in every category. RTDETR-r34, despite its substantially larger Params of 31.1M and GFLOPs of 88.8, achieves the lowest mAP@0.5–0.95 of 34.6% among all methods. This significant performance degradation under stricter IoU thresholds exposes the limited localization precision of the original RTDETR backbone, a limitation that the proposed architectural improvements effectively mitigate.

Among recently proposed improved algorithms, FFCA-YOLO stands out with the highest P of 92.6% and an extremely compact design of only 2.3M Params. However, its R and mAP@0.5 remain inferior to EAS-DETR, indicating that its aggressive compression sacrifices defect sensitivity. VRF-RTDETR achieves competitive $F_{1}$ that closely matches EAS-DETR, yet its mAP@0.5 and mAP@0.5–0.95 are 1.8 and 0.8 pp lower, respectively, leaving a clear accuracy gap in comprehensive evaluation.

Beyond the standard holdout validation reported in Table 2, we further conducted a 5-fold cross-validation to rule out the possibility of performance gains arising from a favorable random dataset partition. Given the substantial computational cost of retraining all comparative object detectors, we adopted a targeted approach, applying this cross-validation specifically to the primary baseline RTDETR-r18 and our proposed EAS-DETR. The detailed metrics across these five distinct data folds, encompassing P, R, $F_{1}$, mAP@0.5, and mAP@0.5–0.95, along with their respective mean and standard deviation (SD), are summarized in Table 3.

As detailed in Table 3, the cross-validation results demonstrate that EAS-DETR consistently achieves a higher average performance than the baseline across all five metrics. Even our lowest recorded fold surpasses the baseline’s best fold. Furthermore, our model exhibits a noticeably tighter SD across all folds, cutting the mAP@0.5 fluctuation from $\pm 0.24$ in the baseline down to $\pm 0.11$. This consistent reduction in variance across different data partitions suggests that the structural modifications in EAS-DETR stabilize both feature extraction and the overall training process. These results indicate that the high detection accuracy is stable across folds and does not depend on a particular dataset split.
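The fold protocol and the mean/SD summary reported in Table 3 can be sketched as follows. The fold assignment rule (every k-th sample) and the per-fold scores are placeholders for illustration and are not taken from Table 3; the text does not state whether sample or population SD is reported, so the sample SD used here is an assumption.

```python
import statistics


def k_fold_indices(n, k=5):
    """Yield (train, val) index lists; fold i validates on samples with index % k == i."""
    for fold in range(k):
        val = [i for i in range(n) if i % k == fold]
        train = [i for i in range(n) if i % k != fold]
        yield train, val


folds = list(k_fold_indices(10, k=5))

# Hypothetical per-fold mAP@0.5 scores, NOT the values from Table 3.
fold_map50 = [90.0, 91.0, 92.0, 93.0, 94.0]
mean_map50 = statistics.mean(fold_map50)
sd_map50 = statistics.stdev(fold_map50)  # sample SD (n - 1 denominator)
print(mean_map50, round(sd_map50, 2))
```

Each sample is used for validation exactly once across the five folds, so the resulting SD reflects sensitivity to the partition rather than to a single favorable split.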

Having established both the accuracy and the statistical reliability of our method, we provide a more intuitive comparison by visualizing representative detection results for typical defect categories in Figure 11. Each row corresponds to a specific defect type. From left to right, the columns display the Ground Truth, followed by the detection results of RTDETR-r18, RTDETR-r34, YOLOv5m, YOLO11m, and our proposed EAS-DETR.

Figure 11a,b present scratch and excess solder defects captured under dim lighting conditions, which reduce contrast and make subtle defects harder to distinguish from the background. In Figure 11a, EAS-DETR is the only model that successfully identifies a relatively short scratch in the lower-right corner, whereas all other models miss this target. In Figure 11b, YOLO11m and EAS-DETR achieve complete detection, correctly recognizing the excess solder target in the upper-right corner that the remaining models overlook. The stronger performance under low-illumination conditions reflects that EAS-DETR is more capable of capturing fine-grained defect details that are easily submerged in dark backgrounds.

Figure 11c shows short defects under strong exposure, where overexposure washes out fine defect boundaries and increases the difficulty of accurate localization. Here, EAS-DETR detects every target, while YOLOv5m misses two targets and the remaining models also fail to detect certain defects. Figure 11d depicts mouse bite defects against an extremely complex background with dense circuit traces, which proves the most challenging scenario among all groups. All compared models suffer from considerable missed detections, and YOLO11m even fails to detect any defect in the image, whereas EAS-DETR maintains complete detection. The robustness under both overexposure and complex background conditions can be attributed to the enhanced global semantic modeling and multi-scale feature fusion in EAS-DETR, which help suppress background interference while preserving defect-related features.

Figure 11e,f present missing hole and spur defects on PCBs with relatively normal backgrounds. Under these less demanding conditions, EAS-DETR and RTDETR-r34 both achieve complete detection, while the other three models still exhibit missed detections. Even in standard imaging scenarios, the compared YOLO-series models and RTDETR-r18 lack sufficient sensitivity to certain defect types, whereas EAS-DETR consistently maintains reliable detection.

Taken together, the visualization results align well with the quantitative analysis. Across the visualized defect categories and imaging conditions, including dim lighting, strong exposure, and complex backgrounds, EAS-DETR is the only model with no missed detections. More importantly, the gap between EAS-DETR and other models becomes most pronounced in the more challenging cases, such as low contrast and cluttered backgrounds, where conventional detectors tend to fail. This suggests that the improvements introduced in the backbone, attention mechanism, and feature pyramid are not merely incremental but address fundamental limitations in handling difficult detection scenarios common in real-world PCB inspection.

3.4.2. Generalization Performance Across Different Datasets

To comprehensively evaluate the generalization capability and robustness of the proposed EAS-DETR, we conduct extensive validation across multiple benchmark datasets. First, we evaluate the model on the widely cited DeepPCB dataset [38] to guarantee its reliable generalization across different industrial manufacturing environments. Beyond this domain-specific assessment, we extend our evaluation to two distinct benchmarks: AI-TOD [39] and VisDrone-DET [40]. While the primary objective of EAS-DETR is optimized for PCB defect detection, evaluating it on distinct imaging domains is crucial to verify that the architectural improvements do not overfit to a specific domain. AI-TOD is a benchmark explicitly designed for tiny object detection in remote sensing scenes, featuring an extreme scale challenge with a mean absolute object size of approximately 12.8 pixels. Complementing this scale-focused evaluation, VisDrone-DET introduces real-world complexity through diverse urban scenes captured at varying drone altitudes, presenting compounding challenges such as arbitrary viewpoints, dense crowds, and heavy occlusion.

The comparative results between the baseline RTDETR-r18 and the proposed EAS-DETR are detailed in Table 4. Across all four evaluated datasets, EAS-DETR consistently exhibits superior detection capabilities. On the primary PCB dataset, our model achieves an mAP@0.5 of 93.0% and an mAP@0.5–0.95 of 44.1%, surpassing the baseline by 4.9 pp and 4.4 pp, respectively. When extending the evaluation to the DeepPCB dataset, EAS-DETR maintains its superiority. It pushes the mAP@0.5 to 98.5% and achieves a 5.9 pp increase in the mAP@0.5–0.95 metric to reach 73.0%. This significant boost on an independent PCB dataset supports the view that our proposed modules extract generalized defect features rather than memorizing a specific training set.

The architectural advantages of EAS-DETR become particularly pronounced when addressing extremely small objects. On the AI-TOD benchmark, the model achieves an mAP@0.5 of 93.4%, a 5.8 pp improvement over RTDETR-r18. More notably, the stringent mAP@0.5–0.95 metric rises from 37.5% to 49.3%, a gain of 11.8 pp. Such a substantial margin underscores the architecture’s capacity to preserve fine-grained spatial information, enabling the network to localize tiny targets that contain very few discriminative pixels.

Furthermore, the VisDrone-DET dataset rigorously tests a model’s resilience against scale variation and severe occlusion. Under these conditions, the baseline RTDETR-r18 struggles with an mAP@0.5 of only 34.9%. EAS-DETR effectively overcomes this bottleneck, boosting the mAP@0.5 to 42.4% and driving the R up by nearly 10 pp. The enhanced feature extraction and integration mechanisms within our design naturally filter out complex background clutter and maintain high sensitivity to overlapping targets. Ultimately, the consistent performance gains across the PCB-specific DeepPCB benchmark, the highly controlled tiny-object domain of AI-TOD, and the complex real-world scenes of VisDrone-DET firmly demonstrate the strong generalizability and environmental adaptability of the proposed EAS-DETR framework.

3.5. Ablation Experiment

Compared with the original RT-DETR, our improved model introduces enhancements into the backbone and the hybrid encoder. Specifically, C2f-EC is proposed to reconstruct the backbone by replacing the ResNet18 residual units with ECBlock-embedded CSP modules, ASAFI is designed to improve the intra-scale feature interaction module, and SGO-FPN is developed to enhance the cross-scale feature fusion module. To comprehensively verify the effectiveness of these improvements, we design eight sets of ablation experiments as follows:

1. The original RT-DETR network as baseline.
2. Baseline combined with C2f-EC.
3. Baseline combined with ASAFI.
4. Baseline combined with SGO-FPN.
5. Baseline combined with C2f-EC and ASAFI.
6. Baseline combined with C2f-EC and SGO-FPN.
7. Baseline combined with ASAFI and SGO-FPN.
8. The complete EAS-DETR integrating all three improvements.
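The eight settings listed above are exactly the $2^{3}$ subsets of the three proposed modules. As a small sketch (the function and variable names are illustrative and not taken from the authors' code), they can be enumerated in the same order as the list:

```python
from itertools import combinations

MODULES = ("C2f-EC", "ASAFI", "SGO-FPN")


def ablation_configs(modules=MODULES):
    """Return every subset of the modules, ordered by subset size."""
    configs = []
    for size in range(len(modules) + 1):
        for combo in combinations(modules, size):
            configs.append(combo)
    return configs


configs = ablation_configs()
for number, combo in enumerate(configs, start=1):
    print(number, " + ".join(combo) if combo else "baseline")
```

Covering all subsets makes it possible to read off both the individual contribution of each module and the interaction between any pair of modules.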

The experimental environment remains consistent with the description provided in Section 3.2, and the detailed results are presented in Table 5.

When C2f-EC is individually introduced into the backbone, the mAP@0.5 increases from 88.1% to 90.1% and P rises from 89.6% to 91.1%. It is worth noting that the Params simultaneously drops from 19.9M to 12.8M, a reduction of 35.7%, and GFLOPs decrease from 57.0 to 42.9, down by 24.7%, which means the reconstructed backbone achieves better feature extraction capability with a more compact structure.
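The relative reductions quoted above follow directly from the reported values, and the short check below reproduces them. It also shows the distinction between a relative change (in %) and an absolute difference between two percentages (in percentage points, pp); the function name is illustrative.

```python
def relative_reduction(before, after):
    """Relative decrease from `before` to `after`, in percent."""
    return (before - after) / before * 100.0


params_drop = relative_reduction(19.9, 12.8)  # Params in millions
gflops_drop = relative_reduction(57.0, 42.9)  # GFLOPs
map50_gain_pp = 90.1 - 88.1                   # absolute gain in percentage points

print(round(params_drop, 1), round(gflops_drop, 1), round(map50_gain_pp, 1))
```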

Replacing only the intra-scale feature interaction module with ASAFI brings the mAP@0.5 to 90.6%, a 2.5 pp gain over the baseline. This gain suggests that the modified attention mechanism helps the encoder better distinguish defect regions from dominant background on the feature map, directly benefiting the localization of fine-grained PCB defects.

SGO-FPN applied alone to the cross-scale feature fusion module produces the largest single-module improvement, with mAP@0.5 reaching 91.3%, an increase of 3.2 pp, and mAP@0.5–0.95 reaching 42.7%, an increase of 3.0 pp. The considerable accuracy gain confirms that the redesigned fusion path better preserves spatial detail that is critical for detecting small-sized defects. While adding SGO-FPN alone inevitably increases the theoretical computational complexity to 64.6 GFLOPs and the parameter count to 20.2M, its standalone inference speed remains highly efficient at 90.74 FPS, proving that its structural complexity does not create severe latency bottlenecks.

Models 5–7 further evaluate the pairwise combinations of the proposed modules. C2f-EC with ASAFI achieves the highest mAP@0.5–0.95 of 43.1% among all two-module settings, as the richer backbone features allow the sparse attention to work more effectively across different IoU thresholds. A favorable accuracy-efficiency trade-off is observed in C2f-EC with SGO-FPN, which reaches 91.1% mAP@0.5 at only 13.8M Params. Notably, ASAFI with SGO-FPN yields 90.4% mAP@0.5, slightly below Models 5 and 6, which suggests that the two encoder-side improvements rely on a sufficiently strong backbone to fully realize their potential.

The complete EAS-DETR integrating all three modules achieves the best overall performance. P, R, and $F_{1}$ reach 91.4%, 91.9%, and 91.5%, improving over the baseline by 1.8 pp, 2.8 pp, and 2.2 pp, respectively. The mAP@0.5 reaches 93.0%, surpassing the baseline by 4.9 pp, and mAP@0.5–0.95 reaches 44.1%, an improvement of 4.4 pp, while the model size is reduced to 14.6M Params, a 26.6% decrease from the baseline, with a moderate computational cost of 59.0 GFLOPs and a real-time inference speed of 70.05 FPS. As shown in Figure 12, EAS-DETR is located in the upper-left region of the scatter plot, achieving the highest mAP@0.5 with a relatively small parameter count. At the system level, the computational burden introduced by SGO-FPN and ASAFI is balanced by the lightweight C2f-EC backbone. The final processing speed exceeds standard industrial real-time thresholds. These results confirm that the three proposed improvements complement each other effectively, and their combination yields the best detection performance among the ablation settings without introducing excessive computational burden or compromising practical deployment latency.

To further illustrate how each module affects the model’s attention, we visualize the heatmaps of the ablation models in Figure 13. In the baseline heatmaps (b), the model predominantly focuses on irrelevant regions, resulting in diffuse and scattered attention across the background rather than the actual defects. After introducing C2f-EC (c), which reconstructs the backbone with joint local–global context aggregation, the situation improves; the activation becomes more concentrated toward the defect locations. However, the focus remains incomplete and still exhibits misguided attention on non-defect areas. When ASAFI is further added (d) to enhance intra-scale feature interaction with adaptive sparse attention, the model successfully covers the complete defect regions, though the activation intensity remains relatively weak and less prominent. Finally, in the complete EAS-DETR (e), with SGO-FPN further improving cross-scale feature fusion through soft interpolation and higher-resolution feature integration, the heatmaps demonstrate complete and highly prominent activation. The attention is intensely focused on the defect regions with minimal response in irrelevant areas.

The observation indicates that as the proposed improvements are progressively applied, the model exhibits increasingly stronger attention toward the defect regions on the PCB surface, which effectively supports the capability of EAS-DETR in achieving high-accuracy defect detection.

4. Conclusions

This study addresses the critical challenges of tiny defect localization and severe background interference in automated PCB inspection by proposing EAS-DETR, a highly sensitive and lightweight real-time detector. Rather than relying on standard convolutional architectures or dense attention mechanisms, our approach integrates local–global context aggregation, adaptive sparse attention, and high-resolution feature preservation to overcome the limitations of existing methods. Experimental results show that EAS-DETR achieves the highest mAP@0.5 among the compared models, at 93.0%. More importantly, the proposed model substantially reduces missed detections in challenging industrial scenarios, including dim lighting, overexposure, and complex circuit clutter, achieving an R of 91.9%. Notably, this performance is attained with a compact model size of 14.6M Params, striking a favorable balance between detection accuracy and computational efficiency compared to mainstream YOLO-series models. The cross-domain validations on the DeepPCB, AI-TOD, and VisDrone-DET benchmarks further demonstrate the architecture’s versatility in broader tiny object detection applications. Future work will focus on deploying EAS-DETR on resource-constrained industrial edge devices and exploring semi-supervised learning strategies to handle the long-tailed distribution of defects in real-world manufacturing lines.

Author Contributions

Conceptualization, Y.Y. and R.W.; methodology, Y.Y.; software, Y.Y. and R.W.; validation, Y.Y.; formal analysis, Y.Y.; resources, J.R.; data curation, Y.Y. and R.W.; writing—original draft preparation, Y.Y.; writing—review and editing, Y.Y., J.R. and R.W.; visualization, Y.Y.; supervision, J.R.; project administration, J.R.; funding acquisition, J.R. All authors have read and agreed to the published version of the manuscript.

Funding

This research was funded by the Key Project of Hangzhou Natural Science Foundation, grant number 2025SZRJJ0703; the Zhejiang Province “Jianbing” R&D Tackling Plan Project, grant number 2023C01002; and the China National Textile and Apparel Council Science and Technology Guidance Plan Project, grant number 2025012.

Data Availability Statement

The data that support the findings of this study are derived from two publicly available datasets. The printed circuit board defect dataset from the Intelligent Robotics Open Laboratory at Peking University is available at http://robotics.pkusz.edu.cn/resources/dataset/ (accessed on 10 March 2026). The supplementary defect dataset is openly available on the Roboflow platform at https://universe.roboflow.com/dataset-8tmiu/normal-rotated-scratch-solder (accessed on 10 March 2026). The multi-source fused dataset generated and analyzed during the current study is available from the corresponding author upon reasonable request. Furthermore, the complete source code for the proposed EAS-DETR, including training and evaluation scripts, is publicly available on GitHub at https://github.com/Starfish0117/Starfish0117.github.io (accessed on 10 March 2026).

Conflicts of Interest

The authors declare no conflicts of interest. The funders had no role in the design of the study; the collection, analyses, or interpretation of data; the writing of the manuscript; or the decision to publish the results.

Abbreviations

The following abbreviations are used in this manuscript:

PCB	Printed Circuit Board
HDI	High-Density Interconnect
AOI	Automated Optical Inspection
CNNs	Convolutional Neural Networks
YOLO	You Only Look Once
DETR	Detection Transformer
RT-DETR	Real-Time Detection Transformer
IoU	Intersection over Union
NMS	Non-Maximum Suppression
CSP	Cross Stage Partial
CGLU	Convolutional Gated Linear Unit
ELGCA	Efficient Local–Global Context Aggregation
AIFI	Attention-based Intra-scale Feature Interaction
CCFM	Cross-scale Feature Fusion Module
MHSA	Multi-Head Self-Attention
ASAFI	Adaptive Sparse Attention-based Intra-scale Feature Interaction
ASSA	Adaptive Sparse Self-Attention
SNI	Soft Nearest Neighbor Interpolation
SPDConv	Space-to-Depth Convolution
DCAM	Dual-domain Channel Attention Module
FSAM	Frequency–Spatial Attention Module
mAP	mean Average Precision
Params	Parameters
GFLOPs	Giga Floating-point Operations
SD	Standard Deviation

References

Papamatthaiou, S.; Menelaou, P.; El Achab Oussallam, B.; Moschou, D. Recent advances in bio-microsystem integration and Lab-on-PCB technology. Microsyst. Nanoeng. 2025, 11, 78.
Trautweiler, S.; Dietrich, S. Advancing IC substrate manufacturing: Overcoming challenges and exploring opportunities with 10 μm line/space technology. In Proceedings of the 25th European Microelectronics and Packaging Conference & Exhibition (EMPC), Grenoble, France, 16–18 September 2025; pp. 1–6.
Sankar, V.U.; Lakshmi, G.; Sankar, Y.S. A review of various defects in PCB. J. Electron. Test. 2022, 38, 481–491.
Heltzel, S.; Cauwe, M.; Bennett, J.; Rohr, T. Advanced PCB technologies for space and their assessment using up-to-date standards. CEAS Space J. 2023, 15, 89–100.
Ling, Q.; Isa, N.A.M. Printed circuit board defect detection methods based on image processing, machine learning and deep learning: A survey. IEEE Access 2023, 11, 15921–15944.
Wu, F.; Zhang, X.; Kuan, Y.; He, Z. An AOI algorithm for PCB based on feature extraction. In Proceedings of the 7th World Congress on Intelligent Control and Automation, Chongqing, China, 25–27 June 2008; pp. 240–247.
Dai, W.; Mujeeb, A.; Erdt, M.; Sourin, A. Soldering defect detection in automatic optical inspection. Adv. Eng. Inform. 2019, 43, 101004.
Girshick, R.; Donahue, J.; Darrell, T.; Malik, J. Rich feature hierarchies for accurate object detection and semantic segmentation. In Proceedings of the 2014 IEEE Conference on Computer Vision and Pattern Recognition, Columbus, OH, USA, 23–28 June 2014; pp. 580–587.
Girshick, R. Fast R-CNN. In Proceedings of the 2015 IEEE International Conference on Computer Vision (ICCV), Santiago, Chile, 7–13 December 2015; pp. 1440–1448.
Ren, S.; He, K.; Girshick, R.; Sun, J. Faster R-CNN: Towards real-time object detection with region proposal networks. IEEE Trans. Pattern Anal. Mach. Intell. 2017, 39, 1137–1149.
Li, Y.T.; Guo, J.I. A VGG-16 based Faster RCNN model for PCB error inspection in industrial AOI applications. In Proceedings of the 2018 IEEE International Conference on Consumer Electronics-Taiwan (ICCE-TW), Taichung, Taiwan, 19–21 May 2018; pp. 1–2.
Hu, B.; Wang, J. Detection of PCB surface defects with improved Faster-RCNN and feature pyramid network. IEEE Access 2020, 8, 108335–108345.
Redmon, J.; Divvala, S.; Girshick, R.; Farhadi, A. You only look once: Unified, real-time object detection. In Proceedings of the 2016 IEEE Conference on Computer Vision and Pattern Recognition (CVPR), Las Vegas, NV, USA, 27–30 June 2016; pp. 779–788.
Redmon, J.; Farhadi, A. YOLO9000: Better, faster, stronger. In Proceedings of the 2017 IEEE Conference on Computer Vision and Pattern Recognition (CVPR), Honolulu, HI, USA, 21–26 July 2017; pp. 6517–6525.
Wang, C.Y.; Bochkovskiy, A.; Liao, H.Y.M. YOLOv7: Trainable bag-of-freebies sets new state-of-the-art for real-time object detectors. In Proceedings of the 2023 IEEE/CVF Conference on Computer Vision and Pattern Recognition (CVPR), Vancouver, BC, Canada, 17–24 June 2023; pp. 7464–7475.
Wang, C.Y.; Yeh, I.H.; Liao, H.Y.M. YOLOv9: Learning what you want to learn using programmable gradient information. In Computer Vision—ECCV 2024; Leonardis, A., Ricci, E., Roth, S., Russakovsky, O., Sattler, T., Varol, G., Eds.; Lecture Notes in Computer Science; Springer: Cham, Switzerland, 2025; Volume 15089.
Liu, W.; Anguelov, D.; Erhan, D.; Szegedy, C.; Reed, S.E.; Fu, C.Y.; Berg, A.C. SSD: Single shot MultiBox detector. In Computer Vision—ECCV 2016; Leibe, B., Matas, J., Sebe, N., Welling, M., Eds.; Lecture Notes in Computer Science; Springer: Cham, Switzerland, 2016; Volume 9905, pp. 21–37.
Wang, J.; Xie, X.; Liu, G.; Wu, L. A lightweight PCB defect detection algorithm based on improved YOLOv8-PCB. Symmetry 2025, 17, 309.
Liu, L.J.; Zhang, Y.; Karimi, H.R. Defect detection of printed circuit board surface based on an improved YOLOv8 with FasterNet backbone algorithms. Signal Image Video Process. 2025, 19, 89.
Huang, J.; Zhao, F.; Chen, L. Defect detection network in PCB circuit devices based on GAN enhanced YOLOv11. Appl. Comput. Eng. 2025, 133, 128–134.
Carion, N.; Massa, F.; Synnaeve, G.; Usunier, N.; Kirillov, A.; Zagoruyko, S. End-to-end object detection with transformers. In Computer Vision—ECCV 2020; Vedaldi, A., Bischof, H., Brox, T., Frahm, J.M., Eds.; Lecture Notes in Computer Science; Springer: Cham, Switzerland, 2020; Volume 12346.
Zhu, X.; Su, W.; Lu, L.; Li, B.; Wang, X.; Dai, J. Deformable DETR: Deformable transformers for end-to-end object detection. In Proceedings of the 9th International Conference on Learning Representations (ICLR), Vienna, Austria, 3–7 May 2021.
Li, F.; Zhang, H.; Liu, S.; Guo, J.; Ni, L.M.; Zhang, L. DN-DETR: Accelerate DETR training by introducing query denoising. IEEE Trans. Pattern Anal. Mach. Intell. 2024, 46, 2239–2251.
Zhang, H.; Li, F.; Liu, S.; Zhang, L.; Su, H.; Zhu, J.; Ni, L.M.; Shum, H.Y. DINO: DETR with improved denoising anchor boxes for end-to-end object detection. In Proceedings of the 11th International Conference on Learning Representations (ICLR), Kigali, Rwanda, 1–5 May 2023.
Lv, W.; Xu, S.; Zhao, Y.; Wang, G.; Wei, J.; Cui, C.; Du, Y.; Dang, Q.; Liu, Y. DETRs beat YOLOs on real-time object detection. In Proceedings of the 2024 IEEE/CVF Conference on Computer Vision and Pattern Recognition (CVPR), Seattle, WA, USA, 16–22 June 2024; pp. 16965–16974.
Peng, J.; Fan, W.; Lan, S.; Wang, D. MDD-DETR: Lightweight detection algorithm for printed circuit board minor defects. Electronics 2024, 13, 4453.
Ji, L.; Huang, C.; Li, H.; Han, W.; Yi, L. MS-DETR: A real-time multi-scale detection transformer for PCB defect detection. Signal Image Video Process. 2025, 19, 203.
Madan, M.; Reich, C. Strengthening small object detection in adapted RT-DETR through robust enhancements. Electronics 2025, 14, 3830.
Zhang, D.H.; Hao, X.Y.; Liang, L.L.; Zhang, C.; Peng, Y. A novel deep convolutional neural network algorithm for surface defect detection. J. Comput. Des. Eng. 2022, 9, 1616–1632.
Noman, M.; Fiaz, M.; Cholakkal, H.; Khan, S.; Khan, F.S. ELGC-Net: Efficient local–global context aggregation for remote sensing change detection. IEEE Trans. Geosci. Remote Sens. 2024, 62, 4701611.
Shi, D. TransNeXt: Robust foveal visual perception for vision transformers. In Proceedings of the 2024 IEEE/CVF Conference on Computer Vision and Pattern Recognition (CVPR), Seattle, WA, USA, 16–22 June 2024; pp. 17773–17783.
Zhou, S.; Chen, D.; Pan, J.; Shi, J.; Yang, J. Adapt or perish: Adaptive sparse transformer with attentive feature refinement for image restoration. In Proceedings of the 2024 IEEE/CVF Conference on Computer Vision and Pattern Recognition (CVPR), Seattle, WA, USA, 16–22 June 2024; pp. 2952–2963.
Li, H. Rethinking features-fused-pyramid-neck for object detection. In Computer Vision—ECCV 2024; Leonardis, A., Ricci, E., Roth, S., Russakovsky, O., Sattler, T., Varol, G., Eds.; Lecture Notes in Computer Science; Springer: Cham, Switzerland, 2025; Volume 15125.
Sunkara, R.; Luo, T. No more strided convolutions or pooling: A new CNN building block for low-resolution images and small objects. In Machine Learning and Knowledge Discovery in Databases; Amini, M.R., Canu, S., Fischer, A., Guns, T., Kralj Novak, P., Tsoumakas, G., Eds.; Lecture Notes in Computer Science; Springer: Cham, Switzerland, 2023; Volume 13715.
Cui, Y.; Ren, W.; Knoll, A. Omni-kernel network for image restoration. In Proceedings of the 38th AAAI Conference on Artificial Intelligence, Vancouver, BC, Canada, 20–27 February 2024; pp. 1426–1434.
Zhang, Y.; Ye, M.; Zhu, G.; Liu, Y.; Guo, P.; Yan, J. FFCA-YOLO for small object detection in remote sensing images. IEEE Trans. Geosci. Remote Sens. 2024, 62, 5611215.
Liu, W.; Shi, L.; An, G. An efficient aerial image detection with variable receptive fields. Remote Sens. 2025, 17, 2672.
He, F.; Tang, S.; Mehrkanoon, S.; Huang, X.; Yang, J. A real-time PCB defect detector based on supervised and semi-supervised learning. In Proceedings of the 28th European Symposium on Artificial Neural Networks, Computational Intelligence and Machine Learning (ESANN 2020), Bruges, Belgium, 2–4 October 2020; pp. 527–532.
Wang, J.; Yang, W.; Guo, H.; Zhang, R.; Xia, G.S. Tiny object detection in aerial images. In Proceedings of the 25th International Conference on Pattern Recognition (ICPR), Milan, Italy, 10–15 January 2021; pp. 3791–3798.
Zhu, P.; Wen, L.; Du, D.; Bian, X.; Fan, H.; Hu, Q.; Ling, H. Detection and tracking meet drones challenge. IEEE Trans. Pattern Anal. Mach. Intell. 2021, 44, 7380–7399.

Figure 1. The architectural structure of the RT-DETR Baseline model.

Figure 2. Overall architecture of EAS-DETR.

Figure 3. Schematic diagram of the proposed C2f-EC module.

Figure 4. Detailed structure of the proposed ASAFI module.

Figure 5. Architecture of the proposed SGO-FPN.

Figure 6. Internal design of the RepC3 block.

Figure 7. Illustration of the Space-to-Depth convolution operation.

Figure 8. Structural details of the CSPOmniKernel module. It is integrated at the $P_{3}$ level of the SGO-FPN to effectively extract and fuse multi-scale contextual information from the concatenated features.

Figure 9. Network architecture of the GSConvE module.

Figure 10. Statistical characteristics of the constructed PCB defect dataset: (a) The quantitative distribution of instances across the seven defect categories. (b) The scatter density plot of normalized bounding box dimensions for all annotated defects.

Figure 11. Visualization of detection results across different defect categories. (a) Scratch. (b) Excess solder. (c) Short. (d) Mouse bite. (e) Spurious copper. (f) Spur. The colored boxes represent the predicted bounding boxes, with text labels indicating the specific defect category and confidence score.

Figure 12. Trade-off between mAP@0.5 and Params in the ablation study.

Figure 13. Heatmap visualization of different model configurations. (a) Original images with colored bounding boxes marking the defect annotations; from left to right: excess solder, mouse bite, spurious copper, scratch, open circuit, and spur. (b) Baseline RT-DETR. (c) Baseline + C2f-EC. (d) Baseline + C2f-EC + ASAFI. (e) EAS-DETR (ours). Warmer colors indicate stronger model attention.

Table 1. Experimental environment configuration.

Configuration	Description
Operating System	Ubuntu 22.04
CPU	Intel(R) Xeon(R) Gold 6330 CPU @ 2.00GHz (14 vCPU)
GPU	NVIDIA GeForce RTX 3090 (24 GB)
RAM	90 GB
Python Version	3.10
Deep Learning Framework	PyTorch 2.1.0
CUDA Version	12.1

Table 2. Comparison with state-of-the-art models on the PCB dataset.

Model	P (%)	R (%)	$F_{1}$ (%)	mAP@0.5 (%)	mAP@0.5–0.95 (%)	Params (M)	GFLOPs
Faster-RCNN	91.4	84.9	87.5	89.7	42.5	41.4	134.0
YOLOv5n	88.2	79.6	83.2	84.4	41.1	2.2	5.8
YOLOv5s	90.9	86.4	88.3	90.7	42.8	7.8	18.7
YOLOv5m	91.1	87.7	89.2	91.3	43.4	22.1	52.5
YOLOv8n	86.6	79.7	82.4	83.4	40.9	2.7	6.8
YOLOv8s	87.8	86.8	87.1	89.1	42.3	9.8	23.4
YOLOv8m	92.5	88.3	90.4	92.5	43.5	23.2	67.4
YOLOv9s	92.5	89.1	89.1	90.5	41.9	6.0	22.1
YOLOv9m	92.4	89.8	91.1	92.6	43.7	17.0	60.0
YOLO11n	92.1	76.8	83.2	85.5	41.8	2.6	6.3
YOLO11s	90.3	83.8	86.5	88.7	42.7	9.4	21.3
YOLO11m	91.0	88.5	89.6	90.7	43.4	20.0	67.7
RTDETR-r18	89.6	89.1	89.3	88.1	39.7	19.9	57.0
RTDETR-r34	87.4	83.5	85.3	86.6	34.6	31.1	88.8
FFCA-YOLO	92.6	89.1	90.8	91.8	43.9	2.3	17.4
VRF-RTDETR	92.4	90.5	91.4	91.2	43.3	13.5	44.3
EAS-DETR (ours)	91.4	91.9	91.5	93.0	44.1	14.6	59.0

Bold values indicate the best performance.

Table 3. Five-fold cross-validation results of the baseline and the proposed EAS-DETR.

Model	Validation	P (%)	R (%)	$F_{1}$ (%)	mAP@0.5 (%)	mAP@0.5–0.95 (%)
RTDETR-r18	Fold 1	89.6	89.1	89.3	88.1	39.7
	Fold 2	89.1	88.4	88.7	87.5	39.2
	Fold 3	89.8	88.9	89.4	87.8	39.4
	Fold 4	89.2	88.5	88.8	87.6	39.1
	Fold 5	89.4	88.8	89.1	87.9	39.5
	Mean ± SD	89.42 ± 0.28	88.74 ± 0.29	89.06 ± 0.30	87.78 ± 0.24	39.38 ± 0.24
EAS-DETR	Fold 1	91.4	91.9	91.5	93.0	44.1
	Fold 2	91.1	91.5	91.3	92.8	43.8
	Fold 3	91.6	91.8	91.7	92.9	43.9
	Fold 4	91.2	91.4	91.2	92.7	43.7
	Fold 5	91.3	91.6	91.4	92.8	44.0
	Mean ± SD	91.32 ± 0.19	91.64 ± 0.21	91.42 ± 0.19	92.84 ± 0.11	43.90 ± 0.16

Bold values indicate the best performance.
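
The Mean ± SD rows of Table 3 can be recomputed from the per-fold values. The sketch below does this for the mAP@0.5 column only, assuming the sample standard deviation (n − 1 denominator); the table does not state which estimator was used, and the variable and function names are illustrative.

```python
from statistics import mean, stdev

# Per-fold mAP@0.5 (%) values copied from Table 3.
map50_by_model = {
    "RTDETR-r18": [88.1, 87.5, 87.8, 87.6, 87.9],
    "EAS-DETR": [93.0, 92.8, 92.9, 92.7, 92.8],
}


def summarize(values):
    """Return (mean, sample SD), each rounded to two decimals."""
    # stdev() uses the n - 1 denominator; this is an assumption about Table 3.
    return round(mean(values), 2), round(stdev(values), 2)


summary = {name: summarize(vals) for name, vals in map50_by_model.items()}

for name, (m, sd) in summary.items():
    print(f"{name}: {m:.2f} ± {sd:.2f}")
```

Under this assumption the mAP@0.5 column is reproduced for both models; the same function can be applied to the other metric columns.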

Table 4. Generalization results of EAS-DETR across different benchmark datasets.

Datasets	Model	P (%)	R (%)	$F_{1}$ (%)	mAP@0.5 (%)	mAP@0.5–0.95 (%)
PCB	RTDETR-r18	89.6	89.1	89.3	88.1	39.7
PCB	EAS-DETR	91.4	91.9	91.5	93.0	44.1
DeepPCB	RTDETR-r18	96.4	95.6	96.0	97.9	67.1
DeepPCB	EAS-DETR	97.7	96.9	97.3	98.5	73.0
AI-TOD	RTDETR-r18	90.2	87.5	88.7	87.6	37.5
AI-TOD	EAS-DETR	93.9	93.4	93.6	93.4	49.3
VisDrone-DET	RTDETR-r18	49.5	34.9	39.6	34.9	19.1
VisDrone-DET	EAS-DETR	54.3	44.7	48.0	42.4	24.0

Bold values indicate the best performance.

Table 5. Ablation experiment results of the proposed EAS-DETR.

No.	C2f-EC	ASAFI	SGO-FPN	P (%)	R (%)	$F_{1}$ (%)	mAP@0.5 (%)	mAP@0.5–0.95 (%)	Params (M)	GFLOPs	FPS
1				89.6	89.1	89.3	88.1	39.7	19.9	57.0	86.93
2	✓			91.1	89.8	90.4	90.1	41.1	12.8	42.9	107.04
3		✓		90.9	89.8	90.3	90.6	39.3	20.7	57.8	100.80
4			✓	91.2	89.7	90.4	91.3	42.7	20.2	64.6	90.74
5	✓	✓		91.3	86.3	88.7	90.8	43.1	13.7	43.8	74.89
6	✓		✓	90.9	89.3	90.1	91.1	41.5	13.8	58.1	67.73
7		✓	✓	91.0	89.3	90.1	90.4	40.3	21.0	65.5	72.80
8	✓	✓	✓	91.4	91.9	91.5	93.0	44.1	14.6	59.0	70.05

The symbol ✓ indicates the inclusion of the corresponding module. Bold values indicate the best performance.
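
As a reading aid for Table 5, the following sketch computes how the full model (No. 8) differs from the baseline (No. 1), using values copied from the table. The dictionary layout and the function name are illustrative and are not taken from the released code.

```python
# Rows No. 1 (baseline RTDETR-r18) and No. 8 (full EAS-DETR) from Table 5.
baseline = {"mAP50": 88.1, "recall": 89.1, "params_m": 19.9, "gflops": 57.0, "fps": 86.93}
full = {"mAP50": 93.0, "recall": 91.9, "params_m": 14.6, "gflops": 59.0, "fps": 70.05}


def deltas(ref, new):
    """Absolute difference (new - ref) for every metric, rounded to 2 decimals."""
    return {key: round(new[key] - ref[key], 2) for key in ref}


change = deltas(baseline, full)

# Relative parameter reduction of the full model with respect to the baseline.
param_reduction_pct = round(
    100 * (baseline["params_m"] - full["params_m"]) / baseline["params_m"], 1
)

print(change)
print(f"Parameter reduction: {param_reduction_pct}%")
```

With these table values, the full model gains mAP@0.5 and recall with fewer parameters, at the cost of slightly more GFLOPs and a lower frame rate than the baseline.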
