Research on a Multi-Type Barcode Defect Detection Model Based on Machine Vision

Faculty of Economics and Management, Xi’an University of Technology, Xi’an 710048, China
* Author to whom correspondence should be addressed.
Appl. Sci. 2025, 15(15), 8176; https://doi.org/10.3390/app15158176
Submission received: 18 June 2025 / Revised: 15 July 2025 / Accepted: 18 July 2025 / Published: 23 July 2025

Abstract

Barcodes are ubiquitous in manufacturing and logistics, but defects can reduce decoding efficiency and disrupt the supply chain. Existing studies primarily focus on a single barcode type or rely on small-scale datasets, limiting generalizability. We propose Y8-LiBAR Net, a lightweight two-stage framework for multi-type barcode defect detection. In stage 1, a YOLOv8n backbone localizes 1D and 2D barcodes in real time. In stage 2, a dual-branch network integrating ResNet50 and ViT-B/16 via hierarchical attention performs three-class classification on cropped regions of interest (ROIs): intact, defective, and non-barcode. Experiments conducted on the public BarBeR dataset, covering planar/non-planar surfaces, varying illumination, and sensor noise, show that Y8-LiBAR Net achieves a detection-stage mAP@0.5 = 0.984 (1D: 0.992; 2D: 0.977) with a peak F1 score of 0.970. Subsequent defect classification attains 0.925 accuracy, 0.925 recall, and a 0.919 F1 score. Compared with single-branch baselines, our framework improves overall accuracy by 1.8–3.4% and enhances defective-barcode recall by 8.9%. A Cohen’s kappa of 0.920 indicates strong label consistency and model robustness. These results demonstrate that Y8-LiBAR Net delivers high-precision real-time performance, providing a practical solution for industrial barcode quality inspection.

1. Introduction

As efficient and low-cost information encoding methods, barcodes are widely applied in various fields such as data collection, product traceability, and warehouse logistics [1]. For instance, Sanchez et al. [2] implemented barcode recognition technology in the medical field, significantly reducing medication error rates. Kubánková et al. [3] demonstrated that barcode technology enhances logistical efficiency and minimizes operational errors in enterprise management. With the acceleration of industrialization and the rise of smart manufacturing, barcode technology has become an indispensable information carrier in product and logistics management, rendering defect detection critically important. Barcode defects compromise reading efficiency, disrupt production processes, and may trigger supply chain issues along with safety incidents. Therefore, robust barcode defect recognition is vital for ensuring reliability and safety within industrial automation and smart manufacturing systems.
Barcodes are broadly categorized into one-dimensional (1D/linear) and two-dimensional (2D) types, exemplified by EAN-13 and QR codes. 1D barcodes encode information via variable-width bars and spacings, whereas 2D variants store more data using a matrix structure across horizontal and vertical axes, delivering higher capacity [4] (Figure 1 and Figure 2).
Advances in deep learning and machine vision, coupled with diverse application scenarios, offer powerful solutions to these challenges. For example, G. Duan et al. [5] proposed a hybrid human-in-the-loop model for musculoskeletal disease diagnosis using X-ray images, achieving significant improvements in diagnostic accuracy and efficiency. Likewise, deep learning has been applied to barcode detection, focusing primarily on identification and defect detection during printing processes [6]. In 2023, Jocher et al. [7] released the general-purpose YOLOv8 object detection model, which provides fast and efficient performance for barcode detection, among other computer vision tasks.
Significant progress has been made in barcode defect detection, with 25 publications from 2015 to 2021 introducing deep learning-based methods for 1D, 2D, or hybrid barcode detection [8]. However, persistent issues hinder definitive assessments of method effectiveness and applicability [9]: reliance on small datasets that inadequately represent real-world conditions; methodological restriction to single barcode types, which compromises generalizability; and inconsistent evaluation metrics that yield contradictory comparisons even with identical algorithms and datasets.
To address these limitations, we employ the BarBeR (Barcode Benchmark Repository) public dataset [9], released by Vezzali et al. in 2025, which encompasses multiple barcode types and standardized benchmarks. We construct a two-stage hybrid model by integrating YOLOv8n, ResNet50, and Vision Transformer (ViT-B/16) to enable rapid and accurate barcode defect detection. Our main contributions include the following:
(1) Unified Detection Process for Multi-Modal Features. This work accommodates the feature differences between 1D and 2D barcodes, as well as between flat and non-flat barcodes, to establish a unified detection pipeline from real-time barcode localization to precise defect classification. The pipeline covers various barcode formats, including EAN-13, Code-128, QR, and Data Matrix, and demonstrates high robustness (mAP@0.5 = 0.984, F1 = 0.97) under challenging real-world conditions such as complex lighting, local occlusion, and curved reflection, significantly enhancing the adaptability of industrial quality inspection to heterogeneous barcodes.
(2) Proposed Lightweight Two-Stage Deep-Fusion Model—Y8-LiBAR Net.
Stage 1 rapidly localizes barcode candidates with YOLOv8n’s anchor-free decoupled head and C2f modules, keeping single-frame latency ≤ 12 ms at 640 × 640 on an RTX 4060.
Stage 2 performs fine-grained defect discrimination through a dual-branch hybrid classifier. A ResNet50 branch extracts pixel-level micro-textures (scratches, ink smears, broken edges) while a ViT-B/16 branch models long-range dependencies to capture macro anomalies (warping, missing modules, global distortions).
A shared encoder projects heterogeneous features into a unified latent space, and an orthogonal regularizer $L_{reg}$ minimizes inter-branch covariance to ensure complementary coverage. Hierarchical cross-attention adaptively fuses local and global cues before a lightweight MLP outputs the class label.
This two-stage strategy decouples localization and classification, reduces background noise and computational burden, boosts defective barcode recall by 8.9% over the best single-branch baseline, trims parameters by ≈40%, and attains 45 FPS, providing an accurate yet hardware-friendly solution for online quality inspection and edge deployment.
(3) Training and Evaluation on Relatively Large-Scale Datasets. Utilizing the BarBeR public dataset, which contains 8748 images, 18 encoding formats, and various working conditions (flat, curved, reflective, etc.), the model is trained and evaluated end-to-end using standardized benchmarks. Experimental results show defect classification with an accuracy of 92.5%, a recall of 92.5%, and an F1 score of 91.9%, validating the method’s generalizability and providing a valuable reference for future research and industrial deployment.

2. Background Literature

Barcode recognition technology has achieved substantial maturity; however, in industrial applications, barcodes frequently exhibit defects caused by poor printing, environmental contamination, or physical damage, which compromise readability, information extraction accuracy, and product integrity, as well as appearance. Common defects—including damage, missing elements, wrinkles, creases, and occlusions—pose particular detection challenges due to their low-visibility characteristics, as illustrated in Figure 3.
Existing barcode quality assessment instruments (e.g., C42A, JY-3C, and AxiCon PC-7000) evaluate barcode quality metrics via optical scanning and signal conversion. Nevertheless, many inspections still require manual intervention [10]. These systems exhibit high sensitivity to environmental interference (e.g., humidity, temperature, calibration drift), rendering them inadequate for meeting real-time operation and stability requirements in industrial automation. Consequently, there is an urgent demand for developing lightweight, intelligent automated systems for barcode defect detection.
In prior research, X. Dong et al. [11] proposed a template-matching approach utilizing edge detection, which extracts edge points for comparison against predefined templates. While effective in specific character-matching scenarios, this technique exhibits high sensitivity to variations in defect morphology and scale, necessitating customized templates for different use cases that complicate deployment and limit generalizability. To address pose variation challenges, H. Xu et al. [12] introduced a print-quality detection framework based on multi-target matching and character feature fusion, enhancing robustness to deformed characters. Nevertheless, this approach still requires extensive labeled data and remains sensitive to print-quality fluctuations, restricting practical applicability.
Advancements in neural networks have catalyzed novel defect detection methodologies. Over the past three years, hybrid CNN–ViT architectures have achieved remarkable success in industrial visual-defect inspection. Wang et al. proposed Defect Transformer (DefT), whose convolution–self-attention encoder–decoder performs multi-scale surface-defect detection and sets new records on three public benchmarks [13]. Gao et al. introduced IH-ViT, which fuses convolutional features with ViT global relations and raises accuracy in IC-surface defect recognition by more than 6% [14]. Jeong et al. developed Hybrid-DC, where a ResNet50 + ViT attention fusion boosts strip-steel defect-classification accuracy to 97.8% [15]. In the barcode domain, Zhao et al. incorporated an attention mechanism into YOLOv8-QR, increasing mAP by 8.3% for printed QR-code defect inspection [16]. In practical barcode defect detection, Qi W. et al. [17] combined SVM with grayscale projection to identify defects and evaluate character quality, achieving exceptional performance on non-planar media (e.g., glass bottles) with 98.3% accuracy in detecting common flaws such as ink smears, misprints, and spots, validating traditional machine learning for lightweight tasks.
Recent deep learning advances have accelerated progress in barcode defect detection. Do and Kim [18] integrated multi-digit recognition into a single-stage detector, enabling real-time processing of structured objects like barcodes. Hansen et al. [19] applied YOLOv4 to 1D barcode detection, facilitating real-time localization and decoding. Kamnardsiri et al. [20] curated a benchmark dataset for 1D barcodes and comprehensively compared CNN architectures under varied complexity scenarios, providing optimization guidelines. However, these methods suffer from limited dataset diversity, restricted barcode type coverage, and inadequate generalizability.
Zharkov et al. [21] designed a universal barcode detector using semantic segmentation, efficiently handling both 1D and 2D barcodes. The model maintained high accuracy and real-time performance on the ArTe-Lab 1D dataset, enhancing methodological versatility and engineering utility. Nonetheless, existing solutions remain constrained to niche datasets and scenarios, hindering adaptation to diverse real-world demands.
Faster R-CNN has been extensively adopted for barcode detection. Jia et al. [22] enhanced their Region Proposal Network (RPN) via oriented anchors, improving robustness to distorted barcodes and localization precision. Zhang et al. [23] optimized fully connected layers in Fast-RCNN and introduced quadrilateral vertex-based bounding box regression, further increasing detection accuracy and adaptability. In lightweight research, Zhang D. et al. [24] proposed an efficient CNN for industrial surface defect inspection, reducing computation while preserving accuracy. Their subsequent lightweight deep CNN algorithm for SDD achieved superior speed and precision with minimal parameters, demonstrating strong generalization [25]. Shen et al. [26] developed L-Net, a computation-optimized CNN for edge devices, contributing theoretical insights to lightweight learning. Despite promising results, these approaches struggle with model transferability and dataset scope limitations.
In summary, while substantial progress has been made, persistent challenges remain, including small-scale datasets predominantly covering single barcode types, insufficient model generalizability across diverse scenarios, and significant technical barriers in practical defect inspection. Consequently, lightweight, efficient, and universal solutions are imperative. To address these needs, we introduce Y8-LiBAR Net, which diverges from prior dual-branch/two-stage frameworks through the following:
(1) A hierarchical attention fusion module that dynamically learns—rather than statically combines—the relevance of ResNet-derived local textures and ViT-based global contexts;
(2) An end-to-end lightweight pipeline (YOLOv8n → coordinate mapping → ROI cropping) sharing unified data flows between localization and defect recognition;
(3) C2f lightweighting and cross-stage parameter sharing, reducing total parameters by ≈40% versus contemporaries (e.g., Hybrid-DC, DefT).

3. Model Construction

To achieve efficient detection of multi-type barcode defects, this paper designs a two-stage model incorporating an attention mechanism. The overall framework consists of a lightweight YOLOv8n localization module, parallel ResNet50 convolutional branch and ViT-B/16 global branch, as well as hierarchical attention fusion and a three-class decision layer, aiming to balance real-time performance, lightweight design, and defect classification accuracy. The functionalities and interconnections of each submodule are described as follows.

3.1. Barcode Localization Submodule-YOLOv8n

In the multi-type barcode defect detection framework proposed in this paper, YOLOv8n serves as the first-stage localization network, responsible for rapid localization and category discrimination. Its output directly determines the effectiveness of subsequent feature extraction and defect classification processes.
YOLOv8n follows the Backbone–Neck–Head three-tier hierarchical structure, but integrates the latest components, such as Cross-Stage-Full (C2F) and Separable Dilated Pyramid Pooling (SPPFK-k), for lightweight design. The overall structure of YOLOv8n is shown in Figure 4 and can be subdivided into the following modules.
(1) Backbone: The backbone primarily consists of alternating stacks of CBS (Conv-BN-SiLU), C2F, and Bottleneck units. Multiple layers of C2F achieve feature reuse through cross-stage residuals and grouped convolutions, significantly reducing parameter scale. The Bottleneck unit constructs shortcut paths between 1 × 1 and 3 × 3 convolutions to mitigate gradient vanishing. The Backbone outputs feature maps P8, P16, and P32 at three scales (1/8, 1/16, and 1/32 of the original image size H × W).
(2) Neck: The neck uses an improved FPN+PAN structure, with SPPFK-5 (Separable Dilated Spatial Pyramid Pooling) added after P32. It employs convolutions with different dilation rates $k = 5, 9, 13$ to enlarge the receptive field, capturing the geometric features of both narrow 1D barcodes and near-square 2D barcodes.
(3) Head: The head implements an anchor-free decoupled detection head, separating the bounding box regression and classification branches. The output scales are $S = \{8, 16, 32\}$, and for any scale $s \in S$ the feature map dimensions are $H_s = H/s$ and $W_s = W/s$. The output tensor for a single scale is defined as:
$$O_s \in \mathbb{R}^{H_s \times W_s \times (4 + 1 + C)}$$
where “4” represents the bounding box regression values $(\hat{x}, \hat{y}, \hat{w}, \hat{h})$, “1” denotes the object confidence, and $C = 2$ corresponds to the number of barcode categories (1D and 2D). The decoupled design optimizes the regression and classification branches independently, reducing gradient coupling and improving convergence stability.
The core mechanism models barcode detection as a multi-task learning problem, with the optimization objective:
$$L = \lambda_{cls}\,\mathrm{BCE}(p, p^{*}) + \lambda_{obj}\,\mathrm{BCE}(o, o^{*}) + \lambda_{box}\,\mathrm{CIoU}(b, b^{*})$$
where $\mathrm{BCE}(\cdot)$ denotes binary cross-entropy, $(p, o, b)$ correspond to class probabilities, object confidence, and bounding box predictions, respectively, and $*$ marks the ground-truth labels. The bounding box regression uses Complete-IoU:
$$\mathrm{CIoU} = 1 - \mathrm{IoU} + \frac{\lVert c - c^{*} \rVert^{2}}{d_{max}^{2}} + \alpha v$$
where $c$ and $c^{*}$ are the predicted and ground-truth box centers, $d_{max}$ is the diagonal length of the smallest enclosing box, $v$ measures aspect-ratio consistency, and $\alpha$ is its weighting coefficient. The anchor-free scheme regresses box offsets directly from the grid center, eliminating predefined prior boxes and simplifying the unified modeling of 1D objects with extreme aspect ratios and near-square 2D objects.
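For concreteness, the CIoU term can be computed as in the following PyTorch sketch. This is our illustration, not the Ultralytics implementation; boxes are assumed to be in center-size $(c_x, c_y, w, h)$ format:

```python
import math
import torch

def ciou_loss(pred, target, eps=1e-7):
    """Complete-IoU loss for axis-aligned boxes given as (cx, cy, w, h).
    A minimal sketch of the CIoU equation above."""
    # Convert center/size format to corner coordinates.
    px1, py1 = pred[..., 0] - pred[..., 2] / 2, pred[..., 1] - pred[..., 3] / 2
    px2, py2 = pred[..., 0] + pred[..., 2] / 2, pred[..., 1] + pred[..., 3] / 2
    tx1, ty1 = target[..., 0] - target[..., 2] / 2, target[..., 1] - target[..., 3] / 2
    tx2, ty2 = target[..., 0] + target[..., 2] / 2, target[..., 1] + target[..., 3] / 2

    # Intersection and union areas.
    iw = (torch.min(px2, tx2) - torch.max(px1, tx1)).clamp(min=0)
    ih = (torch.min(py2, ty2) - torch.max(py1, ty1)).clamp(min=0)
    inter = iw * ih
    union = pred[..., 2] * pred[..., 3] + target[..., 2] * target[..., 3] - inter
    iou = inter / (union + eps)

    # Squared center distance over squared diagonal of the enclosing box.
    centre_dist = (pred[..., 0] - target[..., 0]) ** 2 + (pred[..., 1] - target[..., 1]) ** 2
    cw = torch.max(px2, tx2) - torch.min(px1, tx1)
    ch = torch.max(py2, ty2) - torch.min(py1, ty1)
    diag = cw ** 2 + ch ** 2 + eps

    # Aspect-ratio consistency term v and its weight alpha.
    v = (4 / math.pi ** 2) * (torch.atan(target[..., 2] / (target[..., 3] + eps))
                              - torch.atan(pred[..., 2] / (pred[..., 3] + eps))) ** 2
    alpha = v / (1 - iou + v + eps)

    return 1 - iou + centre_dist / diag + alpha * v
```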
YOLOv8n plays a critical role in the framework by handling localization and cropping. After optimization with Equation (2), the network infers the input image I, producing a set of bounding boxes:
$$B = \{(b_i, c_i)\}_{i=1}^{N}, \quad c_i \in \{0, 1\}, \quad b_i = (x_i, y_i, w_i, h_i)$$
The coordinate components $(x_i, y_i, w_i, h_i) \in [0, 1]$ are relative values. To obtain the barcode ROI, the relative coordinates are first mapped to the pixel domain:
$$x_1^{i} = \left(x_i - \frac{w_i}{2}\right) W, \quad x_2^{i} = \left(x_i + \frac{w_i}{2}\right) W, \quad y_1^{i} = \left(y_i - \frac{h_i}{2}\right) H, \quad y_2^{i} = \left(y_i + \frac{h_i}{2}\right) H$$
The original image $I$ is then cropped into barcode subimages $I_i = I[y_1^{i} : y_2^{i},\; x_1^{i} : x_2^{i}]$. To ensure consistent input to the subsequent ResNet50 and ViT-B/16 branches, all ROIs are resampled to 224 × 224 using bicubic interpolation and stored according to their class label $c_i$. This “object detection, coordinate transformation, cropping, scale normalization” pipeline eliminates redundant background and false-detection noise, allowing the subsequent feature extractors to focus on the barcode texture itself and providing accurate, clean priors for the hierarchical attention classifier.
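A minimal sketch of this coordinate mapping and cropping step (function and variable names are our own; the paper does not publish code):

```python
import cv2
import numpy as np

def crop_rois(image, boxes, out_size=224):
    """Map normalized YOLO boxes (cx, cy, w, h) to pixel corners, crop each
    barcode ROI, and resample it to out_size x out_size with bicubic
    interpolation, following the equations above."""
    H, W = image.shape[:2]
    rois = []
    for cx, cy, w, h in boxes:
        x1 = int(max((cx - w / 2) * W, 0))
        x2 = int(min((cx + w / 2) * W, W))
        y1 = int(max((cy - h / 2) * H, 0))
        y2 = int(min((cy + h / 2) * H, H))
        if x2 <= x1 or y2 <= y1:  # skip degenerate boxes
            continue
        roi = image[y1:y2, x1:x2]
        rois.append(cv2.resize(roi, (out_size, out_size), interpolation=cv2.INTER_CUBIC))
    return np.stack(rois) if rois else np.empty((0, out_size, out_size, 3), image.dtype)
```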

3.2. Feature Representation Submodule-ResNet50

After completing barcode instance-level object detection, ROI cropping, and size normalization, this paper uses a parallel two-branch method to detect barcode defects. In the ResNet50 branch, convolutional feature extraction is performed on each 224 × 224 barcode subimage. ResNet50 belongs to the Deep Residual Network (DRN) family; its core idea is to establish a residual connection between the convolutional mapping $\mathcal{F}(x)$ and the identity mapping $x$, alleviating the gradient degradation problem that arises with increasing depth. For any residual unit, the output can be formalized as:
$$y = \mathcal{F}(x; \Theta) + x$$
where $x, y \in \mathbb{R}^{C \times H \times W}$ are the input and output feature tensors, $\mathcal{F}(\cdot)$ represents the composition of two Conv-BN-ReLU convolutional mappings, and $\Theta$ denotes the learnable weights. When the input and output dimensions do not match, a 1 × 1 convolution is introduced to align them.
ResNet50 consists of a Stem (Conv-BN-ReLU-MaxPool) and four residual stages (Conv2_x through Conv5_x), where each stage consists of one Conv Block and several Identity Blocks, giving a total of 49 convolutional layers and one fully connected layer, as shown in Figure 5. Through convolution and max pooling, the spatial resolution of the feature maps is progressively compressed to 1/32 of the original size, while the channel dimension is expanded to 2048. A global average pooling (GAP) layer then aggregates the spatial information into a one-dimensional vector:
$$f_{res} \in \mathbb{R}^{2048}$$
To align with the feature dimension of the parallel Vision Transformer branch, the original classification layer is removed and a linear projection is added:
$$z_{res} = W_{res} f_{res} + b_{res}, \quad W_{res} \in \mathbb{R}^{1024 \times 2048}$$
This produces the unified representation vector $z_{res} \in \mathbb{R}^{1024}$. After concatenation with the Vision Transformer output $z_{vit}$, both participate in the subsequent hierarchical attention fusion.
In implementation, ResNet50 is initialized with ImageNet pre-trained weights. To balance convergence speed and fine-tuning capacity, the backbone convolutional layers can be either frozen or unfrozen depending on experimental needs, while the linear projection $W_{res}$ always participates in gradient updates.
By using the deep residual structure to encode the barcode ROI’s textures, edges, and local defects at multiple scales, this branch produces a semantic vector that retains the strengths of convolutional networks in local pattern recognition while providing high-dimensional, discriminative input features for the cross-modal attention mechanism.
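A minimal PyTorch sketch of this branch (the module name and the torchvision weight choice are our assumptions):

```python
import torch.nn as nn
from torchvision import models

class ResNetBranch(nn.Module):
    """ResNet50 feature branch: 2048-D GAP features projected to a 1024-D
    vector z_res, as in the projection equation above. Sketch only; layer
    freezing is left as a configuration choice, as in the paper."""
    def __init__(self, out_dim=1024, freeze_backbone=False):
        super().__init__()
        backbone = models.resnet50(weights=models.ResNet50_Weights.IMAGENET1K_V2)
        # Drop the original classification layer, keep everything up to GAP.
        self.features = nn.Sequential(*list(backbone.children())[:-1])
        if freeze_backbone:
            for p in self.features.parameters():
                p.requires_grad = False
        self.proj = nn.Linear(2048, out_dim)  # W_res, b_res: always trainable

    def forward(self, x):                  # x: (B, 3, 224, 224)
        f = self.features(x).flatten(1)    # f_res: (B, 2048)
        return self.proj(f)                # z_res: (B, 1024)
```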

3.3. Global Semantic Submodule-ViT-B/16

After the convolutional branch extracts local texture, ViT-B/16 is introduced as the second feature pathway to complement the long-range dependencies and global structural information in the barcode image, as shown in Figure 6. The network first divides the input ROI $I \in \mathbb{R}^{3 \times 224 \times 224}$ into 196 non-overlapping 16 × 16 patches and applies a linear embedding $E \in \mathbb{R}^{768 \times 768}$ to each flattened patch $X_p \in \mathbb{R}^{3 \times 16 \times 16}$, producing a patch-token sequence $\{e_p\}_{p=1}^{196}$. A learnable classification token $t_{cls}$ is then prepended to the sequence, and positional encodings $P$ are added to preserve the relative positional information between patches, forming the input matrix:
$$Z_0 = [t_{cls}; e_1; \ldots; e_{196}] + P, \quad Z_0 \in \mathbb{R}^{197 \times 768}$$
This sequence then passes through $L = 12$ Transformer encoder layers. In the $l$-th layer, multi-head scaled dot-product attention is first computed:
$$\mathrm{Attention}(Q, K, V) = \mathrm{Softmax}\!\left(\frac{Q K^{T}}{\sqrt{d_k}}\right) V$$
where $Q = Z_{l-1} W_Q$, $K = Z_{l-1} W_K$, $V = Z_{l-1} W_V$. The result is then updated through a residual connection and a feed-forward network to produce $Z_l$. The first row of the final output $Z_L$ (the CLS token) is denoted $f_{vit} \in \mathbb{R}^{768}$ and encodes the global context semantics of the entire barcode image. To align with the convolutional branch in feature dimension, a linear mapping is applied to $f_{vit}$, yielding a unified 1024-dimensional vector $z_{vit}$:
$$z_{vit} = W_{vit} f_{vit} + b_{vit}, \quad W_{vit} \in \mathbb{R}^{1024 \times 768}$$
By using global self-attention, ViT-B/16 models cross-patch dependencies for the barcode ROI. The contextual semantic vector generated by ViT-B/16 not only complements the convolutional branch’s shortcomings in capturing long-range relationships but also provides a globally consistent and discriminative high-dimensional feature representation for the hierarchical attention fusion module, thereby improving the overall model’s defect classification robustness and generalization ability.
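A corresponding sketch of the ViT branch, assuming the timm library is available (the paper does not specify an implementation):

```python
import torch.nn as nn
import timm

class ViTBranch(nn.Module):
    """ViT-B/16 feature branch: the 768-D CLS-token representation f_vit is
    projected to a 1024-D vector z_vit, as in the equation above."""
    def __init__(self, out_dim=1024):
        super().__init__()
        # num_classes=0 strips the classification head so the model
        # returns the pooled CLS-token features directly.
        self.vit = timm.create_model("vit_base_patch16_224",
                                     pretrained=True, num_classes=0)
        self.proj = nn.Linear(768, out_dim)  # W_vit, b_vit

    def forward(self, x):          # x: (B, 3, 224, 224)
        f_vit = self.vit(x)        # (B, 768) CLS-token features
        return self.proj(f_vit)    # z_vit: (B, 1024)
```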

3.4. Lightweight Two-Stage Fusion Model-Y8-LiBAR Net

The end-to-end barcode defect detection system developed in this study consists of four sequential stages: object detection and cropping, dual-branch feature extraction, hierarchical attention fusion, and three-class decision making. The original image is input into the lightweight detector YOLOv8n, which locates the barcode and generates bounding boxes with millisecond-level speed. The area within each bounding box is then cropped and normalized, forming a new dataset containing only barcodes, which significantly reduces interference from complex backgrounds in subsequent analysis.
To adaptively balance the importance of the two types of information, a hierarchical attention fusion module is introduced at the feature level. First, the two vectors are concatenated and input into a lightweight perceptron to obtain normalized weights $\alpha, \beta \in (0, 1)$; a fused vector is then generated through a weighted sum:
$$f = \alpha z_{res} + \beta z_{vit}, \quad \alpha + \beta = 1$$
Hierarchical Attention Fusion (HAF). To obtain interpretable and adaptive weights for the two feature streams, HAF adopts a two-level gating strategy.
(i) Channel-level gating. The ResNet feature vector $z_{res} \in \mathbb{R}^{1024}$ and the ViT feature vector $z_{vit} \in \mathbb{R}^{1024}$ are first concatenated to form $h = [z_{res}; z_{vit}] \in \mathbb{R}^{2048}$. This joint descriptor passes through a bottleneck MLP (reduction ratio $r = 4$) to capture cross-modal interactions:
$$h' = \mathrm{ReLU}(W_1 h + b_1), \quad [\alpha, \beta] = \mathrm{Softmax}(W_2 h' + b_2)$$
where $\alpha, \beta \in (0, 1)$ and $\alpha + \beta = 1$.
(ii) Task-level fusion. The normalized weights modulate the two branches to generate the fused representation:
$$f = \alpha z_{res} + \beta z_{vit}$$
This hierarchy—MLP gating followed by weighted summation—enables the network to emphasize texture-rich cues when surface damage is subtle (higher $\alpha$) and to favor global layout cues under severe distortions (higher $\beta$). The fused vector $f$ is finally projected to the three-class logits via the classification head (Equation (13)); a minimal sketch of this gating module follows.
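The sketch below implements the two-level gating described above; the module name and hidden-layer sizing are our assumptions:

```python
import torch
import torch.nn as nn

class HierarchicalAttentionFusion(nn.Module):
    """HAF gating: concatenate z_res and z_vit, pass through a bottleneck
    MLP (reduction r=4), softmax-normalize the two branch weights, and
    return the weighted sum plus the three-class logits."""
    def __init__(self, dim=1024, r=4, num_classes=3):
        super().__init__()
        hidden = (2 * dim) // r
        self.gate = nn.Sequential(
            nn.Linear(2 * dim, hidden),   # W_1, b_1
            nn.ReLU(inplace=True),
            nn.Linear(hidden, 2),         # W_2, b_2: one logit per branch
        )
        self.head = nn.Linear(dim, num_classes)  # classification head

    def forward(self, z_res, z_vit):
        h = torch.cat([z_res, z_vit], dim=-1)    # (B, 2048)
        weights = self.gate(h).softmax(dim=-1)   # (B, 2): alpha, beta
        alpha, beta = weights[:, :1], weights[:, 1:]
        f = alpha * z_res + beta * z_vit         # fused vector (B, 1024)
        return self.head(f), alpha.squeeze(-1), beta.squeeze(-1)
```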
This strategy not only avoids the dimensional expansion caused by simple concatenation but also allows the model to dynamically adjust the balance between local and global features for different scenarios. Finally, the fused vector is mapped to a three-dimensional probability space via a Softmax classification head, where the output y ^ corresponds to the three classes: “non-barcode”, “normal barcode”, and “defective barcode”:
$$\hat{y} = \mathrm{Softmax}(W f + b), \quad \hat{y} \in \mathbb{R}^{3}$$
The network is trained with a composite loss $L_{total} = L_{CE} + \lambda L_{reg}$, where $L_{CE} = -\sum_{i=1}^{N} \sum_{c=0}^{2} \mathbb{1}[y_i = c] \log p(y_i = c \mid x_i)$ and $L_{reg} = \lVert F_{Res} F_{ViT}^{T} \rVert_F^{2}$; a grid search over $\lambda \in \{0.01, 0.05, 0.1, 0.2\}$ indicates that $\lambda = 0.1$ achieves the best trade-off, and this value is therefore adopted in all experiments. This hybrid architecture fully leverages the advantages of each submodule: YOLOv8n provides millisecond-level localization and background removal; ResNet50 retains the sensitivity of convolutional networks to local textures and gaps; and ViT-B/16 compensates for convolution’s limitations in modeling long-range dependencies. Hierarchical attention, through an interpretable weight allocation mechanism, enables the model to adaptively balance local and global information across scenarios, preventing the representation redundancy and overfitting caused by simple concatenation. The specific model structure is shown in Figure 7.
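A minimal sketch of this composite objective (the batch-mean reduction of the orthogonal term is our assumption; the paper does not state a normalization):

```python
import torch.nn.functional as F

def composite_loss(logits, labels, f_res, f_vit, lam=0.1):
    """L_total = L_CE + lambda * L_reg. The orthogonal term penalizes the
    squared Frobenius norm of the inter-branch product F_res @ F_vit^T
    (f_res, f_vit: (B, D) feature matrices); averaging over the batch is
    an assumption."""
    ce = F.cross_entropy(logits, labels)
    reg = (f_res @ f_vit.t()).pow(2).sum() / f_res.shape[0]
    return ce + lam * reg
```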

4. Experiment and Results Analysis

4.1. Dataset Introduction

This study employs the BarBeR dataset released in 2025, which provides a benchmark for testing and comparing barcode-detection algorithms. To demonstrate that our work overcomes the limited-data constraint observed in earlier studies, we compare BarBeR with several representative public barcode-detection benchmarks along four dimensions—number of images, resolution range, volume of annotations, and supported symbologies. The results, summarized in Table 1, show that BarBeR markedly surpasses existing alternatives in both scale and coverage.
BarBeR is a composite collection built from twelve publicly available datasets [18,20,21,27,28,29,30,31,32,33,34], yielding 8062 one-dimensional and 1756 two-dimensional barcodes. It incorporates, among others, Arte-Lab Medium 1D, Arte-Lab Extended 1D, and Bodnár-Huawei, as illustrated in Figure 8 and Figure 9. Most barcodes were annotated with Datalogic, yet a minority (1722) could not be decoded owing to blur, noise, or improper scaling. Because the annotations were produced manually, some metadata—such as PPE—is missing. The use of polygonal rather than rectangular masks makes the labels suitable for both detection and segmentation tasks.
The dataset exhibits extensive thematic and environmental diversity, encompassing 18 barcode symbologies. Fourteen are classified as one-dimensional (Code 128, Code 39, EAN-2, EAN-8, EAN-13, GS1-128, IATA 2-of-5, Intelligent Mail Barcode, Interleaved 2-of-5, Japan Post, KIX-code, POSTNET, Royal Mail Code, and UPC), while four are two-dimensional (Aztec, Data Matrix, PDF 417, and QR Code). Variation in barcode type, capture angle, illumination, and background clutter provides a robust testbed for subsequent defect-detection research.

4.2. Object Detection Experiment

To validate the effectiveness of the barcode object detection scheme proposed in this paper, this section systematically evaluates the fine-tuned YOLOv8n on the BarBeR test set and compares it with various mainstream detectors. The evaluation metrics include Precision, Recall, and mAP@0.5. During the training phase, only the two output channels of the Head are retained to match the 1D and 2D barcode categories, while the rest of the network keeps the default settings of the official YOLOv8n. Training uses a batch size of 32 for 100 epochs with the cosine-decay learning-rate schedule, and enables Mosaic and HSV data augmentation.
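This fine-tuning setup can be reproduced with the Ultralytics API; a hedged sketch follows (the dataset YAML path is hypothetical, and the HSV gains shown are the library defaults rather than values reported in the paper):

```python
from ultralytics import YOLO

# Fine-tune YOLOv8n for the two barcode classes (1D, 2D).
model = YOLO("yolov8n.pt")
model.train(
    data="barber.yaml",   # hypothetical config listing the BarBeR splits
    imgsz=640,
    epochs=100,
    batch=32,
    cos_lr=True,          # cosine-decay learning-rate schedule
    mosaic=1.0,           # Mosaic augmentation enabled
    hsv_h=0.015, hsv_s=0.7, hsv_v=0.4,  # HSV augmentation gains
)
```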
(1) Detection Curve Analysis
To comprehensively evaluate the model’s performance at different detection stages, this paper plots four types of metric curves, including the Precision–Recall curve, F1–Confidence curve, Recall–Confidence curve, and Precision–Confidence curve. The results are shown in Figure 10. These curves form a progressive evaluation system from the perspectives of overall accuracy, optimal threshold, recall stability, and false detection control.
Starting from the overall detection effect, Figure 10a shows how Precision changes at different recall rates, i.e., the Precision–Recall curve. The curve for 1D barcodes lies almost on the upper axis, with mAP@0.5 reaching 0.992, while that for 2D barcodes is slightly lower at 0.977. The average mAP@0.5 across categories is 0.984, indicating that the model achieves excellent detection accuracy while maintaining strong recall, demonstrating good overall detection quality.
With the overall performance established, the next step is to determine the optimal confidence threshold for practical deployment. Figure 10b shows the F1 score as a function of the confidence threshold. At a confidence of 0.723, the F1 value reaches its maximum of 0.97, indicating that Precision and Recall are optimally balanced, making this an ideal default threshold for subsequent inference.
On this basis, Figure 10c shows the trend of Recall with varying confidence, used to measure the model’s detection ability. For confidence values between 0 and 0.8, Recall remains above 0.93, indicating that the model still stably detects most targets at low to medium thresholds, demonstrating strong recall robustness.
Finally, Figure 10d presents the variation of Precision with confidence. As confidence increases, Precision rises monotonically and approaches 1.00 at a confidence of 0.986. This shows that the model makes almost no false detections under high-confidence conditions, suiting scenarios with low tolerance for misclassification.
In summary, the four detection metric curves provide a systematic evaluation of the model from overall detection quality, optimal threshold selection, recall stability, and false detection control. The results show that the constructed YOLOv8n detection model not only achieves high Precision and Recall but also exhibits good threshold adaptability, providing stable and reliable candidate region support for subsequent defect recognition.
(2) Confusion Matrix Analysis
To further validate the model’s classification performance on different target categories, a confusion matrix is constructed from the test-set inference results, as shown in Figure 11. In the matrix, TP, TN, FP, and FN denote True Positives, True Negatives, False Positives, and False Negatives, respectively. From these, Precision, Recall, and F1 score for the two categories can be calculated as follows:
$$\mathrm{Recall} = \frac{TP}{TP + FN}, \quad \mathrm{Precision} = \frac{TP}{TP + FP}, \quad F_1 = \frac{2 \times \mathrm{Precision} \times \mathrm{Recall}}{\mathrm{Precision} + \mathrm{Recall}}$$
1D Barcode Classification Results: The model correctly identified 2403 1D barcodes, misclassified 3 as 2D, missed 11 as background, and misclassified 54 background objects as 1D. This yields $\mathrm{Precision}_{1D} \approx 0.978$, $\mathrm{Recall}_{1D} \approx 0.994$, and $F_{1,1D} \approx 0.986$.
2D Barcode Classification Results: The model correctly identified 465 2D barcodes, misclassified 22 as 1D, missed 19 as background, and shows no background false positives for 2D barcodes in the figure. This yields $\mathrm{Precision}_{2D} \approx 0.955$, $\mathrm{Recall}_{2D} \approx 0.919$, and $F_{1,2D} \approx 0.937$.
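These figures can be verified directly from the confusion-matrix counts; a short check for the 1D case (our illustration):

```python
def prf(tp, fp, fn):
    """Precision, Recall, and F1 from confusion-matrix counts."""
    precision = tp / (tp + fp)
    recall = tp / (tp + fn)
    f1 = 2 * precision * recall / (precision + recall)
    return precision, recall, f1

# 1D barcodes: 2403 true positives, 54 background false positives,
# 3 misclassified as 2D + 11 missed as background = 14 false negatives.
print(prf(2403, 54, 3 + 11))   # -> approximately (0.978, 0.994, 0.986)
```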
From the results, it can be seen that the model achieves a Precision and Recall greater than 0.97 for 1D barcodes, with an F1 score of 0.9861, which is nearly saturated. Although the performance for 2D barcodes is slightly lower, it remains at a high level, with an F1 score of 0.9363. As discussed earlier in the curve analysis, the slightly lower performance for 2D barcodes is mainly due to factors such as small target size and strong reflective interference, which lead to some missed detections or misclassifications. Overall, the model demonstrates stable and efficient recognition ability for both types of barcodes.
To compare the proposed detector with established baselines, we reproduced the models of Zharkov et al. [21], Faster R-CNN [35], RetinaNet [36], YOLO-Nano and YOLO-Medium [7], and RT-DETR [37] using the authors’ released code and hyperparameters, then retrained each model on the same training split of the BarBeR dataset. Evaluation was performed on the held-out test split, in which every image contains at least one barcode of the relevant type. A detection was counted as correct when it satisfied an intersection-over-union (IoU) threshold of 0.5. Instance-level Precision, Recall, and F1 score obtained under these conditions are reported in Table 2 and Table 3 for one-dimensional (1D) and two-dimensional (2D) barcodes, respectively.
For 1D barcodes, Y8-LiBAR-1D achieved an F1 score of 0.986, only 0.003 (0.3 percentage points) lower than the best-performing RT-DETR. For 2D barcodes, Y8-LiBAR-2D obtained the highest F1 score of 0.937, edging the second-ranked Faster R-CNN by 0.001 (0.1 percentage points). These results show that the proposed two-stage pipeline remains competitive with the strongest existing detectors across both barcode formats.

4.3. Defect Detection Experiment

After completing barcode object detection and obtaining precise bounding boxes, this paper manually annotates all candidate boxes. The annotation work was carried out by two industry experts with over five years of experience in barcode defect quality inspection. They first independently marked the samples and then merged the annotations based on Cohen’s κ coefficient ≥ 0.92 to ensure label consistency. The final dataset contains three categories: background (class 0), normal barcode (class 1), and defective barcode (class 2). The constructed dataset was randomly divided into training and validation sets in an 8:2 ratio, ensuring that the class distribution in the validation set remained consistent with that in the training set, thereby reducing evaluation bias.
To validate the effectiveness of the proposed Y8-LiBAR Net dual-branch fusion network, three network structures were built for comparison experiments: (1) the complete model, Y8-LiBAR Net, which uses a convolutional neural network to capture local texture features and a Transformer to obtain long-range dependency information; (2) ResNet-only, in which the ViT branch is removed, leaving only the convolutional backbone; and (3) ViT-only, which uses the pure Transformer feature extractor alone.
All three models were trained using weighted cross-entropy with label smoothing ($\varepsilon = 0.1$), where the class weights are inversely proportional to the sample frequency to further alleviate the imbalance caused by the long-tailed distribution. The remaining hyperparameters were kept consistent with those used in the object detection stage to ensure the comparability of the experiments. A sketch of this loss configuration is given below.
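The weighted, label-smoothed criterion can be set up as follows (the per-class counts are illustrative placeholders, not the actual BarBeR statistics):

```python
import torch
import torch.nn as nn

# Weighted cross-entropy with label smoothing (epsilon = 0.1). Weights are
# inversely proportional to class frequency; counts below are hypothetical.
class_counts = torch.tensor([1200.0, 5200.0, 600.0])  # background, normal, defective
class_weights = class_counts.sum() / (len(class_counts) * class_counts)
criterion = nn.CrossEntropyLoss(weight=class_weights, label_smoothing=0.1)
```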
Table 4 summarizes the ablation study used to quantify the contribution of each architectural component of Y8-LiBAR Net. The full model attains the best overall performance (Accuracy = 0.9250, F1 = 0.9187). Removing either feature-extraction branch degrades the results most sharply. Without the ResNet branch, accuracy falls by 6.2 pp and the defect-specific recall $\mathrm{Recall}_{D}$ drops by 0.243, confirming the critical role of local texture cues. Without the ViT branch, accuracy decreases by 4.3 pp and $\mathrm{Recall}_{D}$ by 0.176, highlighting the importance of global context.
Eliminating the shared encoder yields a smaller yet noticeable decline (1.8 pp accuracy), indicating that jointly optimizing the two branches promotes complementary feature learning. Finally, suppressing orthogonal regularization still lowers every metric (0.8 pp accuracy), showing that decorrelating the branch features contributes measurable gains even though its effect is less pronounced than that of either branch itself.
Overall, the ablation confirms that (i) the dual-branch design is synergistic—both local and global representations are required for top performance—and (ii) the shared encoder and orthogonal constraint provide additional but secondary improvements to robustness and defect-level recall.
Beyond the ablation study, we benchmarked the complete Y8-LiBAR Net against three widely used defect-inspection baselines—SVM with a radial-basis-function kernel, DenseNet-121, and ViT-B/16—each reproduced under identical training and testing protocols on the BarBeR defect subset. All methods were evaluated on the common validation split, and detections were verified with the same IoU threshold of 0.5 before classification metrics were calculated. Table 5 presents the resulting Precision, Recall, and F1 score.
Y8-LiBAR Net achieved an F1 score of 0.922, exceeding ViT-B/16 by 0.012 (1.2 percentage points) and the classical SVM-RBF by 0.046 (4.6 percentage points) while preserving balanced Precision and Recall. This improvement indicates that the proposed dual-branch feature-fusion strategy effectively captures both global semantic structure and local texture cues essential for reliable defect recognition.
To further explain the model’s discriminative ability for each category, a row-normalized confusion matrix for the validation set was plotted, as shown in Figure 12. From the figure, we can observe the following:
1. Non-barcode samples were all correctly rejected, demonstrating strong robustness against background interference.
2. The recall rate for defective barcodes significantly increased to 76.7%, with the remaining approximately 23% of missed samples mainly concentrated in low-contrast defect scenarios, such as light scratches, minor oil stains, and edge damage. This indicates that future work can focus on hard example mining and cross-scale attention mechanisms for further optimization.
The Y8-LiBAR Net dual-branch fusion network significantly enhanced the detection ability for key defects while maintaining an overall accuracy above 92%, proving the complementarity of convolutional local features and Transformer global dependencies. Future work will consider incorporating lightweight Transformers and knowledge distillation techniques to reduce inference latency and perform imbalanced learning and data augmentation for extreme defect samples, further improving the reliability and real-time performance of industrial barcode defect detection systems.

4.4. Failure Case Analysis

Figure 13a shows a longitudinal specular highlight generated by intense directional lighting. The resulting abrupt change in local grey-level distribution and spectral characteristics misleads the network into falsely labeling normal stripes as “scratch/break” defects. Figure 13b illustrates how the class-confidence overlay applied during object detection partially occludes a 2D barcode. The defect-discrimination module consequently misclassifies the occluded area as a “missing/damaged” anomaly.
These cases reveal two weaknesses: (i) high sensitivity to non-structural interference (specular glare, visual occlusion) and (ii) insufficient coverage of such conditions in the current training set. To address these limitations, we plan the following solutions:
—Generate large-scale hard negative samples with physics-based rendering and targeted data augmentation that add synthetic glare and partial occlusion, improving robustness to illumination variation and occlusion.
—Fully decouple the visualization layer from the inference pipeline to ensure the defect-discrimination module processes only clean regions of interest.
—Introduce illumination-invariant features via masked-auto-encoder pre-training, combined with occlusion-aware attention and uncertainty estimation for adaptive correction.
—Apply dynamic structural verification in highlight/occluded regions, exploiting barcode redundancy and completeness constraints to further enhance reliability and interpretability.

5. Conclusions

This paper proposes a lightweight two-stage defect detection framework, Y8-LiBAR Net, tailored to the online quality inspection requirements for multi-type barcodes in industrial environments. The framework is systematically validated on the publicly available large-scale BarBeR dataset. Experimental results show that, in the barcode localization stage, the YOLOv8n submodule achieves mAP@0.5 = 0.984, with 1D barcodes at 0.992 and 2D barcodes at 0.977, and a peak F1 of 0.970, demonstrating high Precision and Recall. In the defect classification stage, the dual-branch ResNet-ViT network achieves 92.5% accuracy, a 91.9% F1 score, and a defective-barcode recall of 76.7%, an 8.9% improvement over the single-branch baseline. The overall framework runs at 45 FPS.
However, failure-case analysis shows two weaknesses, including (i) high sensitivity to non-structural interference (specular glare, partial occlusion) and (ii) insufficient training coverage of such conditions. To address these limitations, we will pursue the following three targeted research lines:
Generate large-scale hard-negative samples via physics-based rendering and data augmentation that introduce synthetic glare and occlusion, improving robustness to illumination variation and blockage;
Fully decouple the visualization layer from the inference pipeline so the defect-classification module processes only clean regions of interest;
Incorporate illumination-invariant features by masked-auto-encoder pre-training, combined with occlusion-aware attention and uncertainty estimation, and explore lightweight Transformer distillation to retain real-time speed on resource-constrained devices.

Author Contributions

Conceptualization, S.Z. and G.D.; Methodology, S.Z.; Software, S.Z.; Validation, G.D., Y.H., Y.S. (Yanying Shang) and Y.S. (Yongcheng Shao); Formal analysis, S.Z. and Y.S. (Yanying Shang); Investigation, Y.S. (Yongcheng Shao); Resources, Y.S. (Yanying Shang); Data curation, S.Z. and G.D.; Writing—original draft, S.Z.; Writing—review and editing, G.D.; Visualization, Y.S. (Yongcheng Shao); Supervision, G.D.; Project administration, S.Z.; Funding acquisition, S.Z. All authors have read and agreed to the published version of the manuscript.

Funding

This work was supported by the “School–Enterprise Collaborative Innovation Fund for graduate students of Xi’an University of Technology” under project “Defect and Exception Detection and Information Verification of Product Packaging and Barcodes Based on Computer Vision” (No. 105-252062401).

Institutional Review Board Statement

Not applicable.

Informed Consent Statement

Not applicable.

Data Availability Statement

The data that support the findings of this study are openly available in the BarBeR barcode benchmark repository at https://github.com/Henvezz95/BarBeR.git (accessed on 20 June 2025).

Conflicts of Interest

The authors declare no conflict of interest.

References

  1. Weng, D.; Yang, L. Design and implementation of barcode management information system. In Proceedings of the Information Engineering and Applications: International Conference on Information Engineering and Applications, Chongqing, China, 26–28 October 2012; Springer: Berlin/Heidelberg, Germany, 2012; pp. 1200–1207. [Google Scholar]
2. Sanchez, N.C. Impacting Patient Identification Barcode Medication Scanning in Ambulatory Care. Ph.D. Thesis, Aspen University, Denver, CO, USA, 2025. [Google Scholar]
  3. Kubánová, J.; Kubasáková, I.; Culik, K.; Stitik, L. Implementation of barcode technology to logistics processes of a company. Sustainability 2022, 14, 790. [Google Scholar] [CrossRef]
  4. Taveerad, N.; Vongpradhip, S. Development of color QR code for increasing capacity. In Proceedings of the 2015 11th International Conference on Signal-Image Technology & Internet-Based Systems (SITIS), Bangkok, Thailand, 27 November 2015; pp. 645–648. [Google Scholar]
  5. Duan, G.; Zhang, S.; Shang, Y.; Kong, W. Research on X-ray Diagnosis Model of Musculoskeletal Diseases Based on Deep Learning. Appl. Sci. 2024, 14, 3451. [Google Scholar] [CrossRef]
  6. Huang, Y.; Zhao, S. Automatic recognition and inspection method of 128 bar code. Coal Technol. 2011, 30, 162–164. [Google Scholar]
  7. Jocher, G.; Chaurasia, A.; Qiu, J. Ultralytics YOLOv8 (Version 8.0.0) [Computer Software]. Ultralytics. Available online: https://github.com/ultralytics/ultralytics (accessed on 23 October 2024).
  8. Wudhikarn, R.; Charoenkwan, P.; Malang, K. Deep learning in barcode recognition: A systematic literature review. IEEE Access 2022, 10, 8049–8072. [Google Scholar] [CrossRef]
  9. Vezzali, E.; Bolelli, F.; Santi, S.; Grana, C. Barber: A barcode benchmarking repository. In Proceedings of the International Conference on Pattern Recognition, Kolkata, India, 1–5 December 2024; Springer: Cham, Switzerland, 2025; pp. 187–203. [Google Scholar]
  10. Wei, S. Application of bar code verifier in testing bar code quality. China Meas. Test 2011, 37, 38–40. [Google Scholar]
  11. Dong, X. Study on semiconductor surface defect detection based on machine vision. Metrol. Meas. Technol. 2014, 34, 4. [Google Scholar] [CrossRef]
  12. Xu, H.; Liu, X. Printing quality inspection method based on multi-object matching and fusion character features. Packag. Eng. 2019, 40, 188–193. [Google Scholar]
  13. Wang, J.; Xu, G.; Yan, F.; Wang, J.; Wang, Z. Defect transformer: An efficient hybrid transformer architecture for surface defect detection. Measurement 2023, 211, 112614. [Google Scholar] [CrossRef]
14. Wang, X.; Gao, S.; Zou, Y.; Guo, J.; Wang, C. IH-ViT: Vision Transformer-based Integrated Circuit Appearance Defect Detection. arXiv 2023, arXiv:2302.04521. [Google Scholar]
  15. Jeong, M.; Yang, M.; Jeong, J. Hybrid-DC: A Hybrid Framework Using ResNet50 and Vision Transformer for Steel Surface Defect Classification in the Rolling Process. Electronics 2024, 13, 4467. [Google Scholar] [CrossRef]
  16. Zhao, L.; Liu, J.; Ren, Y.; Lin, C.; Liu, J.; Abbas, Z.; Islam, S.; Xiao, G. YOLOv8-QR: An improved YOLOv8 model via attention mechanism for object detection of QR code defects. Comput. Electr. Eng. 2024, 118, 109376. [Google Scholar] [CrossRef]
  17. Qi, W.; Liu, J.; Li, Y.; Liu, J. Research on defect detection in printing barcode area based on machine learning. In Proceedings of the 2024 20th International Conference on Natural Computation, Fuzzy Systems and Knowledge Discovery (ICNC-FSKD), Zhongshan, China, 27–29 July 2024; IEEE: Piscataway, NJ, USA, 2024; pp. 1–7. [Google Scholar]
  18. Do, T.; Kim, D. Quick browser: A unified model to detect and read simple objects in real-time. In Proceedings of the International Joint Conference on Neural Networks (IJCNN), Virtual Conference, 18–22 July 2021; pp. 1–8. [Google Scholar]
  19. Hansen, D.K.; Nasrollahi, B.; Moeslund, T.B. Real-time barcode detection and classification using deep learning. In Proceedings of the 9th International Joint Conference Computational Intelligence, Funchal, Portugal, 1–3 November 2017; pp. 321–327. [Google Scholar]
  20. Kamnardsiri, T.; Charoenkwan, P.; Malang, C.; Wudhikarn, R. 1D barcode detection: Novel benchmark datasets and comprehensive comparison of deep convolutional neural network approaches. Sensors 2022, 22, 8788. [Google Scholar] [CrossRef] [PubMed]
  21. Zharkov, A.; Zagaynov, I. Universal barcode detector via semantic segmentation. In Proceedings of the International Conference on Document Analysis and Recognition (ICDAR), Sydney, Australia, 20–25 September 2019; pp. 837–843. [Google Scholar]
  22. Jia, J.; Zhai, G.; Zhang, J. EMBDN: An efficient multiclass barcode detection network for complicated environments. IEEE Internet Things J. 2019, 6, 9919–9933. [Google Scholar] [CrossRef]
  23. Zhang, Z.; Min, X.; Wang, J. Fine localization and distortion resistant detection of multi-class barcode in complex environments. Multimed. Tools Appl. 2020, 80, 16153–16172. [Google Scholar] [CrossRef]
  24. Zhang, D.; Hao, X.; Wang, D.; Qin, C.; Zhao, B.; Liang, L.; Liu, W. An efficient lightweight convolutional neural network for industrial surface defect detection. Artif. Intell. Rev. 2023, 56, 10651–10677. [Google Scholar] [CrossRef]
  25. Zhang, D.; Hao, X.; Liang, L.; Liu, W.; Qin, C. A novel deep convolutional neural network algorithm for surface defect detection. Comput. Des. Eng. 2022, 9, 1616–1632. [Google Scholar] [CrossRef]
  26. Shen, H.; Wang, Z.; Zhang, J.; Zhang, L. L-Net: A lightweight convolutional neural network for devices with low computing power. Inf. Sci. 2024, 660, 120131. [Google Scholar] [CrossRef]
  27. Zamberletti, A.; Gallo, I.; Carullo, M.; Binaghi, E. Neural image restoration for decoding 1-D barcodes using common camera phones. In Proceedings of the VISAPP (1), Angers, France, 17–21 May 2010; pp. 5–11. [Google Scholar]
28. Zamberletti, A.; Gallo, I.; Albertini, S. Robust angle invariant 1D barcode detection. In Proceedings of the 2013 2nd IAPR Asian Conference on Pattern Recognition (ACPR), Washington, DC, USA, 5–8 November 2013; pp. 160–164. [Google Scholar]
  29. Yun, I.; Kim, J. Vision-based 1D barcode localization method for scale and rotation invariant. In Proceedings of the TENCON-IEEE 2017 Conference, Penang, Malaysia, 5–8 November 2017; pp. 2204–2208. [Google Scholar]
  30. Wachenfeld, S.; Terlunen, S.; Jiang, X. Robust recognition of 1-D barcodes using camera phones. In Proceedings of the 19th International Conference on Pattern Recognition (ICPR), Tampa, FL, USA, 8–11 December 2008; pp. 1–4. [Google Scholar]
31. Szentandrási, I.; Herout, A.; Dubská, M. Fast detection and recognition of QR codes in high-resolution images. In Proceedings of the 28th Spring Conference on Computer Graphics (SCCG), New York, NY, USA, 2–4 May 2012; pp. 129–136. [Google Scholar]
  32. Dubská, M.; Herout, A.; Havel, J. Real-time precise detection of regular grids and matrix codes. J. Real-Time Image Process. 2016, 11, 193–200. [Google Scholar] [CrossRef]
  33. Bodnár, P.; Grósz, T.; Nyúl, L.G. Efficient visual code localization with neural networks. Pattern Anal. Appl. 2018, 21, 249–260. [Google Scholar] [CrossRef]
  34. Generate a Large Labelled Dataset of Barcodes from Open Food Facts Data. 2018. Available online: https://github.com/openfoodfacts/openfoodfacts-ai/issues/15 (accessed on 11 October 2024).
35. Ren, S.; He, K.; Girshick, R.; Sun, J. Faster R-CNN: Towards real-time object detection with region proposal networks. In Proceedings of the Advances in Neural Information Processing Systems 28, Montreal, QC, Canada, 7–12 December 2015. [Google Scholar]
  36. Lin, T.Y.; Goyal, P.; Girshick, R.; He, K.; Dollár, P. Focal loss for dense object detection. In Proceedings of the IEEE International Conference on Computer Vision, Venice, Italy, 22–29 October 2017; pp. 2980–2988. [Google Scholar]
  37. Zhao, Y.; Lv, W.; Xu, S.; Wei, J.; Wang, G.; Dang, Q. DETRs beat YOLOs on real-time object detection. In Proceedings of the 2024 IEEE/CVF Conference on Computer Vision and Pattern Recognition (CVPR), Seattle, WA, USA, 17–21 June 2024. [Google Scholar]
Figure 1. EAN-13 Barcode Structure.
Figure 2. QR Barcode Structure.
Figure 3. Illustration of a Defective Barcode.
Figure 4. YOLOv8n Model Architecture. Lightweight first-stage localization network for multi-type barcode detection. Features a CBS/C2F/Bottleneck backbone extracting multi-scale outputs (P8, P16, P32), an FPN+PAN neck with multi-dilation-rate SPPFK-5, and an anchor-free decoupled head. Outputs normalized coordinates for ROI cropping/resizing to 224 × 224 prior to defect analysis.
Figure 5. ResNet50 Model Architecture. Residual network extracting features from 224 × 224 barcode ROIs. Outputs 1024D vectors via GAP and linear projection for cross-modal fusion.
Figure 6. ViT-B/16 Model Architecture.
Figure 7. Y8-LiBAR Net Architecture: A Two-Stage Deep Network for Multi-Type Barcode Defect Detection.
Figure 8. Schematic structure of the barcode dataset used in this study.
Figure 9. Examples of barcode images from the dataset, including various 1D and 2D formats with different types of defects.
Figure 10. Evaluation curves on the test set: (a) Precision–Recall curve; (b) F1–Confidence curve; (c) Recall–Confidence curve; (d) Precision–Confidence curve.
Figure 11. Confusion matrices of the detection stage on the test set: (a) Confusion Matrix; (b) Normalized Confusion Matrix.
Figure 12. Normalized confusion matrix for barcode defect classification using Y8-LiBAR Net.
Figure 13. Schematic diagram of failure cases.
Table 1. Comparison of public barcode-detection datasets. The BarBeR dataset used in this study is listed in the first row.

Dataset Name | Images | Resolution (Min → Max) | Total Annotations | Barcode Types
------------ | ------ | ---------------------- | ----------------- | -------------
BarBeR | 8748 | 200 × 141 → 5984 × 3376 | 9818 | 1D and 2D
DEAL KAIST Lab | 3308 | 141 × 200 → 3480 × 4640 | 3454 | 1D and 2D
Dubská QR | 810 | 402 × 604 → 2560 × 1440 | 806 | 1D
InventBar | 527 | 480 × 640 | 563 | 1D and 2D
Arte-Lab Medium 1D | 430 | 1152 × 864 → 2976 × 2232 | 437 | 1D and 2D
Bodnár-Huawei QR | 98 | 1600 × 1200 | 98 | 2D
Barcode Detection Annotated Datasets | 708 | 1280 × 720 | — | 1D
Muenster BarcodeDB | 1055 | 640 × 480 → 2592 × 1944 | — | 1D
InventBar | 527 | 4032 × 3024 | 527 | 1D
ParcelBar | 844 | 1478 × 1108 | 844 | 1D
Dubská QR Datasets #1 | 410 | 1440 × 2560 → 2560 × 1440 | — | 2D
Dubská QR Datasets #2 | 400 | 402 × 604 → 604 × 402 | — | 2D
Barcode dataset | 2741 | 416 × 416 | 2741 | 1D and 2D
Table 2. Comparative Results of 1D Barcode Object Detection (Each Image Contains ≥ 1 One-Dimensional Barcode, IoU = 0.5).

Detection Method | Precision | Recall | F1 Score
---------------- | --------- | ------ | --------
Zharkov et al. [21] | 0.715 | 0.940 | 0.812
Faster R-CNN [35] | 0.979 | 0.990 | 0.984
RetinaNet [36] | 0.984 | 0.986 | 0.985
YOLO Nano [7] | 0.985 | 0.988 | 0.986
YOLO Medium [7] | 0.982 | 0.989 | 0.985
RT-DETR [37] | 0.986 | 0.993 | 0.989
Y8-LiBAR-1D | 0.978 | 0.994 | 0.986
Table 3. Comparative Results of 2D Barcode Object Detection (Each Image Contains ≥ 1 Two-Dimensional Barcode, IoU = 0.5).

Detection Method | Precision | Recall | F1 Score
---------------- | --------- | ------ | --------
Y8-LiBAR-2D | 0.955 | 0.919 | 0.937
Faster R-CNN | 0.952 | 0.921 | 0.936
YOLO Nano | 0.952 | 0.917 | 0.934
RetinaNet | 0.954 | 0.918 | 0.936
YOLO Medium | 0.948 | 0.922 | 0.935
RT-DETR | 0.946 | 0.916 | 0.931
Table 4. Ablation Experiment Results.

Model Variant | Accuracy | Precision | Recall | F1 Score | Recall_D
------------- | -------- | --------- | ------ | -------- | --------
Y8-LiBAR Net | 0.9250 | 0.9185 | 0.9250 | 0.9187 | 0.7667
w/o ViT branch | 0.8821 | 0.8673 | 0.8795 | 0.8721 | 0.5912
w/o ResNet branch | 0.8634 | 0.8419 | 0.8562 | 0.8473 | 0.5238
w/o Shared Encoder | 0.9073 | 0.8962 | 0.9028 | 0.8987 | 0.6984
w/o Orthogonal Regularization | 0.9167 | 0.9081 | 0.9142 | 0.9103 | 0.7246
Table 5. Comparative Results of Barcode Defect Detection.

Detection Method | Precision | Recall | F1 Score
---------------- | --------- | ------ | --------
Y8-LiBAR Net | 0.9185 | 0.9250 | 0.9217
SVM-RBF | 0.8720 | 0.8800 | 0.8760
DenseNet-121 | 0.8950 | 0.9050 | 0.9000
ViT-B/16 | 0.9050 | 0.9150 | 0.9100