BAG-CLIP: Bifurcated Attention Graph-Enhanced CLIP for Zero-Shot Industrial Anomaly Detection
Abstract
1. Introduction
- We introduce a Bifurcated Self-Attention (BSA) module integrated directly into the image encoder to explicitly decouple visual feature processing into two distinct pathways: a Global Semantic Branch and a Detail-Preserving Branch. The former is dedicated to extracting global contextual information, while the latter retains high-fidelity, fine-grained structural and textural information. This dual-path design effectively mitigates the intrinsic conflict between global semantic abstraction and spatial localization.
- We propose a Self-Attention Graph (SAG) module to model complex morphological anomalies, such as fine cracks and winding scratches. By treating the spatial features extracted from the Detail-Preserving Branch as graph nodes, this module dynamically infers their topological relationships and aggregates neighborhood information. This mechanism effectively suppresses background interference while significantly enhancing the model’s capacity to represent the topological structures of these complex anomalies.
- We conduct extensive experiments on five diverse industrial datasets, notably including challenging transmission line inspection datasets, and perform comprehensive comparisons with 11 state-of-the-art zero-shot and few-shot methods. Experimental results validate that the proposed method delivers superior zero-shot anomaly detection (ZSAD) performance in both image-level anomaly classification and pixel-level anomaly segmentation.
2. Related Work
2.1. Traditional Anomaly Detection
2.2. CLIP-Based Zero-Shot Anomaly Detection
3. Proposed Method
3.1. Overall Architecture
- Global Semantic Branch: This path is responsible for aggregating high-level contextual information to generate a compact feature vector for image-level classification.
- Detail-Preserving Branch: This path preserves and concatenates multi-scale, high-resolution feature maps to generate local detail features for pixel-level anomaly localization.
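The dual-path readout above can be sketched as follows. This is a minimal NumPy illustration of how two feature streams might be derived from ViT-style token sequences; the function name, the tensor shapes, and the choice of the final-stage CLS token for the global branch are illustrative assumptions, not the authors' exact BSA design:

```python
import numpy as np

def bifurcated_features(tokens_per_stage):
    """Toy dual-path readout over ViT-style tokens (illustrative only).

    tokens_per_stage: list of arrays of shape [B, 1 + N, C]
    (CLS token first, N = H*W patch tokens) from several encoder stages.

    Returns:
      global_feat: [B, C]      -- compact vector for image-level scoring
      detail_feat: [B, N, S*C] -- per-patch features concatenated over the
                                  S stages, for pixel-level localization
    """
    # Global Semantic Branch: a single high-level summary vector
    # (here, the CLS token of the last stage -- an assumption).
    global_feat = tokens_per_stage[-1][:, 0]
    # Detail-Preserving Branch: keep and concatenate the multi-scale
    # patch tokens so fine-grained spatial detail survives.
    detail_feat = np.concatenate([t[:, 1:] for t in tokens_per_stage], axis=-1)
    return global_feat, detail_feat
```

The key design point is that the two outputs are never merged prematurely: the compact vector serves classification, while the concatenated patch features retain the resolution needed for segmentation.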
3.2. Bifurcated Self-Attention (BSA)
3.2.1. Detail-Preserving Branch
3.2.2. Global Semantic Branch
3.3. Self-Attention Graph (SAG)
- Symmetrization: To ensure information can flow bidirectionally between neighbors, the adjacency matrix is converted to a symmetric form.
- Introduction of Self-Loops: To ensure each node retains its own original features during information aggregation, preventing its information from being overly diluted during multi-layer propagation, we add the identity matrix I to the symmetrized adjacency matrix A, yielding Ã = A + I.
- Symmetric Normalization: To prevent gradient vanishing or explosion during multi-layer GCN propagation, which would destabilize training, we employ symmetric normalization. First, we compute the degree matrix D, whose diagonal elements D_ii are defined as the sum of the i-th row of Ã. Based on D, we compute the final normalized adjacency matrix used for GCN propagation: Â = D^(−1/2) Ã D^(−1/2).
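The three construction steps above (symmetrization, self-loops, symmetric normalization) follow the standard GCN recipe and can be sketched as below. This is a minimal NumPy illustration, not the paper's implementation; in particular, the element-wise-max symmetrization rule is an assumption (an average of A and its transpose is an equally common choice):

```python
import numpy as np

def normalize_adjacency(A):
    """Build the normalized adjacency for GCN propagation (Sec. 3.3 steps).

    1. Symmetrize so information flows both ways between neighbors.
    2. Add self-loops (identity) so each node keeps its own features.
    3. Symmetric normalization D^(-1/2) (A + I) D^(-1/2) to keep the
       operator's spectrum bounded across multiple GCN layers.
    """
    A_sym = np.maximum(A, A.T)           # 1. symmetrize (assumed rule)
    A_tilde = A_sym + np.eye(A.shape[0]) # 2. self-loops
    deg = A_tilde.sum(axis=1)            # 3. degrees: row sums of A_tilde
    d_inv_sqrt = 1.0 / np.sqrt(deg)
    return A_tilde * d_inv_sqrt[:, None] * d_inv_sqrt[None, :]
```

Because each row and column is scaled by the inverse square root of its degree, repeated multiplication by the result neither inflates nor collapses feature magnitudes, which is the stability property the normalization is there to provide.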
3.4. Graph-Based Information Propagation
3.5. Joint Feature Fusion (JFF)
3.6. Loss Function
4. Experiment and Analysis
4.1. Experimental Setup
4.1.1. Dataset Descriptions
4.1.2. Evaluation Metrics
4.1.3. Implementation Details
4.2. Comparative Experiments
4.3. Ablation Studies
4.3.1. Module Ablation Studies
4.3.2. Key Parameter Sensitivity Analysis
4.4. Model Complexity Analysis
4.5. Failure Cases
5. Conclusions
Author Contributions
Funding
Data Availability Statement
Conflicts of Interest
Abbreviations
| Abbreviation | Meaning |
|---|---|
| VLMs | Vision-language models |
| ZSAD | Zero-shot anomaly detection |
| ViT | Vision Transformer |
| GCN | Graph convolutional network |
| MLP | Multi-layer perceptron |
| AUROC | Area under the receiver operating characteristic curve |
| F1-Max | Maximum F1 score |
| AP | Average precision |
| AUPRO | Area under the per-region overlap curve |
| BSA | Bifurcated Self-Attention |
| SAG | Self-Attention Graph |
| SAFR | Scale-Adaptive Feature Recalibration |
| JFF | Joint Feature Fusion |
References
- Zhao, Y.; Liu, Q.; Su, H.; Zhang, J.; Ma, H.; Zou, W. Attention-based multiscale feature fusion for efficient surface defect detection. IEEE Trans. Instrum. Meas. 2024, 73, 5013310.
- Jha, S.B.; Babiceanu, R.F. Deep CNN-based visual defect detection: Survey of current literature. Comput. Ind. 2023, 148, 103911.
- Zhang, Z.; Chen, S.; Huang, J.; Ma, J. Zero-shot defect detection with anomaly attribute awareness via textual domain bridge. IEEE Sens. J. 2025, 25, 11759–11771.
- Radford, A.; Kim, J.W.; Hallacy, C.; Ramesh, A.; Goh, G.; Agarwal, S.; Sastry, G.; Askell, A.; Mishkin, P.; Clark, J.; et al. Learning transferable visual models from natural language supervision. In Proceedings of the 38th International Conference on Machine Learning (ICML), Virtual Event, 18–24 July 2021; pp. 8748–8763.
- Ma, W.; Zhang, X.; Yao, Q.; Tang, F.; Wu, C.; Li, Y.; Yan, R.; Jiang, Z.; Zhou, S.K. AA-CLIP: Enhancing zero-shot anomaly detection via anomaly-aware CLIP. In Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition (CVPR), Nashville, TN, USA, 11–15 June 2025; pp. 4744–4754.
- Liu, Y.; Li, Q.; Wang, Z.; Kato, J.; Zhang, J.; Wang, W. LECLIP: Boosting zero-shot anomaly detection with local enhanced CLIP. IEEE Trans. Instrum. Meas. 2025, 74, 5034111.
- Wu, H.; Jia, D.; Zhang, T.; Bai, X.; Sun, L.; Pu, M. Multimodal zero-shot anomaly detection using dual-experts for electrical power equipment inspection images. J. Image Graph. 2025, 30, 672–682.
- Vieira e Silva, A.L.B.; de Castro Felix, H.; Simões, F.P.M.; Teichrieb, V.; dos Santos, M.; Santiago, H.; Sgotti, V.; Lott Neto, H. InsPLAD: A dataset and benchmark for power line asset inspection in UAV images. Int. J. Remote Sens. 2023, 44, 7294–7320.
- Jezek, S.; Jonak, M.; Burget, R.; Dvorak, P.; Skotak, M. Deep learning-based defect detection of metal parts: Evaluating current methods in complex conditions. In Proceedings of the 13th International Congress on Ultra Modern Telecommunications and Control Systems and Workshops (ICUMT), Brno, Czech Republic, 25–27 October 2021; pp. 66–71.
- Mishra, P.; Verk, R.; Fornasier, D.; Piciarelli, C.; Foresti, G.L. VT-ADL: A vision transformer network for image anomaly detection and localization. In Proceedings of the IEEE 30th International Symposium on Industrial Electronics (ISIE), Kyoto, Japan, 20–23 June 2021; pp. 1–6.
- Bao, T.; Chen, J.; Li, W.; Wang, X.; Fei, J.; Wu, L.; Zhao, R.; Zheng, Y. MIAD: A maintenance inspection dataset for unsupervised anomaly detection. In Proceedings of the IEEE/CVF International Conference on Computer Vision (ICCV) Workshops, Paris, France, 2–6 October 2023; pp. 993–1002. Available online: https://ieeexplore.ieee.org/document/10350876 (accessed on 16 March 2026).
- Jeong, J.; Zou, Y.; Kim, T.; Zhang, D.; Ravichandran, A.; Dabeer, O. WinCLIP: Zero-/few-shot anomaly classification and segmentation. In Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition (CVPR), Vancouver, BC, Canada, 17–24 June 2023; pp. 19606–19616.
- Zhou, Q.; Pang, G.; Tian, Y.; He, S.; Chen, J. AnomalyCLIP: Object-agnostic prompt learning for zero-shot anomaly detection. In Proceedings of the 12th International Conference on Learning Representations (ICLR), Vienna, Austria, 7–11 May 2024; pp. 49705–49737.
- Qu, Z.; Tao, X.; Gong, X.; Qu, S.; Chen, Q.; Zhang, Z. Bayesian prompt flow learning for zero-shot anomaly detection. In Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition (CVPR), Nashville, TN, USA, 11–15 June 2025; pp. 30398–30408.
- Luo, W.; Cao, Y.; Yao, H.; Zhang, X.; Lou, J.; Cheng, Y. Exploring intrinsic normal prototypes within a single image for universal anomaly detection. In Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition (CVPR), Nashville, TN, USA, 11–15 June 2025; pp. 9974–9983.
- Duan, M.; Mao, L.; Liu, R.; Liu, W.; Liu, Z. Unified model based on reinforced feature reconstruction for metro track anomaly detection. IEEE Sens. J. 2024, 24, 5025–5038.
- Xiang, P.; Ali, S.; Jung, S.K.; Zhou, H. Hyperspectral anomaly detection with guided autoencoder. IEEE Trans. Geosci. Remote Sens. 2022, 60, 5538818.
- Xia, X.; Pan, X.; Li, N.; He, X.; Ma, L.; Zhang, X.; Ding, N. GAN-based anomaly detection: A review. Neurocomputing 2022, 493, 497–535.
- Pang, G.; Shen, C.; Cao, L.; van den Hengel, A. Deep learning for anomaly detection: A review. ACM Comput. Surv. 2021, 54, 1–38.
- Zhou, Y.; Liang, X.; Zhang, W.; Zhang, L.; Song, X. VAE-based deep SVDD for anomaly detection. Neurocomputing 2021, 453, 131–140.
- Zhang, Z.; Deng, X. Anomaly detection using improved deep SVDD model with data structure preservation. Pattern Recognit. Lett. 2021, 148, 1–6.
- Li, Z.; Yan, H.; Tsung, F.; Zhang, K. Profile decomposition based hybrid transfer learning for cold-start data anomaly detection. ACM Trans. Knowl. Discov. Data 2022, 16, 121.
- Roth, K.; Pemula, L.; Zepeda, J.; Schölkopf, B.; Brox, T.; Gehler, P. Towards total recall in industrial anomaly detection. In Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition (CVPR), New Orleans, LA, USA, 18–24 June 2022; pp. 14318–14328.
- Defard, T.; Setkov, A.; Loesch, A.; Audigier, R. PaDiM: A patch distribution modeling framework for anomaly detection and localization. In Proceedings of the ICPR International Workshops and Challenges (Pattern Recognition), Virtual Event, Milan, Italy, 10–15 January 2021; pp. 475–489.
- Chen, X.; Han, Y.; Zhang, J. APRIL-GAN: A zero-/few-shot anomaly classification and segmentation method for CVPR 2023 VAND Workshop Challenge Tracks 1&2: 1st place on zero-shot AD and 4th place on few-shot AD. arXiv 2023, arXiv:2305.17382.
- Cao, Y.; Zhang, J.; Frittoli, L.; Cheng, Y.; Shen, W.; Boracchi, G. AdaCLIP: Adapting CLIP with hybrid learnable prompts for zero-shot anomaly detection. In Proceedings of the 18th European Conference on Computer Vision (ECCV), Milan, Italy, 29 September–4 October 2024; pp. 55–72.
- Zhu, H.; Zhao, C.; Yuan, Y.; Liu, M. A zero-shot anomaly detection network with patch-augmented prompts and test-time adaptation. Eng. Appl. Artif. Intell. 2026, 165, 113525.
- Chen, P.; Huang, F.; Huang, C. DyC-CLIP: Dynamic context-aware multi-modal prompt learning for zero-shot anomaly detection. Pattern Recognit. 2026, 176, 113215.
- Kim, D.; Park, C.; Cho, S.; Lim, H.; Kang, M.; Lee, J.; Lee, S. Generalizing CLIP prompts for zero-shot anomaly detection. Pattern Recognit. 2026, 178, 113406.
- Chen, X.; Zhang, J.; Tian, G.; He, H.; Zhang, W.; Wang, Y.; Wang, C.; Liu, Y. CLIP-AD: A language-guided staged dual-path model for zero-shot anomaly detection. In Proceedings of the Human Activity Recognition and Anomaly Detection (IJCAI 2024), Jeju, Republic of Korea, 3–9 August 2024; pp. 17–33.
- Gao, B.-B.; Zhou, Y.; Yan, J.; Cai, Y.; Zhang, W.; Wang, M.; Liu, J.; Liu, Y.; Wang, L.; Wang, C. AdaptCLIP: Adapting CLIP for universal visual anomaly detection. In Proceedings of the AAAI Conference on Artificial Intelligence (AAAI), Singapore, 2026. Available online: https://api.semanticscholar.org/CorpusID:278636461 (accessed on 16 March 2026).
- Fang, Q.; Lv, W.; Su, Q. AF-CLIP: Zero-shot anomaly detection via anomaly-focused CLIP adaptation. In Proceedings of the ACM International Conference on Multimedia (ACM MM), 2025. Available online: https://api.semanticscholar.org/CorpusID:280322959 (accessed on 16 March 2026).
- Salehi, M.R.; Sadjadi, N.; Baselizadeh, S.; Rabiee, H.R. TIPS over tricks: Simple prompts for effective zero-shot anomaly detection. arXiv 2026, arXiv:2602.03594.
- Gao, B.-B.; Wang, C.J. One language-free foundation model is enough for universal vision anomaly detection. arXiv 2026, arXiv:2601.05552. Available online: https://api.semanticscholar.org/CorpusID:284597414 (accessed on 16 March 2026).
- Woo, S.; Park, J.; Lee, J.-Y.; Kweon, I.S. CBAM: Convolutional block attention module. In Proceedings of the European Conference on Computer Vision (ECCV), Munich, Germany, 8–14 September 2018; pp. 3–19. Available online: https://link.springer.com/chapter/10.1007/978-3-030-01234-2_1 (accessed on 16 March 2026).
- Hu, J.; Shen, L.; Albanie, S.; Sun, G.; Wu, E. Squeeze-and-excitation networks. In Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition (CVPR), Salt Lake City, UT, USA, 18–23 June 2018; pp. 7132–7141.
- Liu, S.; Huang, D.; Wang, Y. Learning spatial fusion for single-shot object detection. arXiv 2019, arXiv:1911.09516. Available online: https://arxiv.org/abs/1911.09516 (accessed on 16 March 2026).
- Huang, G.; Liu, Z.; Van Der Maaten, L.; Weinberger, K.Q. Densely connected convolutional networks. In Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition (CVPR), Honolulu, HI, USA, 22–25 July 2017; pp. 4700–4708.
- Lin, T.-Y.; Goyal, P.; Girshick, R.; He, K.; Dollár, P. Focal loss for dense object detection. IEEE Trans. Pattern Anal. Mach. Intell. 2020, 42, 318–327.
- Milletari, F.; Navab, N.; Ahmadi, S.A. V-Net: Fully convolutional neural networks for volumetric medical image segmentation. In Proceedings of the 2016 Fourth International Conference on 3D Vision (3DV), Stanford, CA, USA, 25–28 October 2016; pp. 565–571.
- Bergmann, P.; Fauser, M.; Sattlegger, D.; Steger, C. MVTec AD—A comprehensive real-world dataset for unsupervised anomaly detection. In Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition (CVPR), Long Beach, CA, USA, 15–20 June 2019; pp. 9584–9592.
- Zou, Y.; Jeong, J.; Pemula, L.; Zhang, D.; Dabeer, O. SPot-the-Difference self-supervised pre-training for anomaly detection and segmentation. In Proceedings of the European Conference on Computer Vision (ECCV), Tel Aviv, Israel, 23–27 October 2022; pp. 392–408.
Image-level anomaly classification results, reported as (AUROC, F1-Max, AP):

| Method | EPED | InsPLAD | MPDD | BTAD | MIAD |
|---|---|---|---|---|---|
| WinCLIP | (59.9, 66.5, 54.2) | (78.3, 68.5, 65.4) | (61.5, 77.5, 69.2) | (68.2, 67.8, 70.9) | (61.3, 67.1, 61.8) |
| AnomalyCLIP | (61.6, 68.2, 62.4) | (68.6, 58.5, 55.5) | (77.5, 80.4, 82.5) | (88.2, 83.8, 88.2) | (63.5, 71.1, 64.4) |
| AdaCLIP | (69.3, 69.4, 70.2) | (85.6, 75.6, 71.9) | (73.0, 80.6, 76.3) | (89.2, 84.6, 89.8) | (65.8, 70.1, 66.5) |
| APRIL-GAN | (64.8, 67.7, 66.4) | (85.7, 72.6, 72.8) | (76.8, 80.7, 82.5) | (73.5, 68.0, 69.7) | (62.0, 68.9, 62.5) |
| INP-Former | (54.4, 64.8, 54.6) | (61.3, 56.6, 44.9) | (54.9, 74.6, 65.3) | (83.7, 84.6, 87.3) | (55.4, 67.2, 54.5) |
| Bayes-PFL | (67.8, 71.9, 64.0) | (80.4, 70.3, 64.0) | (76.9, 81.3, 78.7) | (90.5, 85.4, 89.1) | (60.5, 69.5, 60.8) |
| Tipsomaly | (72.7, 74.2, 71.7) | (70.0, 59.6, 59.3) | (75.5, 79.7, 79.8) | (94.9, 91.4, 95.5) | (73.6, 71.2, 71.9) |
| AdaptCLIP | (59.8, 67.0, 60.3) | (67.3, 61.8, 49.0) | (76.8, 79.3, 76.9) | (91.4, 90.3, 92.2) | (62.1, 71.0, 62.7) |
| AF-CLIP | (67.4, 71.1, 68.4) | (80.9, 75.2, 75.8) | (75.8, 81.6, 81.6) | (94.3, 91.0, 95.2) | (63.6, 70.4, 64.4) |
| PaDiM 4+ | (48.7, 64.5, 49.6) | (56.6, 55.0, 39.0) | (50.0, 74.0, 60.5) | (91.5, 86.7, 90.1) | (46.1, 65.9, 44.3) |
| PatchCore 4+ | (43.7, 63.7, 45.7) | (61.0, 59.4, 45.7) | (62.2, 79.9, 65.1) | (90.9, 88.2, 92.6) | (45.8, 60.4, 41.2) |
| BAG-CLIP (Ours) | (76.4 ± 0.2, 77.6 ± 0.1, 72.8 ± 0.3) | (88.4 ± 0.3, 75.6 ± 0.1, 75.9 ± 0.4) | (80.3 ± 0.3, 85.7 ± 0.1, 83.6 ± 0.1) | (92.2 ± 0.1, 92.7 ± 0.2, 96.9 ± 0.3) | (75.8 ± 0.3, 74.2 ± 0.4, 73.5 ± 0.1) |

Pixel-level anomaly segmentation results, reported as (AUROC, F1-Max, AP, AUPRO); "/" indicates a value not reported:

| Method | EPED | InsPLAD | MPDD | BTAD | MIAD |
|---|---|---|---|---|---|
| WinCLIP | (62.9, 3.1, 3.8, 26.0) | (75.9, 20.9, 15.0, 39.7) | (71.2, 15.4, 14.1, 40.5) | (72.7, 18.5, 12.9, 27.5) | (67.4, 4.6, 3.9, 34.8) |
| AnomalyCLIP | (91.3, 21.5, 12.9, 74.7) | (85.4, 26.8, 20.1, 57.4) | (87.7, 30.6, 25.0, 73.3) | (87.7, 41.7, 38.5, 62.5) | (88.3, 9.0, 4.4, 71.5) |
| AdaCLIP | (94.8, 28.4, 24.4, 68.8) | (83.2, 23.3, 17.8, 53.6) | (93.9, 28.9, 25.8, 62.8) | (90.2, 40.6, 34.8, 20.3) | (80.6, 5.0, 1.6, 69.6) |
| APRIL-GAN | (94.9, 24.5, 18.9, 82.7) | (76.8, 16.3, 10.4, 43.8) | (94.3, 31.3, 26.6, 83.8) | (89.3, 40.6, 36.5, 68.8) | (87.7, 9.2, 4.2, 73.9) |
| INP-Former | (76.9, 2.2, 1.0, 40.5) | (82.4, 16.4, 10.2, 46.9) | (90.6, 14.4, 8.2, 71.1) | (88.6, 32.6, 26.7, 65.0) | (92.0, 8.5, 5.3, 42.4) |
| Bayes-PFL | (97.3, 22.6, 17.6, 67.7) | (91.4, 27.5, 21.1, 51.8) | (97.1, 32.9, 30.0, 84.6) | (92.3, 43.9, 40.2, 67.0) | (91.6, 6.6, 3.2, 65.2) |
| Tipsomaly | (93.5, 25.6, /, 81.2) | (86.9, 22.8, /, 55.1) | (95.7, 33.1, /, 86.5) | (96.7, 56.2, /, 84.7) | (91.3, 9.1, /, 70.5) |
| AdaptCLIP | (90.8, 26.4, 21.1, 50.5) | (83.7, 22.1, 15.7, 68.3) | (96.0, 30.4, 29.2, 92.9) | (94.8, 47.7, 44.8, 81.4) | (90.3, 7.6, 3.4, 69.7) |
| AF-CLIP | (96.1, 23.5, 19.7, 84.5) | (88.7, 27.2, 22.5, 61.8) | (96.6, 29.5, 27.9, 90.2) | (94.4, 47.5, 41.9, 78.4) | (91.6, 5.8, 2.7, 68.5) |
| PaDiM 4+ | (82.4, 5.0, 2.1, 50.0) | (78.7, 14.4, 8.6, 45.1) | (90.1, 15.9, 9.7, 65.7) | (96.3, 44.2, 38.9, 86.6) | (79.4, 3.7, 1.5, 58.2) |
| PatchCore 4+ | (75.3, 5.3, 2.0, 64.8) | (77.4, 16.0, 11.5, 54.2) | (93.8, 22.4, 19.7, 77.3) | (95.6, 45.1, 40.1, 80.5) | (72.9, 4.3, 2.1, 62.7) |
| BAG-CLIP (Ours) | (97.9 ± 0.2, 28.9 ± 0.2, 23.2 ± 0.3, 85.9 ± 0.4) | (92.5 ± 0.4, 29.3 ± 0.2, 22.7 ± 0.2, 62.6 ± 0.3) | (97.3 ± 0.1, 33.4 ± 0.2, 31.8 ± 0.1, 91.4 ± 0.3) | (96.5 ± 0.2, 49.6 ± 0.1, 45.7 ± 0.1, 80.7 ± 0.2) | (92.7 ± 0.4, 9.7 ± 0.2, 6.6 ± 0.2, 71.3 ± 0.3) |
| SAG | SAFR | Dataset | Image AUROC | Image F1-Max | Image AP | Pixel AUROC | Pixel F1-Max | Pixel AP |
|---|---|---|---|---|---|---|---|---|
| ✓ | ✗ | EPED | 72.1 | 71.6 | 71.2 | 90.7 | 18.6 | 14.3 |
| ✓ | ✗ | InsPLAD | 80.0 | 66.9 | 66.7 | 81.1 | 19.3 | 13.8 |
| ✓ | ✗ | MPDD | 75.7 | 82.6 | 78.5 | 93.4 | 27.7 | 24.9 |
| ✓ | ✗ | BTAD | 79.7 | 85.3 | 89.8 | 90.9 | 43.3 | 38.5 |
| ✓ | ✗ | Average | 76.9 | 76.6 | 76.6 | 89.0 | 27.2 | 22.9 |
| ✗ | ✓ | EPED | 70.6 | 72.3 | 69.8 | 92.2 | 20.7 | 14.8 |
| ✗ | ✓ | InsPLAD | 80.9 | 69.7 | 65.0 | 79.8 | 16.2 | 12.0 |
| ✗ | ✓ | MPDD | 78.0 | 81.9 | 79.5 | 93.1 | 23.9 | 20.9 |
| ✗ | ✓ | BTAD | 90.6 | 86.3 | 92.3 | 91.7 | 46.2 | 40.9 |
| ✗ | ✓ | Average | 80.0 | 77.6 | 76.7 | 89.2 | 26.8 | 22.2 |
| ✓ | ✓ | Average | 84.3 | 82.9 | 82.3 | 96.1 | 35.3 | 30.9 |
| Module Component | Parameters/M | Theoretical Computation/GFLOPs | Peak VRAM/MB | Single Inference Time/ms |
|---|---|---|---|---|
| Dynamic Graph Construction | 0 (parameter-free) | 3.84 | 67.4 | 2 |
| GCN Feature Propagation | 1.08 | 7.18 | 52.4 | 9 |
| Total | 1.08 | 11.02 | 67.4 | 11 |
| Methods | Total Params/M | Inference Time/ms | FPS |
|---|---|---|---|
| AF-CLIP | 428.8 + 2.1 × 10⁰ | 116.8 | 8.56 |
| AnomalyCLIP | 428.1 + 6.2 × 10⁰ | 123.8 | 8.08 |
| AdaCLIP | 428.1 + 1.1 × 10¹ | 127.9 | 7.81 |
| Bayes-PFL | 429.4 + 2.7 × 10¹ | 236.6 | 4.23 |
| BAG-CLIP (Ours) | 429.0 + 1.3 × 10¹ | 121.6 | 8.22 |
© 2026 by the authors. Licensee MDPI, Basel, Switzerland. This article is an open access article distributed under the terms and conditions of the Creative Commons Attribution (CC BY) license.
Share and Cite
Wu, H.; Zhang, T.; Li, S. BAG-CLIP: Bifurcated Attention Graph-Enhanced CLIP for Zero-Shot Industrial Anomaly Detection. Electronics 2026, 15, 1659. https://doi.org/10.3390/electronics15081659