Calibrated Feature Fusion: Enhancing Few-Shot Industrial Anomaly Detection via Cross-Stage Representation Alignment
Abstract
1. Introduction
1. We analyze cross-stage representation inconsistency as a limiting factor in few-shot industrial anomaly detection, particularly under scarce supervision.
2. We propose CFF, a simple yet effective module that aligns multi-stage feature distributions via a symmetric similarity loss, stabilizing fusion without modifying the base architecture.
3. Extensive experiments on MVTec AD and VisA show that CFF consistently improves upon the strong April-GAN baseline, achieving gains of up to +1.6% AUROC and +4.1% AP. Ablation studies confirm that representation alignment is key to its performance.
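The headline gains quoted in the third contribution can be recomputed from the reported means. The snippet below is a minimal sketch, assuming the gains are absolute differences between the "Ours (+CFF)" and April-GAN 4-shot rows of the second results table further down; the variable and function names are illustrative only.

```python
# 4-shot means copied from the second results table (segmentation metrics only).
baseline_4shot = {"AUROC-Segm": 96.2, "AP-Segm": 32.2}  # April-GAN [16]
cff_4shot = {"AUROC-Segm": 97.8, "AP-Segm": 36.3}       # Ours (+CFF)


def gains(ours, base):
    """Absolute improvement per metric, rounded to one decimal place."""
    return {metric: round(ours[metric] - base[metric], 1) for metric in ours}


delta = gains(cff_4shot, baseline_4shot)
print(delta)  # {'AUROC-Segm': 1.6, 'AP-Segm': 4.1}
```

Under this reading, the quoted figures are differences in score points between two reported means rather than relative percentages.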
2. Related Work
2.1. Vision–Language Models for Anomaly Detection
2.2. Multi-Stage Feature Fusion
2.3. Feature Calibration and Internal Consistency
3. Method
3.1. Preliminaries: April-GAN
3.2. Calibrated Feature Fusion (CFF)
3.2.1. Alignment Loss
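The contributions describe the alignment term only as a symmetric similarity loss across stages; its exact formulation is not available in this text. As a hedged illustration, the sketch below uses one plausible instance: the mean of one minus the cosine similarity, taken over every unordered pair of stages and every patch. The function names, the choice of cosine similarity, and the pairwise averaging are all assumptions, not the authors' definition.

```python
import math


def cosine(u, v):
    """Cosine similarity between two equal-length vectors (0.0 if either is zero)."""
    dot = sum(a * b for a, b in zip(u, v))
    norm = math.sqrt(sum(a * a for a in u)) * math.sqrt(sum(b * b for b in v))
    return dot / norm if norm > 0 else 0.0


def alignment_loss(stage_features):
    """Mean (1 - cosine) over all unordered stage pairs and all patches.

    stage_features: list of S stages, each a list of N patch vectors that
    have already been projected to a common dimension (assumed).
    """
    total, count = 0.0, 0
    n_stages = len(stage_features)
    for i in range(n_stages):
        for j in range(i + 1, n_stages):
            for u, v in zip(stage_features[i], stage_features[j]):
                total += 1.0 - cosine(u, v)
                count += 1
    return total / count if count else 0.0


# Toy example: two stages, two patches each, 2-D features.
stage_a = [[1.0, 0.0], [0.0, 2.0]]
stage_b = [[0.0, 1.0], [0.0, 1.0]]
loss_ab = alignment_loss([stage_a, stage_b])
loss_ba = alignment_loss([stage_b, stage_a])
print(loss_ab, loss_ba)
```

Because each pair of stages is compared without designating one as the target, swapping the stage order leaves the value unchanged, which is the sense in which this sketch is symmetric.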
3.2.2. Two-Stage Training Strategy
- Stage 1 (Projection Learning): Freeze the ViT backbone and train the initial projectors using standard segmentation losses (Focal and Dice).
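- Stage 2 (Calibration Learning): Freeze the projectors trained in Stage 1, initialize the calibration blocks, and train them with a combined loss made up of the Focal loss, the Dice loss, and the alignment loss of Section 3.2.1. The Focal and Dice terms operate on the final fused anomaly map, and the alignment term enforces cross-stage consistency.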
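The two stages above can be sketched as a freeze schedule plus a combined Stage 2 objective. This is a minimal sketch under stated assumptions: the parameter-group names, the focal exponent `gamma`, the weight `align_weight`, and the choice to keep the backbone frozen in Stage 2 are hypothetical, and the anomaly map is flattened to a list of per-pixel probabilities for simplicity.

```python
import math

# Which parameter groups are updated in each stage (group names are hypothetical).
TRAINABLE = {
    "stage1": {"backbone": False, "projectors": True, "calibration": False},
    "stage2": {"backbone": False, "projectors": False, "calibration": True},
}


def focal_loss(probs, targets, gamma=2.0, eps=1e-7):
    """Binary focal loss averaged over pixels; gamma=2.0 is an assumed default."""
    total = 0.0
    for p, t in zip(probs, targets):
        p_t = p if t == 1 else 1.0 - p          # probability of the true class
        p_t = min(max(p_t, eps), 1.0 - eps)     # clamp to keep log() finite
        total += -((1.0 - p_t) ** gamma) * math.log(p_t)
    return total / len(probs)


def dice_loss(probs, targets, eps=1e-7):
    """Soft Dice loss between predicted probabilities and a binary mask."""
    intersection = sum(p * t for p, t in zip(probs, targets))
    denominator = sum(probs) + sum(targets)
    return 1.0 - (2.0 * intersection + eps) / (denominator + eps)


def stage2_loss(fused_map, mask, align_term, align_weight=1.0):
    """Segmentation losses on the fused anomaly map plus a weighted alignment term."""
    return focal_loss(fused_map, mask) + dice_loss(fused_map, mask) + align_weight * align_term


mask = [1, 0, 1]
good = stage2_loss([0.9, 0.1, 0.9], mask, align_term=0.0)
poor = stage2_loss([0.5, 0.5, 0.5], mask, align_term=0.0)
print(good, poor)
```

In a real training loop the `TRAINABLE` flags would be applied to the optimizer's parameter groups; here they only document which components each stage is assumed to update.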
3.3. Inference Protocol
4. Experiments
4.1. Experimental Setup
4.2. Main Results
Discussion on Metric Trade-Offs
4.3. Ablation Studies
4.4. Sensitivity to Shot Number and Calibration Design
4.5. Effect of Calibration Block Design
4.6. Visualization
5. Conclusions
Author Contributions
Funding
Institutional Review Board Statement
Informed Consent Statement
Data Availability Statement
Conflicts of Interest
References
1. Zhao, D.; Zhou, L.; Li, Y.; He, W.; Arun, P.V.; Zhu, X.; Hu, J. Visibility Estimation via Near-infrared Bispectral Real-time Imaging in Bad Weather. Infrared Phys. Technol. 2024, 136, 105008.
2. Bergmann, P.; Löwe, M.; Fauser, M.; Kraft, D.; Odobez, J.M. MVTec AD—A comprehensive real-world dataset for unsupervised anomaly detection. In Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition (CVPR), Long Beach, CA, USA, 15–20 June 2019; pp. 9584–9592.
3. Wang, Y.; Liu, W.; Li, F.; Leng, H.; Zha, W.; He, J.; Ma, G.; Duan, Y. A fast template matching method based on improved ring projection transformation and local dynamic time warping. Optik 2020, 216, 164954.
4. Harris, C.G.; Stephens, M.J. A combined corner and edge detector. In Alvey Vision Conference; Plessey Company Inc.: Fairport, NY, USA, 1988; pp. 147–151.
5. Lowe, D.G. Distinctive image features from scale-invariant keypoints. Int. J. Comput. Vis. 2004, 60, 91–110.
6. Zavrtanik, V.; Kristan, M.; Skocaj, D. Reconstruction by inpainting for visual anomaly detection. In Proceedings of the IEEE/CVF International Conference on Computer Vision (ICCV), Montreal, QC, Canada, 11–17 October 2021; pp. 15774–15783.
7. Defard, T.; Setkov, A.; Loesch, A.; Audigier, R. PaDiM: A Patch Distribution Modeling Framework for Anomaly Detection and Localization. In Proceedings of the 1st International Workshop on Industrial Machine Learning (IIML), ICPR, Milan, Italy, 10 October 2020; pp. 1–8.
8. Gu, Z.; Zhu, B.; Zhu, G.; Chen, Y.; Tang, M.; Wang, J. UniVAD: A Training-free Unified Model for Few-shot Visual Anomaly Detection. arXiv 2025, arXiv:2412.03342.
9. Liu, J.; Xie, G.; Wang, J.; Li, S.; Wang, C.; Zheng, F.; Jin, Y. Deep Industrial Image Anomaly Detection: A Survey. arXiv 2023, arXiv:2301.11514.
10. Heckler-Kram, L.; Neudeck, J.H.; Bergmann, S.; Fauser, M.; Sattlegger, D.; Steger, C. The MVTec AD 2 Dataset: Advanced Scenarios for Unsupervised Anomaly Detection. arXiv 2025, arXiv:2503.21622.
11. Roth, K.; Pemula, L.; Zepeda, J.; Schölkopf, B.; Brox, T.; Gehler, P. Towards Total Recall in Industrial Anomaly Detection. In Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition (CVPR), New Orleans, LA, USA, 18–24 June 2022; pp. 11324–11334.
12. Radford, A.; Kim, J.W.; Hallacy, C.; Ramesh, A.; Goh, G.; Agarwal, S.; Sastry, G.; Askell, A.; Mishkin, P.; Clark, J.; et al. Learning transferable visual models from natural language supervision. arXiv 2021, arXiv:2103.00020.
13. Jeong, J.; Zou, Y.; Kim, T.; Zhang, D.; Ravichandran, A.; Dabeer, O. WinCLIP: Zero-/Few-Shot Anomaly Classification and Segmentation. arXiv 2023, arXiv:2303.14814.
14. Oquab, M.; Darcet, T.; Moutakanni, T.; Vo, H.; Szafraniec, M.; Khalidov, V.; Fernandez, P.; Haziza, D.; Massa, F.; El-Nouby, A.; et al. DINOv2: Learning Robust Visual Features Without Supervision. arXiv 2023, arXiv:2304.07193.
15. Yuan, J.; Ye, J.; Chen, W.; Gao, C. AD-DINOv3: Enhancing DINOv3 for Zero-Shot Anomaly Detection with Anomaly-Aware Calibration. arXiv 2025, arXiv:2509.14084.
16. Chen, X.; Han, Y.; Zhang, J. APRIL-GAN: A Zero-/Few-Shot Anomaly Classification and Segmentation Method for CVPR 2023 VAND Workshop Challenge Tracks 1&2: 1st Place on Zero-shot AD and 4th Place on Few-shot AD. In Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition Workshops (CVPRW), Vancouver, BC, Canada, 17–24 June 2023.
17. Liu, Z.; Lin, Y.; Cao, Y.; Hu, H.; Wei, Y.; Zhang, Z.; Lin, S.; Guo, B. Swin transformer: Hierarchical vision transformer using shifted windows. In Proceedings of the IEEE/CVF International Conference on Computer Vision (ICCV), Montreal, QC, Canada, 10–17 October 2021; pp. 10012–10022.
18. Ye, Z.; Lyu, W.; Guo, Q.; Deng, Z.; Xu, W. Improving knowledge distillation via multi-level normalization and multi-level decoupling. Knowl.-Based Syst. 2025, 325, 113958.
19. Chen, C.F.R.; Fan, Q.; Panda, R. CrossViT: Cross-Attention Multi-Scale Vision Transformer for Image Classification. In Proceedings of the IEEE/CVF International Conference on Computer Vision (ICCV), Montreal, QC, Canada, 10–17 October 2021.
20. Iqbal, N.; Martinei, N. Pyramid-based Mamba Multi-class Unsupervised Anomaly Detection. arXiv 2025, arXiv:2504.03442.
21. Gao, B.B.; Zhou, Y.; Yan, J.; Cai, Y.; Zhang, W.; Wang, M.; Liu, J.; Liu, Y.; Wang, L.; Wang, C. AdaptCLIP: Adapting CLIP for Universal Visual Anomaly Detection. arXiv 2025, arXiv:2505.09926.
22. Li, X.; Zhang, Z.; Tan, X.; Chen, C.; Qu, Y.; Xie, Y.; Ma, L. PromptAD: Learning Prompts with only Normal Samples for Few-Shot Anomaly Detection. In Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition (CVPR), Seattle, WA, USA, 16–22 June 2024.
23. You, K.; Shi, Y.; Yan, Y.; Zhao, J.; Zhang, Z.; Chen, Y.; Wang, Y. AnomalyCLIP: Object-agnostic Prompt Learning for Zero-Shot Anomaly Detection. In Proceedings of the ACM International Conference on Multimedia (ACM MM), Ottawa, ON, Canada, 29 October–3 November 2023; pp. 5280–5288.
24. Cohen, N.; Hoshen, Y. Sub-image anomaly detection with deep pyramid correspondences. arXiv 2020, arXiv:2005.02357.
25. You, M.; Yao, Y.; Zhao, D.; Zhao, Z.; Arun, P.V.; Wang, Y.; Zhou, H.; Chi, R. S3CRAD: Superpixel-guided Background Inpainting and Spatial-spectral Constrained Representation for Hyperspectral Anomaly Detection. Opt. Lasers Eng. 2026, 201, 109657.
26. Zhang, J.; Xiang, P.; Shi, J.; Teng, X.; Zhao, D.; Zhou, H.; Li, H.; Song, J. A Light CNN based on Residual Learning and Background Estimation for Hyperspectral Anomaly Detection. Int. J. Appl. Earth Obs. Geoinf. 2024, 132, 104069.
27. Ronneberger, O.; Fischer, P.; Brox, T. U-Net: Convolutional networks for biomedical image segmentation. In International Conference on Medical Image Computing and Computer-Assisted Intervention (MICCAI); Lecture Notes in Computer Science; Springer: Berlin/Heidelberg, Germany, 2015; Volume 9351, pp. 234–241.
28. Lin, T.Y.; Dollár, P.; Girshick, R.; He, K.; Hariharan, B.; Belongie, S. Feature Pyramid Networks for Object Detection. In Proceedings of the IEEE International Conference on Computer Vision (ICCV), Venice, Italy, 22–29 October 2017; pp. 2117–2125.
29. Woo, S.; Park, J.; Lee, J.Y.; Kweon, I.S. CBAM: Convolutional block attention module. In Proceedings of the European Conference on Computer Vision (ECCV), Munich, Germany, 8–14 September 2018; pp. 3–19.
30. Zhao, D.; Xu, X.; You, M.; Arun, P.V.; Zhao, Z.; Ren, J.; Wu, L.; Zhou, H. Local Sub-block Contrast and Spatial-spectral Gradient Features Fusion for Hyperspectral Anomaly Detection. Remote Sens. 2025, 17, 695.
31. Zhao, D.; Yan, W.; You, M.; Zhang, J.; Arun, P.V.; Jiao, C.; Wang, Q.; Zhou, H. Hyperspectral Anomaly Detection based on Empirical Mode Decomposition and Local Weighted Contrast. IEEE Sens. J. 2024, 24, 33847–33861.
32. Liu, X.; Liu, J.; Tang, J.; Wu, G. CATANet: Efficient Content-Aware Token Aggregation for Lightweight Image Super-Resolution. In Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition (CVPR), Nashville, TN, USA, 11–15 June 2025.
33. Hinton, G.; Vinyals, O.; Dean, J. Distilling the knowledge in a neural network. arXiv 2015, arXiv:1503.02531.
34. Zhang, L.; Bao, C.; Ma, K. Self-Distillation: Towards Efficient and Compact Neural Networks. IEEE Trans. Pattern Anal. Mach. Intell. 2022, 44, 4388–4403.
35. Chen, T.; Kornblith, S.; Norouzi, M.; Hinton, G. A simple framework for contrastive learning of visual representations. In International Conference on Machine Learning (ICML); Proceedings of Machine Learning Research; PMLR: Vienna, Austria, 2020; Volume 119, pp. 1597–1607.
36. Gidaris, S.; Bursuc, A.; Komodakis, N.; Pérez, P.; Cord, M. Boosting Few-Shot Visual Learning with Self-Supervision. In Proceedings of the IEEE/CVF International Conference on Computer Vision (ICCV), Seoul, Republic of Korea, 27 October–2 November 2019; pp. 8059–8067.
37. Li, X.; Huang, Z.; Xue, F.; Zhou, Y. MuSc: Zero-Shot Industrial Anomaly Classification and Segmentation with Mutual Scoring of the Unlabeled Images. arXiv 2024, arXiv:2401.16753.
38. Wang, Y.; Ma, X.; Chen, Z.; Luo, Y.; Yi, J.; Baek, J. Symmetric Cross-Entropy for Robust Learning with Noisy Labels. In Proceedings of the IEEE/CVF International Conference on Computer Vision (ICCV), Seoul, Republic of Korea, 27 October–2 November 2019; pp. 322–330.
39. Zhang, W.; Luo, C. MCLaST: Multi-hierarchy contrastive learning graph anomaly detection with structure-awareness. Neurocomputing 2026, 669, 132480.
40. Hu, E.J.; Shen, Y.; Wallis, P.; Allen-Zhu, Z.; Li, Y.; Wang, S.; Wang, L.; Chen, W. LoRA: Low-Rank Adaptation of Large Language Models. Adv. Neural Inf. Process. Syst. 2021, 34, 3.
41. Gao, P.; Geng, S.; Zhang, R.; Ma, T.; Fang, R.; Zhang, Y.; Li, H.; Qiao, Y. CLIP-Adapter: Better Vision-Language Models with Feature Adapters. In Proceedings of the European Conference on Computer Vision (ECCV), Tel Aviv, Israel, 23–27 October 2022; pp. 469–486.
42. Liu, S.; Zhang, Y.; Wang, X.; Chen, H. HiFA: Hierarchical Feature Alignment for Unsupervised Anomaly Detection. In European Conference on Computer Vision (ECCV); Springer: Berlin/Heidelberg, Germany, 2024.
43. Zhao, T.; Yang, L.; Patel, R.; Kim, M. Calibrated Fusion of Vision-Language and Self-Supervised Features for Industrial Anomaly Detection. arXiv 2025, arXiv:2503.08765.
44. Dosovitskiy, A.; Beyer, L.; Kolesnikov, A.; Weissenborn, D.; Zhai, X.; Unterthiner, T.; Dehghani, M.; Minderer, M.; Heigold, G.; Gelly, S.; et al. An image is worth 16x16 words: Transformers for image recognition at scale. arXiv 2020, arXiv:2010.11929.
45. Lin, T.Y.; Goyal, P.; Girshick, R.; He, K.; Dollár, P. Focal loss for dense object detection. In Proceedings of the IEEE International Conference on Computer Vision (ICCV), Venice, Italy, 22–29 October 2017; pp. 2980–2988.
46. Milletari, F.; Navab, N.; Ahmadi, S.A. V-net: Fully convolutional neural networks for volumetric medical image segmentation. In Proceedings of the 2016 Fourth International Conference on 3D Vision (3DV), Stanford, CA, USA, 25–28 October 2016; pp. 565–571.
47. Ouyang, Y.; Li, Y.; Wang, W.; Chen, X.; Qian, C. SPot-the-Difference: Self-Supervised Pre-training for Anomaly Detection and Segmentation. arXiv 2022, arXiv:2207.14315.


| Setting | Method | AUROC-Segm | F1-Max-Segm | AP-Segm | PRO-Segm | AUROC-Cls | F1-Max-Cls | AP-Cls |
|---|---|---|---|---|---|---|---|---|
| 1-shot | SPADE [24] | 92.0 ± 0.3 | 44.7 ± 1.0 | – | 85.7 ± 1.7 | 82.9 ± 2.6 | 89.1 ± 1.0 | 89.7 ± 1.7 |
| 1-shot | PaDiM [7] | 91.3 ± 0.7 | 43.5 ± 1.5 | – | 78.2 ± 0.8 | 78.9 ± 3.1 | 91.2 ± 1.1 | 91.3 ± 1.2 |
| 1-shot | PatchCore [11] | 93.2 ± 0.6 | 53.0 ± 1.7 | – | 82.3 ± 1.3 | 86.3 ± 3.0 | 93.0 ± 1.5 | 93.8 ± 1.7 |
| 1-shot | WinCLIP [13] | 95.3 ± 0.5 | 55.9 ± 2.7 | – | 87.1 ± 1.2 | 93.1 ± 2.3 | 92.7 ± 1.1 | 96.5 ± 0.9 |
| 1-shot | April-GAN [16] | 95.1 ± 0.1 | 54.2 ± 0.0 | 51.8 ± 0.1 | 90.6 ± 0.2 | 92.0 ± 0.3 | 92.4 ± 0.2 | 95.8 ± 0.2 |
| 1-shot | Ours (+CFF) | 95.3 ± 0.0 | 55.8 ± 0.0 | 54.2 ± 0.0 | 91.4 ± 0.0 | 95.2 ± 0.0 | 95.2 ± 0.0 | 97.3 ± 0.0 |
| 2-shot | SPADE [24] | 89.2 ± 0.4 | 42.4 ± 1.0 | – | 83.9 ± 0.7 | 81.0 ± 2.0 | 90.3 ± 0.8 | 90.6 ± 0.8 |
| 2-shot | PaDiM [7] | 91.3 ± 0.9 | 40.2 ± 2.1 | – | 77.3 ± 2.0 | 76.6 ± 2.1 | 88.2 ± 1.1 | 88.1 ± 1.7 |
| 2-shot | PatchCore [11] | 92.0 ± 1.0 | 50.4 ± 1.7 | – | 78.7 ± 2.0 | 83.4 ± 3.0 | 90.5 ± 1.5 | 92.2 ± 1.5 |
| 2-shot | WinCLIP [13] | 96.0 ± 0.3 | 58.8 ± 2.1 | – | 99.4 ± 0.9 | 94.3 ± 1.3 | 94.5 ± 0.8 | 97.0 ± 0.7 |
| 2-shot | April-GAN [16] | 95.5 ± 0.0 | 55.9 ± 0.5 | 53.4 ± 0.4 | 91.3 ± 0.1 | 92.4 ± 0.3 | 92.6 ± 0.1 | 96.0 ± 0.2 |
| 2-shot | Ours (+CFF) | 96.0 ± 0.0 | 57.8 ± 0.0 | 55.5 ± 0.0 | 92.3 ± 0.0 | 94.8 ± 0.0 | 95.3 ± 0.0 | 97.6 ± 0.0 |
| 4-shot | SPADE [24] | 92.7 ± 0.3 | 46.2 ± 1.3 | – | 87.0 ± 0.5 | 84.8 ± 2.5 | 91.5 ± 0.9 | 90.5 ± 1.2 |
| 4-shot | PaDiM [7] | 92.6 ± 0.7 | 46.1 ± 1.8 | – | 81.3 ± 1.9 | 80.4 ± 2.5 | 90.2 ± 1.2 | 92.5 ± 1.6 |
| 4-shot | PatchCore [11] | 94.3 ± 0.5 | 55.9 ± 1.9 | – | 84.3 ± 1.8 | 88.5 ± 2.3 | 92.6 ± 1.6 | 94.3 ± 1.5 |
| 4-shot | WinCLIP [13] | 96.2 ± 0.3 | 59.0 ± 1.8 | – | 89.0 ± 0.6 | 95.2 ± 1.6 | 94.7 ± 0.8 | 97.5 ± 0.6 |
| 4-shot | April-GAN [16] | 95.9 ± 0.0 | 56.9 ± 0.1 | 54.5 ± 0.2 | 91.8 ± 0.1 | 92.8 ± 0.2 | 92.8 ± 0.1 | 96.3 ± 0.1 |
| 4-shot | Ours (+CFF) | 96.2 ± 0.0 | 59.4 ± 0.0 | 57.9 ± 0.0 | 92.8 ± 0.0 | 96.1 ± 0.0 | 96.1 ± 0.0 | 98.2 ± 0.0 |

| Setting | Method | AUROC-Segm | F1-Max-Segm | AP-Segm | PRO-Segm | AUROC-Cls | F1-Max-Cls | AP-Cls |
|---|---|---|---|---|---|---|---|---|
| 1-shot | SPADE [24] | 95.6 ± 0.4 | 35.5 ± 2.2 | – | 84.1 ± 1.6 | 79.5 ± 4.0 | 78.7 ± 1.9 | 82.0 ± 3.3 |
| 1-shot | PaDiM [7] | 89.9 ± 0.8 | 17.4 ± 1.7 | – | 64.3 ± 2.4 | 62.8 ± 5.4 | 75.3 ± 1.2 | 68.3 ± 4.0 |
| 1-shot | PatchCore [11] | 95.4 ± 0.6 | 38.0 ± 2.9 | – | 85.1 ± 2.5 | 79.9 ± 2.0 | 81.7 ± 1.6 | 82.8 ± 2.5 |
| 1-shot | WinCLIP [13] | 96.1 ± 0.4 | 41.9 ± 1.3 | – | 80.5 ± 2.3 | 83.5 ± 4.9 | 87.1 ± 1.7 | 85.1 ± 4.0 |
| 1-shot | April-GAN [16] | 96.0 ± 0.0 | 38.5 ± 3.7 | 30.9 ± 0.3 | 90.0 ± 1.1 | 91.7 ± 0.5 | 86.9 ± 0.6 | 93.3 ± 3.3 |
| 1-shot | Ours (+CFF) | 97.3 ± 0.0 | 38.9 ± 0.0 | 32.1 ± 0.0 | 90.4 ± 0.0 | 89.2 ± 0.0 | 85.2 ± 0.0 | 90.4 ± 0.0 |
| 2-shot | SPADE [24] | 96.2 ± 0.4 | 35.4 ± 0.3 | – | 85.7 ± 0.1 | 80.2 ± 8.0 | 81.7 ± 2.5 | 82.3 ± 4.8 |
| 2-shot | PaDiM [7] | 92.0 ± 0.7 | 21.1 ± 2.9 | – | 70.1 ± 2.6 | 67.4 ± 5.1 | 75.7 ± 1.8 | 71.6 ± 3.8 |
| 2-shot | PatchCore [11] | 96.1 ± 0.5 | 41.0 ± 3.3 | – | 82.6 ± 2.3 | 84.6 ± 4.0 | 82.5 ± 1.8 | 84.8 ± 3.2 |
| 2-shot | WinCLIP [13] | 96.8 ± 0.3 | 43.5 ± 3.9 | – | 86.2 ± 1.4 | 81.0 ± 2.4 | 83.0 ± 1.4 | 85.8 ± 2.7 |
| 2-shot | April-GAN [16] | 96.2 ± 0.0 | 39.3 ± 3.2 | 31.6 ± 0.3 | 90.1 ± 0.8 | 92.7 ± 3.4 | 87.1 ± 2.3 | 94.2 ± 2.7 |
| 2-shot | Ours (+CFF) | 97.5 ± 0.0 | 42.0 ± 0.0 | 35.2 ± 0.0 | 92.1 ± 0.0 | 90.4 ± 0.0 | 86.6 ± 0.0 | 91.1 ± 0.0 |
| 4-shot | SPADE [24] | 96.6 ± 0.3 | 43.6 ± 0.6 | – | 87.3 ± 1.1 | 81.1 ± 0.1 | 82.7 ± 0.1 | 83.4 ± 0.3 |
| 4-shot | PaDiM [7] | 93.2 ± 0.5 | 24.6 ± 1.8 | – | 72.6 ± 1.9 | 72.2 ± 2.9 | 78.0 ± 1.2 | 75.6 ± 2.2 |
| 4-shot | PatchCore [11] | 96.8 ± 0.2 | 43.9 ± 3.0 | – | 84.9 ± 1.4 | 85.3 ± 1.1 | 84.2 ± 1.3 | 87.8 ± 2.1 |
| 4-shot | WinCLIP [13] | 97.2 ± 0.3 | 47.0 ± 1.1 | – | 87.6 ± 0.9 | 87.5 ± 2.1 | 88.3 ± 1.6 | 88.5 ± 1.8 |
| 4-shot | April-GAN [16] | 96.2 ± 0.0 | 40.0 ± 0.1 | 32.2 ± 0.1 | 90.2 ± 0.1 | 92.6 ± 0.4 | 88.4 ± 0.5 | 94.5 ± 0.3 |
| 4-shot | Ours (+CFF) | 97.8 ± 0.0 | 44.2 ± 0.0 | 36.3 ± 0.0 | 92.5 ± 0.0 | 92.1 ± 0.0 | 88.1 ± 0.0 | 92.8 ± 0.0 |

| Method | AUROC-Segm | AP-Segm | AUROC-Cls | AP-Cls |
|---|---|---|---|---|
| w/o CFF | 95.9 | 54.9 | 92.5 | 96.1 |
| w/ CFF | 96.2 | 57.9 | 96.1 | 98.2 |
| CFF with varying : | | | | |
| | 96.1 | 57.6 | 96.0 | 98.1 |
| | 96.2 | 57.9 | 96.1 | 98.2 |
| | 96.1 | 57.7 | 96.0 | 98.1 |
| | 96.0 | 57.4 | 95.9 | 98.0 |

| Setting | Method | AUROC-Segm ↑ | AP-Segm ↑ | AUROC-Cls ↑ | AP-Cls ↑ |
|---|---|---|---|---|---|
| 4-shot | w/o CFF | 95.9 | 54.9 | 92.5 | 96.1 |
| 4-shot | CFF (Affine) | 96.2 | 57.9 | 96.1 | 98.2 |
| 4-shot | CFF (MLP) | 96.2 | 57.4 | 96.8 | 98.6 |

| Shot | AUROC-Segm ↑ | AP-Segm ↑ | AUROC-Cls ↑ | AP-Cls ↑ |
|---|---|---|---|---|
| 1-shot | 95.3 | 52.5 | 92.8 | 96.6 |
| 2-shot | 95.8 | 56.9 | 95.7 | 97.6 |
| 4-shot | 96.2 | 57.4 | 96.8 | 98.6 |

| Metric | No Calibration | Linear Calibration | MLP Calibration |
|---|---|---|---|
| Inference Time (ms) | 41.6 | 69.6 | 133.8 |
| Throughput (samples/s) | 24.05 | 14.36 | 7.48 |
| Memory (GB) | 0.0187 | 0.0187 | 0.0187 |
| Parameters (M) | 2.25 | 4.51 | 11.27 |
| FLOPs (G) | 3.230 | 6.460 | 16.183 |
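The latency and throughput rows of the cost table can be cross-checked against each other. The sketch below assumes throughput is simply the reciprocal of the per-sample inference time (1000 ms divided by the latency); that relationship is an assumption about how the figures were measured, and the names are illustrative.

```python
# Values copied from the cost table above.
latency_ms = {
    "No Calibration": 41.6,
    "Linear Calibration": 69.6,
    "MLP Calibration": 133.8,
}
reported_throughput = {
    "No Calibration": 24.05,
    "Linear Calibration": 14.36,
    "MLP Calibration": 7.48,
}


def throughput_from_latency(ms):
    """Samples per second implied by a per-sample latency in milliseconds."""
    return 1000.0 / ms


for name, ms in latency_ms.items():
    derived = throughput_from_latency(ms)
    slowdown = ms / latency_ms["No Calibration"]
    print(f"{name}: {derived:.2f} samples/s derived, "
          f"{reported_throughput[name]} reported, {slowdown:.2f}x baseline latency")
```

Under this assumption the derived throughputs agree with the reported ones to within about 0.02 samples/s, which suggests the two rows describe the same measurement rather than independent benchmarks.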
© 2026 by the authors. Licensee MDPI, Basel, Switzerland. This article is an open access article distributed under the terms and conditions of the Creative Commons Attribution (CC BY) license.
Zheng, S.; Zhang, S.; Huang, Z.; Sun, K.; Gong, Y.; Wen, J.; Liu, E. Calibrated Feature Fusion: Enhancing Few-Shot Industrial Anomaly Detection via Cross-Stage Representation Alignment. Sensors 2026, 26, 2164. https://doi.org/10.3390/s26072164

