Cross-Modal Disagreement-Guided Reliability-Aware Scoring for RGB-3D Industrial Anomaly Detection
Abstract
1. Introduction
- A cross-modal disagreement modeling strategy is introduced to explicitly characterize the discrepancy between RGB and point-cloud anomaly responses, providing an additional anomaly cue beyond direct fusion.
- A dual-branch image-level score calibration strategy is proposed to adaptively combine a base fusion branch and a robust statistical branch, improving image-level performance under the released-code MVTec 3D-AD setting while revealing dataset sensitivity on additional benchmarks.
- A practical enhancement pipeline is constructed to improve local anomaly emphasis and global anomaly discrimination without replacing the original Hybrid Fusion backbone.
- Experiments on MVTec 3D-AD show that the proposed enhancement improves the released-code Hybrid Fusion/M3DM baseline, run with the same official full-setting command, in image-level ROCAUC, pixel-level ROCAUC, and AU-PRO. Additional evaluations on Eyecandies and a Real-IAD D3 subset examine cross-dataset transfer behavior.
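
The disagreement-modeling idea in the contributions above can be illustrated with a minimal sketch. It assumes that each modality yields a per-pixel anomaly map, that both maps are min-max normalized, and that disagreement is the absolute difference of the normalized maps. The function names, the multiplicative modulation rule, and the toy values are illustrative assumptions, not the authors' released implementation; the default strength of 0.50 mirrors the value listed for disagreement-guided local modulation in the parameter table.

```python
from typing import List

Map2D = List[List[float]]


def minmax_normalize(m: Map2D) -> Map2D:
    """Rescale a 2D map to [0, 1]; a constant map becomes all zeros."""
    flat = [v for row in m for v in row]
    lo, hi = min(flat), max(flat)
    if hi == lo:
        return [[0.0 for _ in row] for row in m]
    return [[(v - lo) / (hi - lo) for v in row] for row in m]


def disagreement_map(rgb: Map2D, pc: Map2D) -> Map2D:
    """Assumed definition: absolute difference of the normalized responses."""
    r, p = minmax_normalize(rgb), minmax_normalize(pc)
    return [[abs(a - b) for a, b in zip(row_r, row_p)] for row_r, row_p in zip(r, p)]


def modulate(fused: Map2D, disagreement: Map2D, strength: float = 0.50) -> Map2D:
    """Assumed rule: scale each fused score by (1 + strength * disagreement)."""
    return [
        [f * (1.0 + strength * d) for f, d in zip(row_f, row_d)]
        for row_f, row_d in zip(fused, disagreement)
    ]


# Hypothetical 2x2 maps: the modalities disagree only at position (0, 1).
rgb_map = [[0.0, 0.25], [0.0, 1.0]]
pc_map = [[0.0, 0.5], [0.0, 1.0]]
fused_map = [[0.2, 0.4], [0.2, 0.8]]

d_map = disagreement_map(rgb_map, pc_map)
enhanced = modulate(fused_map, d_map)
print(d_map)
print(enhanced)
```

In this sketch, only the pixel where the two normalized responses differ is amplified; pixels on which the modalities agree keep their fused score, so the fusion backbone itself is left unchanged.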
2. Related Work
2.1. Memory-Based Industrial Anomaly Detection
2.2. Teacher–Student, Reconstruction, and Self-Supervised Strategies
2.3. Multimodal RGB–3D Anomaly Detection
3. Materials and Methods
3.1. Overall Framework
3.2. RGB and Point-Cloud Feature Extraction
3.3. Fusion Branch and Normal Memory Construction
3.4. Cross-Modal Disagreement Modeling
3.5. Dual-Branch Image-Level Score Calibration
3.6. Stability Cues for Adaptive Blending
3.7. Training and Inference Procedure
3.8. Implementation Details
4. Results
4.1. Experimental Setup
4.2. Official Released-Code Setting and Parameter Sensitivity
4.3. Comparison with the Baseline
4.4. Additional Evaluation on Eyecandies
4.5. Supplementary Evaluation on a Real-IAD D3 Subset
4.6. Category-Wise Analysis
4.7. Pixel-Level Localization Performance
4.8. Module Progression and Ablation Analysis
5. Discussion
5.1. Practical Implications for Industrial Inspection
5.2. Limitations and Future Work
6. Conclusions
Author Contributions
Funding
Institutional Review Board Statement
Informed Consent Statement
Data Availability Statement
Conflicts of Interest
Abbreviations
| Abbreviation | Definition |
|---|---|
| RGB | Red, green, and blue image modality |
| 3D | Three-dimensional |
| ROCAUC | Area under the receiver operating characteristic curve |
| AU-PRO | Area under the per-region overlap curve |
| DRAEM | Discriminatively trained reconstruction embedding for anomaly detection |
| AEKD | Auto-encoder knowledge distillation |
| PIRN | Prototypical-based intra-modal reconstruction with normality communication |
| MAD | Multimodal anomaly detection |





Table 1. Parameter roles, representative candidate ranges, and final values.

| Parameter | Role | Representative Candidate Range | Final Value |
|---|---|---|---|
| | Strength of disagreement-guided local modulation | | 0.50 |
| K | Number of top responses used by the base image-score branch | | 64 |
| | Weights of percentile, map-mean, and disagreement-mean statistics in the robust branch | Convex combinations emphasizing the percentile term while retaining mean and disagreement terms | |
| | Scaling of conflict and reliability cues in adaptive blending | Unit-scale and nearby balanced settings | |
| | Recovery strength for underestimated high-percentile responses | | 0.20 |
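
How the listed parameters could interact in a dual-branch image-level score is sketched below. Only K = 64 and the recovery strength 0.20 are taken from the table; the 99th percentile, the weights (0.6, 0.2, 0.2), the blending rule based on mean disagreement, and all function names are hypothetical choices made for illustration, so this should be read as one possible instance of the idea rather than the authors' scoring module.

```python
import math
from typing import Sequence, Tuple


def mean(values: Sequence[float]) -> float:
    return sum(values) / len(values)


def top_k_mean(values: Sequence[float], k: int = 64) -> float:
    """Base branch: mean of the k largest responses."""
    return mean(sorted(values, reverse=True)[:k])


def percentile(values: Sequence[float], q: float) -> float:
    """Linearly interpolated percentile, q in [0, 100]."""
    ordered = sorted(values)
    pos = (len(ordered) - 1) * q / 100.0
    lo, hi = math.floor(pos), math.ceil(pos)
    return ordered[lo] + (ordered[hi] - ordered[lo]) * (pos - lo)


def calibrated_image_score(
    scores: Sequence[float],
    disagreement: Sequence[float],
    k: int = 64,
    weights: Tuple[float, float, float] = (0.6, 0.2, 0.2),  # assumed values
    q: float = 99.0,                                        # assumed percentile
    recovery: float = 0.20,
) -> float:
    w_p, w_m, w_d = weights
    if min(weights) < 0 or abs(w_p + w_m + w_d - 1.0) > 1e-9:
        raise ValueError("robust-branch weights must form a convex combination")
    high = percentile(scores, q)
    base = top_k_mean(scores, k)
    # Robust branch: percentile, map-mean, and disagreement-mean statistics.
    robust = w_p * high + w_m * mean(scores) + w_d * mean(disagreement)
    # Assumed stability cue: rely less on the base branch when modalities conflict.
    conflict = min(max(mean(disagreement), 0.0), 1.0)
    alpha = 1.0 - conflict
    blended = alpha * base + (1.0 - alpha) * robust
    # Recovery step: pull the score toward an underestimated high percentile.
    if high > blended:
        blended += recovery * (high - blended)
    return blended


# Hypothetical flattened anomaly map and disagreement map for one image.
flat_scores = [0.1] * 90 + [0.9] * 10
flat_disagreement = [0.0] * 90 + [0.5] * 10
print(calibrated_image_score(flat_scores, flat_disagreement))
```

Keeping the base branch and the robust branch separate until the final blend makes it possible to inspect which branch drives an image-level decision, which is useful when image-level behavior differs across datasets.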

Table 2. Released-code baseline and proposed add-on settings.

| Item | Released-Code Hybrid Fusion/M3DM Baseline | Proposed Add-On Setting |
|---|---|---|
| Code source | Public M3DM repository full-setting command | Same released code path with the proposed scoring module added |
| Backbones | DINO ViT-B/8 and Point-MAE | Same |
| Fusion module | Released UFF checkpoint, --use_uff | Same |
| Memory mode | --memory_bank multiple | Same |
| RGB input | RGB input | Same |
| Point grouping | 1024 groups with group size 128 | Same |
| Memory compression | Sparse-random-projection coreset with and | Same |
| Image score | Released-code fusion decision score | Disagreement-guided reliability-aware calibrated score |
| Interpretation | Local execution of the released full-setting pipeline | Add-on comparison under the same released-code setting |

Table 3. Released-code settings and parameter sensitivity on MVTec 3D-AD.

| Setting | Image ROCAUC | Pixel ROCAUC | AU-PRO |
|---|---|---|---|
| Released-code baseline, fusion score, group 128, num 1024 | 0.779 | 0.975 | 0.915 |
| Released-code baseline, pixel-max score, group 128, num 1024 | 0.754 | 0.975 | 0.915 |
| Released-code baseline, pixel-TopK score, group 128, num 1024 | 0.731 | 0.975 | 0.915 |
| Released-code baseline, fusion score, group 64, num 784, multiscale | 0.781 | 0.976 | 0.917 |
| Proposed add-on, same default released-code setting | 0.800 | 0.980 | 0.926 |

Table 4. Comparison with the baseline on MVTec 3D-AD.

| Method | Image ROCAUC | Pixel ROCAUC | AU-PRO |
|---|---|---|---|
| Hybrid Fusion/M3DM (reported in original paper) | 0.945 | 0.992 | 0.964 |
| Hybrid Fusion/M3DM (released-code baseline, local run) | 0.779 | 0.975 | 0.915 |
| Proposed Add-On (same released-code setting) | 0.800 | 0.980 | 0.926 |
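
For readers less familiar with the ROCAUC columns, the following sketch computes the metric through its rank interpretation: the probability that a randomly chosen anomalous sample receives a higher score than a randomly chosen normal one, with ties counted as one half. The labels and scores are made-up toy values, and the quadratic pairwise loop is chosen for clarity rather than speed.

```python
from typing import Sequence


def roc_auc(labels: Sequence[int], scores: Sequence[float]) -> float:
    """Area under the ROC curve via pairwise comparison (1 = anomalous, 0 = normal)."""
    pos = [s for l, s in zip(labels, scores) if l == 1]
    neg = [s for l, s in zip(labels, scores) if l == 0]
    if not pos or not neg:
        raise ValueError("both classes must be present")
    wins = 0.0
    for p in pos:
        for n in neg:
            if p > n:
                wins += 1.0
            elif p == n:
                wins += 0.5
    return wins / (len(pos) * len(neg))


# Hypothetical image-level labels and anomaly scores.
toy_labels = [0, 0, 1, 1]
toy_scores = [0.1, 0.4, 0.35, 0.8]
print(roc_auc(toy_labels, toy_scores))
```

The same function applies to pixel-level ROCAUC if the labels and scores are flattened over all pixels, although an efficient implementation would sort once instead of comparing every pair.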

Table 5. Additional evaluation on Eyecandies.

| Method | Image ROCAUC | Pixel ROCAUC | AU-PRO |
|---|---|---|---|
| Hybrid Fusion/M3DM (released-code baseline) | 0.810 | 0.963 | 0.849 |
| Proposed Add-On (same released-code setting) | 0.794 | 0.971 | 0.870 |

Table 6. Supplementary evaluation on a Real-IAD D3 subset.

| Method | Image ROCAUC | Pixel ROCAUC | AU-PRO |
|---|---|---|---|
| Hybrid Fusion/M3DM (released-code baseline) | 0.963 | 0.979 | 0.921 |
| Proposed Add-On (same released-code setting) | 0.980 | 0.988 | 0.941 |

Table 7. Module progression on MVTec 3D-AD.

| Method | Image ROCAUC | Pixel ROCAUC | AU-PRO |
|---|---|---|---|
| Released-Code Hybrid Fusion/M3DM Baseline | 0.779 | 0.975 | 0.915 |
| + Disagreement Modulation | 0.774 | 0.981 | 0.930 |
| + Reliability-Aware Score Calibration | 0.799 | 0.980 | 0.926 |
| Final Proposed Setting | 0.800 | 0.980 | 0.926 |

Table 8. Ablation analysis on MVTec 3D-AD.

| Variant | Disagr. | Score Calib. | Stat. Calib. | Img AUC | Pix AUC | AU-PRO |
|---|---|---|---|---|---|---|
| Released-Code Baseline | No | No | No | 0.779 | 0.975 | 0.915 |
| Disagreement Only | Yes | No | No | 0.774 | 0.981 | 0.930 |
| Disagreement + Score Calibration | Yes | Yes | No | 0.765 | 0.980 | 0.926 |
| Partial Statistical Calibration | Yes | Yes | Partial | 0.799 | 0.980 | 0.926 |
| Full Statistical Calibration | Yes | Yes | Full | 0.793 | 0.980 | 0.926 |

Table 9. Claims, supporting evidence, and boundaries.

| Claim | Supporting Evidence | Boundary |
|---|---|---|
| The proposed add-on improves the released-code Hybrid Fusion/M3DM baseline on MVTec 3D-AD | Table 4, Table 7 and Table 8 show improvements over the local released-code baseline | This is not a claim of outperforming the original CVPR-reported Hybrid Fusion/M3DM result |
| Cross-modal disagreement is useful for local anomaly enhancement | Pixel-level ROCAUC and AU-PRO improve on MVTec 3D-AD, Eyecandies, and the Real-IAD D3 subset | Localization gains are more consistent than image-level gains |
| Reliability-aware score calibration can improve image-level aggregation | MVTec 3D-AD image-level ROCAUC improves from 0.779 to 0.800 under the released-code setting | Eyecandies shows that image-level calibration is dataset-sensitive |
| The method is a lightweight enhancement of an existing pipeline | The same backbones, UFF fusion module, and memory-bank structure are retained | The method is not presented as a new theoretical anomaly detection paradigm |
| The Real-IAD D3 result provides additional real-world evidence | The three-category subset improves from 0.963/0.979/0.921 to 0.980/0.988/0.941 | The subset result does not replace a full Real-IAD D3 benchmark |
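
The differences behind these claims can be recomputed directly from the baseline and add-on rows of the three dataset-level result tables. The snippet below is only a bookkeeping sketch: the numbers are copied from those tables, while the dictionary layout and the function name are illustrative.

```python
from typing import Dict, Tuple

METRICS = ("Image ROCAUC", "Pixel ROCAUC", "AU-PRO")

# (Image ROCAUC, Pixel ROCAUC, AU-PRO) as reported in the result tables.
results: Dict[str, Dict[str, Tuple[float, float, float]]] = {
    "MVTec 3D-AD": {"baseline": (0.779, 0.975, 0.915), "proposed": (0.800, 0.980, 0.926)},
    "Eyecandies": {"baseline": (0.810, 0.963, 0.849), "proposed": (0.794, 0.971, 0.870)},
    "Real-IAD D3 subset": {"baseline": (0.963, 0.979, 0.921), "proposed": (0.980, 0.988, 0.941)},
}


def deltas(entry: Dict[str, Tuple[float, float, float]]) -> Tuple[float, ...]:
    """Proposed minus baseline for each metric, rounded to three decimals."""
    return tuple(round(p - b, 3) for b, p in zip(entry["baseline"], entry["proposed"]))


for dataset, entry in results.items():
    summary = ", ".join(f"{m}: {d:+.3f}" for m, d in zip(METRICS, deltas(entry)))
    print(f"{dataset}: {summary}")
```

Pixel-level ROCAUC and AU-PRO increase on all three datasets, whereas the image-level difference on Eyecandies is negative, which matches the boundary that image-level calibration is dataset-sensitive.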
© 2026 by the authors. Licensee MDPI, Basel, Switzerland. This article is an open access article distributed under the terms and conditions of the Creative Commons Attribution (CC BY) license.
Xu, J.; Xiu, P.; Shi, K.; Xu, L.; Wang, H. Cross-Modal Disagreement-Guided Reliability-Aware Scoring for RGB-3D Industrial Anomaly Detection. Appl. Sci. 2026, 16, 5483. https://doi.org/10.3390/app16115483

