CM-YOLO: A Multimodal PCB Defect Detection Method Based on Cross-Modal Feature Fusion
Abstract
1. Introduction
- (1) We propose a dual-stream feature extraction network for defect detection. Building on the recent YOLO11 detection model, we extend it into a dual-stream fusion network that integrates RGB and depth images, named Cross-Modal YOLO (CM-YOLO).
- (2) We design the Cross-Modal Attention for Space and Channel (CASC) module. Incorporating CASC into the dual-stream feature extraction network combines channel attention and spatial attention, enhancing the network's feature extraction capability.
- (3) We introduce a feature fusion module based on the concept of differential amplification, named the Differential Amplification Weighted Fusion (DAWF) module. It divides feature information into shared and specific components and adaptively adjusts their weights across defect categories using the SE attention mechanism before fusing them, enabling the network to exploit both 2D and 3D features effectively.
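To make contribution (3) concrete, the following is a minimal PyTorch sketch of a shared/specific split with SE re-weighting. It is an illustrative reconstruction based on the description above, not the authors' implementation: the sum/difference split, the SE block placement, and the 1×1 fusion convolution are assumptions.

```python
import torch
import torch.nn as nn


class SEBlock(nn.Module):
    """Squeeze-and-Excitation: global-pool, bottleneck, sigmoid gate per channel."""

    def __init__(self, channels: int, reduction: int = 16):
        super().__init__()
        self.gate = nn.Sequential(
            nn.AdaptiveAvgPool2d(1),
            nn.Conv2d(channels, max(channels // reduction, 1), 1),
            nn.ReLU(inplace=True),
            nn.Conv2d(max(channels // reduction, 1), channels, 1),
            nn.Sigmoid(),
        )

    def forward(self, x):
        return x * self.gate(x)


class DAWF(nn.Module):
    """Sketch of differential-amplification weighted fusion:
    shared component = sum of the two modalities,
    specific components = signed differences,
    each re-weighted by an SE block, then fused by a 1x1 convolution."""

    def __init__(self, channels: int):
        super().__init__()
        self.se_shared = SEBlock(channels)
        self.se_spec_rgb = SEBlock(channels)
        self.se_spec_depth = SEBlock(channels)
        self.fuse = nn.Conv2d(3 * channels, channels, 1)

    def forward(self, f_rgb, f_depth):
        shared = self.se_shared(f_rgb + f_depth)          # modality-shared cues
        spec_rgb = self.se_spec_rgb(f_rgb - f_depth)      # RGB-specific residual
        spec_depth = self.se_spec_depth(f_depth - f_rgb)  # depth-specific residual
        return self.fuse(torch.cat([shared, spec_rgb, spec_depth], dim=1))
```

A usage example: `DAWF(256)` fuses two aligned 256-channel feature maps from the RGB and depth branches into a single 256-channel map of the same spatial size.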
2. Related Work
2.1. PCB Defect Detection
2.2. Multimodal Object Detection
- (1) CNN-based Fusion Methods: These methods exploit the feature extraction capabilities of Convolutional Neural Networks (CNNs) to fuse information from images of different modalities, producing more comprehensive and richer feature representations. For example, Zhang et al. [40] proposed IFCNN, a CNN-based framework intended to serve as a general solution for various image fusion tasks. Its multi-layer convolutional structure lets IFCNN handle multi-scale information, helping preserve both fine details and the overall structure of the image.
- (2) Encoder-Decoder-based Multimodal Image Fusion Methods: This approach uses a deep network (encoder) to extract and fuse features from images of different modalities, then reconstructs a high-quality fused image through a decoder. Li et al. [39] proposed DenseFuse, which introduces dense connections so that the network better captures relationships between features when fusing infrared and visible images.
- (3) Generative Adversarial Network (GAN)-Based Fusion Methods: These methods use a generator to produce the fused image and a discriminator to evaluate it, improving fusion quality through adversarial training. A pioneering application of shared-specific modeling in the multimodal setting is the cm-SSFT algorithm [41], which combines shared modal information with modality-specific features and uses a Shared-Specific Transmission Network (SSTN) for information propagation and complementary learning, effectively enhancing the discriminative power and complementarity of the features and overcoming the limitation of traditional methods that focus only on shared features.
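The three families above differ mainly in where and how fusion happens; in the simplest CNN-based frameworks, the fusion step reduces to an elementwise rule applied to aligned feature maps. The NumPy sketch below illustrates such rules generically; it is not the IFCNN implementation, and the function name `fuse_features` is assumed for illustration.

```python
import numpy as np


def fuse_features(feat_a: np.ndarray, feat_b: np.ndarray, rule: str = "max") -> np.ndarray:
    """Fuse two aligned feature maps with a simple elementwise rule."""
    if rule == "max":       # keep the stronger response at each position
        return np.maximum(feat_a, feat_b)
    if rule == "mean":      # average the two modalities
        return (feat_a + feat_b) / 2.0
    if rule == "sum":       # additive fusion
        return feat_a + feat_b
    raise ValueError(f"unknown fusion rule: {rule}")


a = np.array([[0.2, 0.9], [0.5, 0.1]])
b = np.array([[0.7, 0.3], [0.4, 0.6]])
print(fuse_features(a, b, "max"))   # [[0.7 0.9] [0.5 0.6]]
```

Learned fusion (the CASC and DAWF modules of this paper) replaces such fixed rules with data-dependent weighting, which is what allows the network to adapt to different defect categories.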
3. Methodology
3.1. CM-YOLO
3.2. Cross-Modal Attention for Space and Channel (CASC) Module
3.3. Differential Amplification Weighted Fusion (DAWF)
3.4. Training Strategy and Loss Function
3.4.1. The Box Loss
3.4.2. The Classification Loss
3.4.3. The DFL Loss
4. Experiments
4.1. Experimental Setup
4.2. Experimental Metrics
4.3. Ablation Study
4.4. Comparison with Other Methods
4.5. Comparison on FLIR Dataset
5. Conclusions
Author Contributions
Funding
Institutional Review Board Statement
Informed Consent Statement
Data Availability Statement
Conflicts of Interest
References
1. Tang, J.; Wang, Z.; Zhang, H.; Li, H.; Wu, P.; Zeng, N. A lightweight surface defect detection framework combined with dual domain attention mechanism. Expert Syst. Appl. 2024, 238, 121726.
2. Natarajan, S.; Sathaye, A.; Oak, C.; Chaplot, N.; Banerjee, S. DEFCON: Defect Acceleration through Content Optimization. In Proceedings of the 2022 IEEE International Test Conference (ITC), Anaheim, CA, USA, 23–30 September 2022; pp. 298–304.
3. Wang, Z.; Yuan, H.; Lv, J.; Liu, C.; Xu, H.; Li, J. Anomaly Detection and Fault Classification of Printed Circuit Boards Based on Multimodal Features of the Infrared Thermal Imaging. IEEE Trans. Instrum. Meas. 2024, 73, 3518513.
4. Yu, X.; Lyu, W.; Zhou, D.; Wang, C.; Xu, W. ES-Net: Efficient Scale-Aware Network for Tiny Defect Detection. IEEE Trans. Instrum. Meas. 2022, 71, 3511314.
5. Liu, X. An Adaptive Defect-Aware Attention Network for Accurate PCB-Defect Detection. IEEE Trans. Instrum. Meas. 2024, 73, 5040811.
6. Bai, X.; Wang, X.; Liu, X.; Liu, Q.; Song, J.; Sebe, N.; Kim, B. Explainable deep learning for efficient and robust pattern recognition: A survey of recent developments. Pattern Recognit. 2021, 120, 108102.
7. Quan, Y.; Chen, Y.; Shao, Y.; Teng, H.; Xu, Y.; Ji, H. Image denoising using complex-valued deep CNN. Pattern Recognit. 2021, 111, 107639.
8. Bai, T.; Luo, J.; Zhou, S.; Lu, Y.; Wang, Y. Vehicle-Type Recognition Method for Images Based on Improved Faster R-CNN Model. Sensors 2024, 24, 2650.
9. Pan, K.; Hu, H.; Gu, P. WD-YOLO: A More Accurate YOLO for Defect Detection in Weld X-ray Images. Sensors 2023, 23, 8677.
10. Wu, J.; Zhou, W.; Qiu, W.; Yu, L. Depth Repeated-Enhancement RGB Network for Rail Surface Defect Inspection. IEEE Signal Process. Lett. 2022, 29, 2053–2057.
11. Zhou, W.; Hong, J. FHENet: Lightweight Feature Hierarchical Exploration Network for Real-Time Rail Surface Defect Inspection in RGB-D Images. IEEE Trans. Instrum. Meas. 2023, 72, 5005008.
12. Gao, C.; Chen, X.; Zhou, J.; Wang, J.; Shen, L. Open-Set Fabric Defect Detection With Defect Generation and Transfer. IEEE Trans. Instrum. Meas. 2025, 74, 3517213.
13. Liu, Y.; Gao, C.; Song, B.; Liang, S. A Surface Defect Detection Algorithm for PCB Based on MobileViT-YOLO. In Proceedings of the 2023 China Automation Congress (CAC), Chongqing, China, 17–19 November 2023; pp. 6318–6323.
14. Feng, B.; Cai, J. PCB Defect Detection via Local Detail and Global Dependency Information. Sensors 2023, 23, 7755.
15. Zeng, N.; Wu, P.; Wang, Z.; Li, H.; Liu, W.; Liu, X. A Small-Sized Object Detection Oriented Multi-Scale Feature Fusion Approach With Application to Defect Detection. IEEE Trans. Instrum. Meas. 2022, 71, 3507014.
16. Luo, J.; Yang, Z.; Li, S.; Wu, Y. FPCB Surface Defect Detection: A Decoupled Two-Stage Object Detection Framework. IEEE Trans. Instrum. Meas. 2021, 70, 5012311.
17. Luo, W.; Luo, J.; Yang, Z. FPC surface defect detection based on improved Faster R-CNN with decoupled RPN. In Proceedings of the 2020 Chinese Automation Congress (CAC), Shanghai, China, 6–8 November 2020; pp. 7035–7039.
18. Ling, Q.; Isa, N.A.M.; Asaari, M.S.M. SDD-Net: Soldering defect detection network for printed circuit boards. Neurocomputing 2024, 610, 128575.
19. Tang, J.; Liu, S.; Zhao, D.; Tang, L.; Zou, W.; Zheng, B. PCB-YOLO: An improved detection algorithm of PCB surface defects based on YOLOv5. Sustainability 2023, 15, 5963.
20. Du, B.; Wan, F.; Lei, G.; Xu, L.; Xu, C.; Xiong, Y. YOLO-MBBi: PCB surface defect detection method based on enhanced YOLOv5. Electronics 2023, 12, 2821.
21. Woo, S.; Park, J.; Lee, J.-Y.; Kweon, I.S. CBAM: Convolutional block attention module. In Computer Vision—ECCV 2018; Ferrari, V., Hebert, M., Sminchisescu, C., Weiss, Y., Eds.; Springer International Publishing: Cham, Switzerland, 2018; pp. 3–19.
22. Li, P.; Xu, F.; Wang, J.; Guo, H.; Liu, M.; Du, Z. DGConv: A Novel Convolutional Neural Network Approach for Weld Seam Depth Image Detection. Comput. Mater. Contin. 2024, 78, 1755–1771.
23. Hwang, S.; Park, J.; Kim, N.; Choi, Y.; Kweon, I.S. Multispectral pedestrian detection: Benchmark dataset and baseline. In Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition (CVPR), Boston, MA, USA, 7–12 June 2015; pp. 1037–1045.
24. Liu, J.; Zhang, S.; Wang, S.; Metaxas, D.N. Multispectral deep neural networks for pedestrian detection. arXiv 2016, arXiv:1611.02644.
25. Park, K.; Kim, S.; Sohn, K. Unified multi-spectral pedestrian detection based on probabilistic fusion networks. Pattern Recognit. 2018, 80, 143–155.
26. Li, C.; Song, D.; Tong, R.; Tang, M. Illumination-aware faster R-CNN for robust multispectral pedestrian detection. Pattern Recognit. 2019, 85, 161–171.
27. Zhang, H.; Fromont, E.; Lefevre, S.; Avignon, B. Guided attentive feature fusion for multispectral pedestrian detection. In Proceedings of the IEEE Winter Conference on Applications of Computer Vision (WACV), Waikoloa, HI, USA, 3–8 January 2021; pp. 72–80.
28. Dai, W.; Mujeeb, A.; Erdt, M.; Sourin, A. Soldering defect detection in automatic optical inspection. Adv. Eng. Inform. 2020, 43, 101004.
29. Sezer, A.D.; Altan, A. Detection of solder paste defects with an optimization-based deep learning model using image processing techniques. Solder. Surf. Mt. Technol. 2021, 33, 291–298.
30. Xiao, Y.; Tian, Z.; Yu, J.; Zhang, Y.; Liu, S.; Du, S.; Lan, X. A review of object detection based on deep learning. Multimedia Tools Appl. 2020, 79, 23729–23791.
31. Tang, S.; He, F.; Huang, X.; Yang, J. Online PCB Defect Detector on A New PCB Defect Dataset. arXiv 2019, arXiv:1902.06197.
32. Wang, Z.; Chen, W.; Li, T.; Zhang, S.; Xiong, R. Improved YOLOv3 detection method for PCB plug-in solder joint defects based on ordered probability density weighting and attention mechanism. AI Commun. 2022, 35, 171–186.
33. Redmon, J.; Farhadi, A. YOLOv3: An incremental improvement. arXiv 2018, arXiv:1804.02767.
34. Rezatofighi, H.; Tsoi, N.; Gwak, J.; Sadeghian, A.; Reid, I.; Savarese, S. Generalized intersection over union: A metric and a loss for bounding box regression. In Proceedings of the 2019 IEEE/CVF Conference on Computer Vision and Pattern Recognition (CVPR), Long Beach, CA, USA, 15–20 June 2019; pp. 658–666.
35. Yuan, M.; Zhou, Y.; Ren, X.; Zhi, H.; Zhang, J.; Chen, H. YOLO-HMC: An Improved Method for PCB Surface Defect Detection. IEEE Trans. Instrum. Meas. 2024, 73, 2001611.
36. Zheng, Y.; Izzat, I.H.; Ziaee, S. GFD-SSD: Gated fusion double SSD for multispectral pedestrian detection. arXiv 2019, arXiv:1903.06999.
37. Zhang, L.; Liu, Z.; Zhang, S.; Yang, X.; Qiao, H.; Huang, K.; Hussain, A. Cross-modality interactive attention network for multispectral pedestrian detection. Inf. Fusion 2019, 50, 20–29.
38. Li, C.; Song, D.; Tong, R.; Tang, M. Multispectral pedestrian detection via simultaneous detection and segmentation. In Proceedings of the British Machine Vision Conference (BMVC), Newcastle, UK, 3–6 September 2018.
39. Li, H.; Wu, X.-J. DenseFuse: A Fusion Approach to Infrared and Visible Images. IEEE Trans. Image Process. 2019, 28, 2614–2623.
40. Zhang, Y.; Liu, Y.; Sun, P.; Yan, H.; Zhao, X.; Zhang, L. IFCNN: A general image fusion framework based on convolutional neural network. Inf. Fusion 2020, 54, 99–118.
41. Lu, Y.; Wu, Y.; Liu, B.; Zhang, T.; Li, B.; Chu, Q.; Yu, N. Cross-Modality Person Re-Identification With Shared-Specific Feature Transfer. In Proceedings of the 2020 IEEE/CVF Conference on Computer Vision and Pattern Recognition (CVPR), Seattle, WA, USA, 14–19 June 2020; pp. 13376–13386.
42. Ren, S.; He, K.; Girshick, R.; Sun, J. Faster R-CNN: Towards real-time object detection with region proposal networks. IEEE Trans. Pattern Anal. Mach. Intell. 2017, 39, 1137–1149.
43. Liu, W.; Anguelov, D.; Erhan, D.; Szegedy, C.; Reed, S.E.; Fu, C.-Y.; Berg, A.C. SSD: Single shot multibox detector. In Proceedings of the European Conference on Computer Vision (ECCV), Amsterdam, The Netherlands, 11–14 October 2016; pp. 21–37.
44. Jocher, G.; Stoken, A.; Borovec, J.; Changyu, L.; Hogan, A.; Diaconu, L.; Ingham, F.; Poznanski, J.; Fang, J.; Yu, L.; et al. ultralytics/yolov5: v3.1 - Bug fixes and performance improvements. Zenodo 2020.
45. Jocher, G.; Chaurasia, A.; Qiu, J. Ultralytics YOLO (Version 8.0.0) [Computer software]. Available online: https://github.com/ultralytics/ultralytics (accessed on 30 May 2025).
46. Peng, Y.; Li, H.; Wu, P.; Zhang, Y.; Sun, X.; Wu, F. D-FINE: Redefine Regression Task in DETRs as Fine-grained Distribution Refinement. arXiv 2024, arXiv:2410.13842.
47. Devaguptapu, C.; Akolekar, N.; Sharma, M.M.; Balasubramanian, V.N. Borrow from anywhere: Pseudo multi-modal object detection in thermal imagery. In Proceedings of the 2019 IEEE/CVF Conference on Computer Vision and Pattern Recognition Workshops (CVPRW), Long Beach, CA, USA, 16–17 June 2019; pp. 1029–1038.
Category | Train Samples | Test Samples |
---|---|---|
component_missing | 805 | 177 |
component_shift | 151 | 41 |
dirt | 117 | 36 |
lifted_pin | 183 | 42 |
solder_bridging | 314 | 82 |
Model | Data | Precision | Recall | mAP |
---|---|---|---|---|
YOLO11 | Depth | 0.832 | 0.934 | 0.900 |
YOLO11 | RGB | 0.851 | 0.955 | 0.907 |
YOLO11-Multi | Multi | 0.803 | 0.949 | 0.946 |
YOLO11-CASC | Multi | 0.861 | 0.981 | 0.956 |
YOLO11-DAWF | Multi | 0.855 | 0.984 | 0.958 |
CM-YOLO | Multi | 0.912 | 0.984 | 0.969 |
Model | Data | Component Missing | Solder Bridging | Lifted Pin | Component Shift | Dirt | mAP |
---|---|---|---|---|---|---|---|
Faster R-CNN [42] | RGB | 0.948 | 0.966 | 0.223 | 0.966 | 0.989 | 0.818 |
Faster R-CNN [42] | Depth | 0.887 | 0.974 | 0.503 | 0.989 | 0.885 | 0.847 |
SSD [43] | RGB | 0.906 | 0.905 | 0.230 | 0.909 | 0.995 | 0.789 |
SSD [43] | Depth | 0.897 | 0.903 | 0.619 | 0.998 | 0.998 | 0.883 |
YOLOv5 [44] | Depth | 0.915 | 0.983 | 0.808 | 0.989 | 0.963 | 0.932 |
YOLOv5 [44] | RGB | 0.977 | 0.995 | 0.754 | 0.976 | 0.991 | 0.939 |
YOLOv8 [45] | Depth | 0.904 | 0.986 | 0.760 | 0.979 | 0.952 | 0.916 |
YOLOv8 [45] | RGB | 0.978 | 0.993 | 0.610 | 0.993 | 0.978 | 0.911 |
YOLO11 | Depth | 0.893 | 0.987 | 0.703 | 0.978 | 0.940 | 0.900 |
YOLO11 | RGB | 0.974 | 0.984 | 0.599 | 0.993 | 0.984 | 0.907 |
*Ablation Study on Multimodal Fusion* | | | | | | | |
YOLO11-Multi | Multi | 0.972 | 0.995 | 0.810 | 0.970 | 0.983 | 0.946 |
YOLO11-CASC | Multi | 0.954 | 0.995 | 0.867 | 0.994 | 0.973 | 0.956 |
YOLO11-DAWF | Multi | 0.950 | 0.991 | 0.886 | 0.992 | 0.973 | 0.958 |
CM-YOLO (ours) | Multi | 0.974 | 0.993 | 0.913 | 0.995 | 0.968 | 0.969 |
*Comparison with SOTA* | | | | | | | |
D-Fine [46] | RGB | 0.970 | 0.974 | 0.882 | 0.998 | 0.982 | 0.961 |
D-Fine [46] | Depth | 0.922 | 0.968 | 0.891 | 0.959 | 0.981 | 0.944 |
CM-YOLO (ours) | Multi | 0.974 | 0.993 | 0.913 | 0.995 | 0.968 | 0.969 |
Model | Time (s) | Avg Time (ms/image) | FPS | Model Size (MB) | Parameters |
---|---|---|---|---|---|
Faster R-CNN [42] | 29.00 | 146.46 | 6.83 | 108.33 | 28,369,481 |
YOLOv8 [45] | 3.10 | 15.66 | 63.84 | 5.36 | 2,702,926 |
YOLO11 | 3.20 | 16.16 | 61.93 | 5.22 | 2,606,272 |
D-Fine [46] | 16.15 | 81.57 | 12.26 | 476.55 | 31,252,766 |
CM-YOLO (ours) | 8.70 | 22.02 | 45.44 | 5.20 | 2,511,527 |
Experiment Number | Precision | Recall | mAP |
---|---|---|---|
1 | 0.9443 | 0.9660 | 0.9716 |
2 | 0.9174 | 0.9751 | 0.9675 |
3 | 0.9309 | 0.9703 | 0.9713 |
4 | 0.9490 | 0.9662 | 0.9721 |
5 | 0.9424 | 0.9670 | 0.9716 |
Mean | 0.9368 | 0.9685 | 0.9708 |
Std | 0.0114 | 0.0034 | 0.0019 |
95% CI | [0.9219, 0.9517] | [0.9649, 0.9722] | [0.9689, 0.9728] |
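The summary statistics above can be reproduced from the five runs. The sketch below assumes the reported standard deviation is the population (ddof = 0) standard deviation, which matches the 0.0114 for precision, and uses a Student-t interval with 4 degrees of freedom; the exact interval convention behind the table's CI may differ slightly.

```python
import math

# Precision over the five repeated runs reported above
precision_runs = [0.9443, 0.9174, 0.9309, 0.9490, 0.9424]

n = len(precision_runs)
mean = sum(precision_runs) / n
# Population standard deviation (ddof = 0) reproduces the reported 0.0114
std = math.sqrt(sum((x - mean) ** 2 for x in precision_runs) / n)
# 95% interval using the t critical value for n - 1 = 4 degrees of freedom
t_crit = 2.776
half_width = t_crit * std / math.sqrt(n)

print(f"mean={mean:.4f}, std={std:.4f}, "
      f"95% CI=[{mean - half_width:.4f}, {mean + half_width:.4f}]")
```

Running the same computation over the recall and mAP columns recovers their reported means and standard deviations as well.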
Model | Dataset | Precision | Recall | mAP |
---|---|---|---|---|
CM-YOLO (ours) | Multi (darkened) | 0.933 | 0.952 | 0.959 |
CM-YOLO (ours) | Multi (with noise) | 0.880 | 0.934 | 0.946 |
Lan, H.; Luo, J.; Zhang, H.; Yan, X. CM-YOLO: A Multimodal PCB Defect Detection Method Based on Cross-Modal Feature Fusion. Sensors 2025, 25, 4108. https://doi.org/10.3390/s25134108