1. Introduction
Semiconductor components require high manufacturing quality with minimal surface defects [1,2]. To overcome the brittleness and hardness of semiconductor materials, complex manufacturing processes have been proposed [3,4,5], and the intricacy of manufacturing defects in wafers produced during these processes has increased. A wafer bin map (WBM) [6] visually represents the test outcomes for each chip on a wafer, based on the chip’s probe test failure mode and its position (die). The chip probe test, an essential final assessment after the entire manufacturing sequence, evaluates the performance and functionality of each chip. During this test, each die is subjected to multiple probe test modes, with the first failure mode being recorded as the bin result. Throughout the wafer fabrication process, various manufacturing issues can lead to multiple defective dies on a wafer. These defects often cluster in specific areas on the wafer, forming spatial patterns known as gross failure areas (GFAs). These patterns, which include common types such as ring, scratch, loc, and center, are indicative of process-related issues and provide valuable data for improving yield and quality [7]. Classifying GFAs is crucial for engineers to identify and address problems in the production process, thereby reducing costs and enhancing yield. As production environments become more intricate, the need for automated WBM GFA classification has become increasingly important.
Deep learning-based object detection outperforms traditional algorithms and classification networks in terms of generalization ability and localization precision, leading to superior overall performance. Object detection algorithms can be categorized into two types: single-stage networks, exemplified by YOLO [8], SSD [9], and RetinaNet [10], and two-stage networks, represented by Faster R-CNN [11]. While two-stage networks prioritize accuracy over speed by separating localization and recognition, single-stage networks perform both tasks simultaneously, resulting in faster detection speeds. Although these vision-based models achieve competitive performance, they rely only on pixel-level information and ignore the rich semantic knowledge inherent in defect pattern descriptions.
Meanwhile, recent progress in vision–language models (VLMs), such as CLIP-style architectures [12], demonstrates that aligning visual representations with natural-language semantics significantly improves generalization and interpretability. VLM-based methods have been successfully applied to industrial defect detection. AnomalyGPT was proposed by Gu et al. [13] to address industrial anomaly detection by integrating fine-grained visual decoding and multi-turn dialog capability. It achieved state-of-the-art image- and pixel-level performance on MVTec-AD with strong few-shot generalization from only one normal example. Qian et al. [14] proposed a contrastive cross-modal training framework named CLAD that adopts large vision–language models to align visual and textual representations for industrial anomaly detection and localization. By jointly improving image-level detection and pixel-level localization on benchmarks such as MVTec-AD and VisA, it demonstrates that cross-modal contrastive learning can enhance both performance and interpretability in industrial inspection tasks. Cao et al. [15] developed a model named AnomalyVLM to address zero-shot industrial anomaly detection. In AnomalyVLM, product standards were regarded as a substitute for reference images, enabling vision–language models to reason about normality and abnormality without category-specific training data.
However, integrating VLM concepts into industrial wafer map analysis remains largely unexplored. In particular, wafer defect classes naturally correspond to concise textual descriptions, suggesting an opportunity to leverage language as a source of domain knowledge. Therefore, in this paper, we introduce YOLO-LA, a prototype-based vision–language alignment framework, in which a trained YOLO backbone serves as a visual encoder, while a frozen text encoder generates semantic prototypes from defect descriptions. A lightweight learnable projection head aligns the YOLO visual embedding with the text embedding space, enabling wafer classification via cosine similarity to the textual prototypes.
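The core classification step of the framework described above can be sketched in a few lines. This is a minimal NumPy illustration, not the paper's implementation: the embedding dimensions, class count, and random weights are all hypothetical stand-ins, and the projection head would be trained rather than random.

```python
import numpy as np

rng = np.random.default_rng(0)

# Hypothetical sizes: a 256-d YOLO visual embedding, a 512-d text space, 8 defect classes.
D_VIS, D_TXT, N_CLS = 256, 512, 8

def l2_normalize(x, axis=-1, eps=1e-8):
    return x / (np.linalg.norm(x, axis=axis, keepdims=True) + eps)

# Frozen semantic prototypes: one text embedding per defect description
# (e.g. "a ring-shaped cluster of failing dies"). Random here for illustration.
text_prototypes = l2_normalize(rng.standard_normal((N_CLS, D_TXT)))

# Lightweight learnable projection head (random here; learned in the real framework).
W = rng.standard_normal((D_VIS, D_TXT)) * 0.02

def classify(visual_embedding):
    """Project the visual embedding into the text space, then score each class
    by cosine similarity to its textual prototype."""
    z = l2_normalize(visual_embedding @ W)   # projected, unit-norm visual feature
    sims = text_prototypes @ z               # cosine similarities (prototypes are unit-norm)
    return int(np.argmax(sims)), sims

v = rng.standard_normal(D_VIS)               # stand-in for a YOLO backbone feature vector
pred, sims = classify(v)
```

Because both encoders stay frozen, only `W` carries trainable parameters, which is what keeps the alignment head lightweight.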
2. YOLO Architectures and Baseline Model
YOLO, which stands for “You Only Look Once,” is a state-of-the-art, real-time object detection system that has revolutionized the field of computer vision. Developed by Joseph Redmon and his collaborators [8], YOLO is designed to detect and classify objects within images or video streams with remarkable speed and accuracy. Unlike traditional object detection methods that apply a classifier to various parts of an image, YOLO reframes object detection as a single regression problem, directly predicting bounding boxes and class probabilities from full images in one evaluation. This approach allows YOLO to process images in real time, making it highly efficient for applications that require fast and accurate object detection, such as autonomous driving, surveillance, and augmented reality. YOLO’s architecture is based on a convolutional neural network (CNN) that divides the input image into a grid and predicts bounding boxes and probabilities for each grid cell, enabling it to detect multiple objects of different classes simultaneously. Over the years, YOLO has undergone several iterations, each improving upon the previous in terms of speed, accuracy, and robustness, solidifying its position as a leading framework in the object detection domain.
2.1. YOLOv8
YOLOv8 was proposed by Jocher et al. [16]. In YOLOv8, the backbone is responsible for extracting hierarchical features from the input image. It consists of several sequential layers, including CBS, C2f, and SPPF. CBS stands for Convolution + BatchNorm + SiLU activation and is used to process feature maps efficiently. C2f is a modified bottleneck structure designed for feature extraction with fewer parameters. SPPF (Spatial Pyramid Pooling-Fast) enhances the receptive field and captures multi-scale features at the end of the backbone. The neck is designed to aggregate and fuse features from different stages of the backbone, helping detect objects at various scales. It includes C2f blocks for feature processing, Upsample operations to increase spatial resolution, and Concat operations to fuse feature maps from different layers. This part resembles a Feature Pyramid Network (FPN) combined with a Path Aggregation Network (PAN), enabling both top–down and bottom–up feature fusion. The head is responsible for final predictions.
2.2. YOLOv10
YOLOv10 [17] is one of the most representative models in the YOLO series. YOLOv10’s architecture leverages the strengths of its predecessors while incorporating several groundbreaking innovations. In YOLOv10, the backbone utilizes an advanced version of C2f (cross-stage partial with two-fusion layers), called C2fCIB, which optimizes gradient flow and minimizes computational redundancy [18]. The neck module is designed to aggregate features from various scales; it employs PANet (Path Aggregation Network) layers to facilitate effective multi-scale feature fusion and passes the fused features to the head.
2.3. YOLOv11
YOLOv11 [19] builds upon the foundation of YOLOv10, retaining its core design principles such as the modular backbone–neck–head architecture and the use of efficient components like CBS and SPPF blocks. Both versions adopt a one-to-many prediction strategy in the head, enabling each spatial location to generate multiple object detections. However, YOLOv11 introduces several key improvements over YOLOv10. First, it adds a novel C3K2 (cross-stage partial with kernel size 2) module, an improved version of C2f that combines the CSP concept with a smaller convolution kernel. The feature map is split into two parts: one is passed through directly, while the other undergoes multiple bottleneck convolutions before being fused with the first part. This preserves rich features while reducing computation and parameter count, improving computational efficiency and reducing inference latency. In addition, YOLOv11 modifies the PSA module into C2PSA. In the C2PSA module, a convolutional layer with a kernel size of 1 splits the feature map into two parts. One part enters the PSA module to highlight the response of key spatial regions, while the other part retains the original features. Finally, the two parts are concatenated and fused through another convolutional layer with a kernel size of 1.
2.4. YOLOv12
The YOLOv12 [20] architecture is composed of three key components, a backbone module, a neck module, and a head module, where two techniques are applied to reduce computational cost. First, a residual efficient layer aggregation network (R-ELAN) in the backbone module integrates features from different scales while reducing computation cost and memory usage. In addition, area attention is used in the neck module to reduce computation by narrowing the attention from global to local. The head is a simple classification network that uses a convolutional layer followed by an adaptive average pooling layer; the final linear layer outputs one score per defect type.
5. Conclusions
In this paper, we propose YOLO-LA, a lightweight prototype-based vision–language alignment framework that integrates a frozen YOLO classifier backbone with a frozen text encoder for wafer pattern detection. Semantic text prototypes are generated from natural-language defect descriptions using a text encoder, enabling interpretable classification via cosine similarity in a shared embedding space. In addition, an efficient learnable projection head is designed to achieve minimal extra computation. Extensive experiments on wafer map defect classification demonstrate that semantic alignment consistently improves classification robustness, particularly under low-resolution conditions where purely visual cues are insufficient to distinguish structurally similar defect patterns. The results further show that the proposed framework is robust to prompt formulation, benefits from a frozen visual backbone, and achieves an effective performance–efficiency trade-off with a lightweight projection head. In future work, we will explore few-shot and zero-shot defect recognition through dynamic prototype updating, as well as validation on real-world production data from semiconductor fabrication lines.