SO-YOLO11-CDP: An Instance Segmentation-Based Approach for Cross-Depth-of-Field Positioning Micro Image Sensor Modules in Precision Assembly
Abstract
1. Introduction
- Enhance instance segmentation performance: SO-YOLO11 integrates three modules tailored to small-object segmentation. A coordinate attention mechanism is introduced into the segmentation head to embed precise target location information; non-strided (SPD) convolution replaces strided convolution in the backbone to strengthen fine-grained feature extraction for small targets; and an EIoU loss combined with the normalized Wasserstein distance improves bounding-box regression. Together, these changes give SO-YOLO11 robust and accurate instance segmentation.
- Improve spatial position detection accuracy: A calibration error compensation model for the cross-depth-of-field image Jacobian matrix is established on the basis of the pinhole imaging principle, and a cross-depth-of-field eye-to-hand calibration workflow is designed to improve the spatial position detection accuracy of object feature points. The error compensation method enables higher-precision cross-scale inspection of micro-components under microscopic vision.
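The non-strided convolution mentioned above (SPD-Conv) replaces a stride-2 convolution with a space-to-depth rearrangement followed by a stride-1 convolution, so no pixel information is discarded when resolution is halved. A minimal NumPy sketch of the space-to-depth step (the helper name and shapes are illustrative, not the authors' implementation):

```python
import numpy as np

def space_to_depth(x: np.ndarray, scale: int = 2) -> np.ndarray:
    """Rearrange (H, W, C) -> (H/scale, W/scale, C*scale^2).

    Every scale x scale spatial block is moved into the channel axis,
    halving resolution without discarding any pixel values (unlike a
    strided convolution or pooling, which drops or averages samples).
    """
    h, w, c = x.shape
    assert h % scale == 0 and w % scale == 0
    x = x.reshape(h // scale, scale, w // scale, scale, c)
    x = x.transpose(0, 2, 1, 3, 4)          # (H/s, W/s, s, s, C)
    return x.reshape(h // scale, w // scale, c * scale * scale)

feat = np.arange(4 * 4 * 3, dtype=np.float32).reshape(4, 4, 3)
out = space_to_depth(feat, 2)
print(out.shape)   # (2, 2, 12)
```

A stride-1 convolution over the rearranged tensor then plays the role of the original downsampling layer, which is why the approach helps on small targets whose evidence spans only a few pixels.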
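The normalized Wasserstein distance used in the loss models each box as a 2-D Gaussian and compares boxes via the closed-form 2-Wasserstein distance between the Gaussians, which degrades far more gracefully than IoU for tiny, barely overlapping boxes. A sketch following the formulation from the tiny-object detection literature (the scale constant `C` is dataset-dependent; the value here is illustrative):

```python
import math

def nwd(box1, box2, C: float = 12.8) -> float:
    """Normalized Wasserstein distance between (cx, cy, w, h) boxes.

    Each box is modeled as a 2-D Gaussian N((cx, cy), diag(w^2/4, h^2/4));
    the squared 2-Wasserstein distance between the two Gaussians has the
    closed form below, and an exponential maps it into (0, 1].
    """
    cx1, cy1, w1, h1 = box1
    cx2, cy2, w2, h2 = box2
    wass_sq = ((cx1 - cx2) ** 2 + (cy1 - cy2) ** 2
               + ((w1 - w2) / 2) ** 2 + ((h1 - h2) / 2) ** 2)
    return math.exp(-math.sqrt(wass_sq) / C)

# Identical boxes give exactly 1.0; a shifted tiny box still yields a
# smooth, non-zero similarity, whereas IoU drops to 0 once boxes separate.
print(nwd((10, 10, 4, 4), (10, 10, 4, 4)))   # 1.0
print(round(nwd((10, 10, 4, 4), (15, 10, 4, 4)), 3))
```

Blending this similarity with an EIoU term, as the paper describes, keeps a useful gradient even when predicted and ground-truth boxes of hundred-micrometer targets do not overlap at all.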
2. Assembly Task Description and Precise Positioning Method
2.1. Assembly Process for Micro Image Sensor Modules
- Cross-depth-of-field position detection. Because target sizes are in the hundred-micrometer range, each target occupies only a small pixel area in the image. High-magnification microscopic vision also suffers from a narrow field of view and a shallow depth of field, so some component features are defocused in image space, introducing errors into 3D position detection.
- Stochastic initial conditions. The data cables are flexible, and the welding fixture may deform the cable cores during fixation, causing random variations in the initial conditions. Overlapping target contours then lead to false or missed detections in image segmentation.
2.2. Precision Position Detection Framework
3. Visual Recognition and Image Position Detection
3.1. SO-YOLO11 Segmentation Model
3.2. Sub-Pixel Image Position Detection Method Based on Image Gradients
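A standard building block for gradient-based sub-pixel localization is to find the pixel with the strongest gradient response along an intensity profile and fit a parabola through that sample and its two neighbours; the parabola vertex gives the edge position to sub-pixel precision. This is a generic sketch of that idea, not necessarily the paper's exact procedure:

```python
import numpy as np

def subpixel_edge(profile: np.ndarray) -> float:
    """Sub-pixel edge location along a 1-D intensity profile.

    Computes the gradient magnitude, finds the interior pixel with the
    largest response, then fits a parabola through that sample and its
    two neighbours; the vertex offset refines the integer location.
    """
    g = np.abs(np.gradient(profile.astype(np.float64)))
    k = int(np.argmax(g[1:-1])) + 1           # exclude the endpoints
    denom = g[k - 1] - 2.0 * g[k] + g[k + 1]
    if denom == 0.0:                          # flat peak: no refinement
        return float(k)
    offset = 0.5 * (g[k - 1] - g[k + 1]) / denom
    return k + offset

# Symmetric step centred on pixel 3 -> exactly 3.0
print(subpixel_edge(np.array([0., 0, 0, 0.5, 1, 1, 1, 1])))   # 3.0
```

In 2-D the same refinement is applied along the contour normal at each feature point, which is what makes contour feature points usable at sub-pixel accuracy.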
4. Spatial Position Detection for Micro-Devices
4.1. Calibration Workflow
4.2. Image Jacobian Matrix Model
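An image Jacobian J maps small 3-D stage displacements Δx to image-plane feature displacements, Δp ≈ J·Δx. In an active-movement calibration the stage executes known displacements and J is recovered by least squares from the observed pixel offsets. A minimal NumPy sketch with synthetic data (the 2×3 shape, noise level, and numbers are assumptions for illustration, not the paper's calibration results):

```python
import numpy as np

rng = np.random.default_rng(0)

# Ground-truth 2x3 image Jacobian (pixels per micrometer), used only
# to synthesize observations for this sketch.
J_true = np.array([[ 2.0, 0.1, 0.05],
                   [-0.1, 2.1, 0.03]])

# Active movement: N known stage displacements (micrometers) ...
dX = rng.uniform(-100.0, 100.0, size=(12, 3))
# ... and the corresponding observed image offsets (pixels), with noise.
dP = dX @ J_true.T + rng.normal(0.0, 0.05, size=(12, 2))

# Least-squares estimate: solve dX @ J.T ~= dP for J.
J_est, *_ = np.linalg.lstsq(dX, dP, rcond=None)
J_est = J_est.T

print(np.round(J_est, 2))
```

The cross-depth-of-field compensation described in Section 4.3 then corrects this estimate for the defocus-induced bias that a single-plane calibration cannot capture.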
4.3. Calibration Error Compensation
5. Experimental Results and Analysis
5.1. Multi-Microparts Segmentation and Contour Precise Localization Experiments
5.1.1. Experimental Parameters and Datasets
5.1.2. Image Segmentation and Precise Extraction of Contour Feature Points
5.2. Verification Experiment for 3D Spatial Position Detection Accuracy
6. Conclusions
Author Contributions
Funding
Data Availability Statement
Conflicts of Interest
| Hyperparameter | Value |
|---|---|
| Optimizer | SGD |
| Initial learning rate lr0 | 0.01 |
| Final learning rate lrf | 0.01 |
| Training epochs | 500 |
| Early stopping patience (epochs) | 30 |
| Weight decay | 0.005 |
| Dropout | 0.2 |
| Mosaic | 1.0 |
| Mixup | 0.2 |
| Copy-paste | 0.5 |
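These values map directly onto the training arguments of an Ultralytics-style trainer. The fragment below is a hedged sketch of such a config, mirroring the table; the comments are interpretations and the file layout is an assumption, not the authors' actual configuration:

```yaml
# Hypothetical Ultralytics-style training config mirroring the table above
optimizer: SGD
lr0: 0.01          # initial learning rate
lrf: 0.01          # final learning-rate fraction at the end of the schedule
epochs: 500
patience: 30       # early-stopping patience (epochs without improvement)
weight_decay: 0.005
dropout: 0.2
mosaic: 1.0        # mosaic augmentation probability
mixup: 0.2         # mixup augmentation probability
copy_paste: 0.5    # copy-paste augmentation probability
```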
| Model | CA | SPD-Conv | NWD-EIoU | Precision (%) | Recall (%) | mAP@0.5 (%) | mAP@0.5:0.95 (%) |
|---|---|---|---|---|---|---|---|
| Baseline | × | × | × | 57.6 | 56.4 | 58.2 | 29.6 |
| A | √ | × | × | 63.2 | 57.9 | 60.2 | 31.3 |
| B | × | √ | × | 58.5 | 53.6 | 55.8 | 29.0 |
| C | √ | √ | × | 66.8 | 56.9 | 59.1 | 32.1 |
| D (Ours) | √ | √ | √ | 73.7 | 60.4 | 68.1 | 35.4 |
| Model | Precision (%) | Recall (%) | mAP@0.5 (%) | mAP@0.5:0.95 (%) |
|---|---|---|---|---|
| RT-DETR-ResNet50 | 59.9 | 54.6 | 54.4 | 27.8 |
| YOLOv5s-seg | 55.7 | 47.5 | 45.7 | 22.6 |
| YOLOv8s-seg | 64.7 | 52.2 | 54.3 | 28.5 |
| YOLOv9-seg | 47.2 | 46.5 | 45.4 | 24.4 |
| YOLO11m-seg | 57.6 | 56.4 | 58.2 | 29.6 |
| YOLO12m-seg | 47.5 | 50.6 | 49.0 | 23.8 |
| SO-YOLO11 | 73.7 | 60.4 | 68.1 | 35.4 |
| | Horizontal Vision | Vertical Vision |
|---|---|---|
| Image Jacobian matrix (active movement method) | | |
| Image offset matrix (100 μm pitch) | | |
| Cross-depth-of-field image Jacobian matrix (error compensation method) | | |
| | Horizontal Vision | Vertical Vision |
|---|---|---|
| Image offset matrix (50 μm step) | | |
| Image offset matrix (150 μm step) | | |
| Cross-depth-of-field image Jacobian matrix (50 μm step) | | |
| Cross-depth-of-field image Jacobian matrix (150 μm step) | | |
© 2026 by the authors. Licensee MDPI, Basel, Switzerland. This article is an open access article distributed under the terms and conditions of the Creative Commons Attribution (CC BY) license.
Share and Cite
Lu, X.; Zhang, J.; Yang, Y.; Bi, L. SO-YOLO11-CDP: An Instance Segmentation-Based Approach for Cross-Depth-of-Field Positioning Micro Image Sensor Modules in Precision Assembly. Electronics 2026, 15, 411. https://doi.org/10.3390/electronics15020411
