Clustering Visual Similar Objects for Enhanced Synthetic Image Data for Object Detection
Abstract
1. Introduction
2. Related Work
2.1. Industrial Object Detection
2.2. Synthetic Data for Object Detection
2.3. Object Similarity Analysis
3. Materials and Methods
3.1. Similarity Analysis
3.1.1. Pre-Processing
3.1.2. Clustering
3.1.3. Post-Processing
3.2. Fine-Tuning
3.3. Experimental Procedure
4. Results
4.1. Clustering Without Fine-Tuning
4.2. Clustering with Fine-Tuning
5. Discussion
6. Conclusions
Author Contributions
Funding
Institutional Review Board Statement
Informed Consent Statement
Data Availability Statement
Conflicts of Interest
References
Contr. Model | Avg. loss | Avg. loss improvement | Min. loss | Min. loss improvement |
---|---|---|---|---|
- | 0.944 | - | 0.546 | - |
DINOv2 | 0.735 | 22.15% | 0.426 | 22.03% |
Finetuned DINOv2 | 0.729 | 22.82% | 0.437 | 20.44% |

Contr. Model | Avg. loss | Avg. loss improvement | Min. loss | Min. loss improvement |
---|---|---|---|---|
- | 0.377 | - | 0.123 | - |
DINOv2 | 0.285 | 24.54% | 0.108 | 12.95% |
Finetuned DINOv2 | 0.332 | 11.94% | 0.090 | 26.93% |
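
The improvement columns in the two tables appear consistent with a relative reduction in loss against the baseline row (the row marked "-"). The following Python sketch assumes that definition, (baseline - value) / baseline; the function name `relative_improvement` is hypothetical and not taken from the article. Because it recomputes from the rounded losses shown in the tables, its results can differ from the published percentages in the last digits.

```python
def relative_improvement(baseline: float, value: float) -> float:
    """Percentage reduction of `value` relative to `baseline`.

    Assumed definition, not confirmed by the article:
    (baseline - value) / baseline * 100.
    """
    if baseline == 0:
        raise ValueError("baseline loss must be non-zero")
    return (baseline - value) / baseline * 100.0


# Rounded average losses copied from the first table.
baseline_avg = 0.944  # row "-"
dinov2_avg = 0.735    # row "DINOv2"

improvement = relative_improvement(baseline_avg, dinov2_avg)
print(f"DINOv2 avg. loss improvement: {improvement:.2f}%")
```

For the Finetuned DINOv2 row of the second table (0.377 against 0.332), the same function gives roughly 11.94%, in line with the tabulated value.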
© 2024 by the authors. Licensee MDPI, Basel, Switzerland. This article is an open access article distributed under the terms and conditions of the Creative Commons Attribution (CC BY) license (https://creativecommons.org/licenses/by/4.0/).
Rolf, J.; Gerhard, D.; Kosic, P. Clustering Visual Similar Objects for Enhanced Synthetic Image Data for Object Detection. Information 2024, 15, 761. https://doi.org/10.3390/info15120761