Generative Data Augmentation for ArUco-Free RGB-Based 6-DoF Object Pose Estimation

Scribano, Carmelo; Ferrari, Iacopo; Franchini, Giorgia; Govi, Elena; Sapienza, Davide; Poppi, Tobia; Verucchi, Micaela; Bertogna, Marko

doi:10.3390/jimaging12060244

This is an early access version; the complete PDF, HTML, and XML versions will be available soon.

Open Access Article


by Carmelo Scribano ¹, Iacopo Ferrari ¹, Giorgia Franchini ¹,*, Elena Govi ², Davide Sapienza ¹, Tobia Poppi ¹, Micaela Verucchi ² and Marko Bertogna ¹

¹ Department of Physics, Informatics and Mathematics, University of Modena and Reggio Emilia, Via Campi 213/B, 41125 Modena, Italy

² HIPERT s.r.l, Via Inventori, 37, 41121 Modena, Italy

* Author to whom correspondence should be addressed.

J. Imaging 2026, 12(6), 244; https://doi.org/10.3390/jimaging12060244

Submission received: 23 March 2026 / Revised: 21 May 2026 / Accepted: 27 May 2026 / Published: 29 May 2026

(This article belongs to the Special Issue AI-Driven Image and Video Understanding)


Abstract

In recent years, data-driven approaches have become increasingly important in industrial computer vision applications, particularly for 6-Degrees-of-Freedom (6-DoF) object pose estimation. However, benchmark datasets may unintentionally introduce biases that affect the reliability of learned models. In this work, we investigate the shortcut bias induced by fiducial ArUco markers in the widely used Linemod dataset. Although such markers are typically absent in real industrial environments, they provide unintended visual cues that neural networks tend to exploit. As a result, model selection based on state-of-the-art benchmarks can be biased, since the reported performance often reflects reliance on these shortcuts rather than robust feature extraction. Using saliency map analysis, we show that a large portion of the model’s attention is often concentrated on these markers, revealing a shortcut that artificially boosts pose estimation performance. To mitigate this issue, we propose a data augmentation pipeline based on generative AI techniques that removes the markers and replaces the background with more realistic synthesized scenes. Experimental results indicate a noticeable drop in performance when models trained on the original Linemod dataset are evaluated in ArUco-free environments, confirming the presence of background-induced biases. Training with the proposed generative-swapped dataset improves robustness and generalization to unseen scenarios, although it does not fully eliminate the problem. Overall, the results highlight the impact of background-related biases in pose estimation benchmarks and suggest that the proposed augmentation strategy is a practical and scalable step toward more reliable 6-DoF pose estimation systems for industrial applications, while leaving room for further improvements.

Keywords: data manipulation; 6D pose estimation; generative data augmentation; explainable artificial intelligence; saliency methods
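
The abstract's claim that a large portion of the model's attention falls on the markers can be made concrete with a simple ratio. The following is a minimal sketch, not the authors' analysis: it assumes a saliency map is already available as a 2-D array and that marker pixels are given by a boolean mask. The function name `marker_saliency_fraction` and the toy values are hypothetical and chosen only for illustration.

```python
import numpy as np


def marker_saliency_fraction(saliency: np.ndarray, marker_mask: np.ndarray) -> float:
    """Return the share of total saliency mass that falls on marker pixels.

    saliency    -- 2-D array of attribution scores (H x W)
    marker_mask -- 2-D boolean array, True where a fiducial marker is visible
    """
    if saliency.shape != marker_mask.shape:
        raise ValueError("saliency and marker_mask must have the same shape")
    # Some attribution methods produce signed scores; use magnitudes.
    magnitude = np.abs(saliency).astype(np.float64)
    total = magnitude.sum()
    if total == 0.0:
        return 0.0  # no attribution anywhere, so nothing falls on markers
    return float(magnitude[marker_mask.astype(bool)].sum() / total)


# Toy example (hypothetical values): a 4x4 map in which the markers occupy
# the top row and receive most of the attribution.
saliency = np.ones((4, 4))
saliency[0, :] = 9.0
marker_mask = np.zeros((4, 4), dtype=bool)
marker_mask[0, :] = True

fraction = marker_saliency_fraction(saliency, marker_mask)
print(f"{fraction:.2f}")  # prints 0.75
```

In this toy case the markers cover a quarter of the image but collect three quarters of the saliency mass, which is the kind of imbalance that would suggest a shortcut.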

