Toward Realistic Autonomous Driving Dataset Augmentation: A Real–Virtual Fusion Approach with Inconsistency Mitigation
Abstract
1. Introduction
- We propose an efficient and novel real–virtual fusion framework that leverages real-world driving data from K-City and a digital twin in Morai Sim for autonomous driving dataset augmentation.
- We develop a robust real–virtual synchronization method that combines a look-up table for initial alignment with a fine-tuned localization algorithm to accurately map real camera poses to the virtual environment (a minimal look-up-table alignment sketch is given after this list).
- We implement an advanced virtual object injection pipeline focusing on inconsistency mitigation, incorporating illumination matching and color transfer to enhance the photorealism of augmented images.
- We demonstrate the effectiveness of our framework through two comprehensive experiments: quantitatively assessing the perceptual realism of the generated augmented images using a pre-trained YOLOv7 model, and evaluating the improved object recognition performance and robustness of a fine-tuned YOLOv7 model on a real-world dataset (BDD100k [30]).
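As a concrete illustration of the look-up-table step in the synchronization contribution above, the following Python sketch maps a real-world planar position to the nearest pre-registered virtual pose. All names and data layouts (`PoseLookUpTable`, `lut_real_xy`, `lut_virtual_pose`) are illustrative assumptions, not the authors' implementation; the fine-tuned visual–inertial localization that refines this coarse estimate is omitted.

```python
# Minimal sketch of a look-up-table (LUT) initial alignment, assuming a table of
# real-world positions paired with corresponding virtual poses. Illustrative only.
import numpy as np
from scipy.spatial import cKDTree

class PoseLookUpTable:
    def __init__(self, lut_real_xy, lut_virtual_pose):
        # lut_real_xy:      (N, 2) real-world planar positions (e.g., UTM x, y)
        # lut_virtual_pose: (N, 3) corresponding virtual poses (x, y, yaw)
        self.tree = cKDTree(lut_real_xy)
        self.virtual_pose = np.asarray(lut_virtual_pose)

    def initial_alignment(self, real_xy):
        # Nearest-neighbour query returns a coarse virtual pose that a
        # subsequent fine localization step would refine.
        _, idx = self.tree.query(real_xy)
        return self.virtual_pose[idx]

# Toy correspondences for demonstration.
lut = PoseLookUpTable(
    lut_real_xy=np.array([[0.0, 0.0], [5.0, 0.0], [10.0, 0.0]]),
    lut_virtual_pose=np.array([[0.0, 0.0, 0.0], [5.0, 0.0, 0.1], [10.0, 0.0, 0.2]]),
)
print(lut.initial_alignment(np.array([4.2, 0.3])))  # coarse pose near [5, 0, 0.1]
```

In practice, such a table would need to be sampled densely enough along the driving route that the coarse estimate falls within the convergence range of the subsequent fine localization.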
2. Related Works
2.1. Autonomous Driving Datasets and Perception
2.2. Autonomous Driving Simulation and Digital Twins
2.3. Data Augmentation and Reality Gap Mitigation
- Domain Adaptation: Techniques such as unsupervised domain adaptation reduce the discrepancy between source (synthetic) and target (real) domains. This enables models to perform well on real data without requiring extensive real-world annotations [10].
- Rendering-Based Augmentation: Injecting synthetic objects into real-world scenes is a powerful data augmentation strategy. For example, LiDAR-Aug [33] composites virtual objects into real LiDAR scans for 3D detection; while effective for geometric data, it does not address the photometric inconsistencies inherent in camera sensors. The same principle applies to camera data, but visual consistency becomes critical: simple cut-and-paste methods often introduce artifacts due to mismatched lighting, shadows, or occlusions. Color transfer techniques, such as that of Reinhard et al. [29], provide a baseline for matching the visual statistics of two images (a minimal sketch is given after this list). However, robust augmentation requires addressing both geometric and photometric inconsistencies to achieve seamless real–virtual fusion.
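The following sketch illustrates the Reinhard-style global color transfer referenced above [29]: the per-channel mean and standard deviation of the injected patch are matched to those of the real background. OpenCV's CIELAB conversion is used here as a practical stand-in for the lαβ space of the original paper; this is a minimal, illustrative approximation, not the exact pipeline of Section 3.2.2.

```python
# Minimal sketch of Reinhard-style global color transfer: match the per-channel
# mean and standard deviation of the source (injected object patch) to the
# target (real background). CIELAB is used as an approximation of lab space.
import cv2
import numpy as np

def reinhard_color_transfer(source_bgr, target_bgr):
    src = cv2.cvtColor(source_bgr, cv2.COLOR_BGR2LAB).astype(np.float32)
    tgt = cv2.cvtColor(target_bgr, cv2.COLOR_BGR2LAB).astype(np.float32)

    src_mean, src_std = src.mean(axis=(0, 1)), src.std(axis=(0, 1)) + 1e-6
    tgt_mean, tgt_std = tgt.mean(axis=(0, 1)), tgt.std(axis=(0, 1))

    # Shift and scale each channel of the source toward the target statistics.
    out = (src - src_mean) / src_std * tgt_std + tgt_mean
    out = np.clip(out, 0, 255).astype(np.uint8)
    return cv2.cvtColor(out, cv2.COLOR_LAB2BGR)
```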
3. Proposed Methodology
3.1. Real–Virtual Synchronization
3.1.1. Initial Alignment via Look-Up Table
3.1.2. Fine-Tuned Localization Using Visual–Inertial Odometry
3.1.3. Drift Correction and Global Consistency
3.2. Virtual Object Injection and Augmented Image Generation
3.2.1. Virtual Object Placement and Scene Composition
3.2.2. Illumination Estimation and Color Transfer
3.3. Automated Ground Truth Labeling
4. Experiments
4.1. Experimental Setup
4.1.1. Datasets
- Real-World Data (K-City): We collected several hours of driving data from the K-City testbed in Hwaseong, South Korea. This data includes high-resolution camera images (1920 × 1080, 30 fps) and time-synchronized IMU measurements, which are crucial for our real–virtual synchronization process described in Section 3.1.
- Virtual Environment (Morai Sim): A high-fidelity digital twin of the K-City environment was utilized, built within the Morai Sim simulator. This virtual environment provides virtual sensor data corresponding to real-world coordinates.
- Generated Datasets: We generated two distinct datasets for our experiments, both based on the 7 target classes (Person, Bicycle, Car, Motorcycle, Bus, Truck, and Traffic Light).
- Augmented training set: A large-scale training dataset of 4229 augmented images. This set was created by injecting the 7 classes of virtual objects into the real K-City backgrounds, with approximately 500 images per class, covering various challenging scenarios (e.g., varying distances, angles, and potential near-collision trajectories). This dataset was generated in two versions for ablation studies: one with our full pipeline including color transfer, and one without.
- Augmented test set: A test dataset of 553 augmented images (approx. 80 images per class), created for Experiment 1.
- Real-World Test Set (BDD100k): We use the official BDD100k [30] object detection validation split for Experiment 2. This large-scale dataset, comprising 10,000 images, provides diverse real-world driving scenarios and includes annotations for all 7 of our target classes, enabling a robust evaluation of real-world generalization performance.
4.1.2. Implementation Details
4.1.3. Evaluation Metrics
- Precision (P): The accuracy of positive predictions. It is defined as $P = \frac{TP}{TP + FP}$, where $TP$ is the number of True Positives and $FP$ is the number of False Positives.
- Recall (R): The ability of the model to find all relevant ground truths. It is defined as $R = \frac{TP}{TP + FN}$, where $FN$ is the number of False Negatives.
- F1-Score: The harmonic mean of Precision and Recall, providing a single score that balances both metrics: $F1 = \frac{2 \cdot P \cdot R}{P + R}$.
- mAP@0.5: The mean Average Precision (mAP) calculated at a fixed Intersection over Union (IoU) threshold of 0.5. AP for a single class is the area under the Precision-Recall curve, $AP = \int_{0}^{1} p(r)\,dr$, and mAP is the average of the AP scores across all $C$ classes (in our case, $C = 7$). A minimal implementation sketch of these metrics is given after this list.
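The following sketch shows how these metrics can be computed once detections have been matched to ground truth at IoU ≥ 0.5; the matching step and the function names are illustrative assumptions, not the exact evaluation code used in our experiments.

```python
# Minimal sketch of the evaluation metrics above, assuming detections have
# already been matched to ground truth at IoU >= 0.5 (matching code omitted).
import numpy as np

def precision_recall_f1(tp, fp, fn):
    p = tp / (tp + fp) if (tp + fp) else 0.0
    r = tp / (tp + fn) if (tp + fn) else 0.0
    f1 = 2 * p * r / (p + r) if (p + r) else 0.0
    return p, r, f1

def average_precision(scores, is_tp, n_gt):
    # Sort detections by confidence, accumulate TP/FP, and step-integrate
    # the area under the precision-recall curve.
    order = np.argsort(-np.asarray(scores, dtype=float))
    tp = np.asarray(is_tp, dtype=float)[order]
    tp_cum = np.cumsum(tp)
    fp_cum = np.cumsum(1.0 - tp)
    recall = np.concatenate([[0.0], tp_cum / max(n_gt, 1)])
    precision = np.concatenate([[1.0], tp_cum / np.maximum(tp_cum + fp_cum, 1e-9)])
    return float(np.sum((recall[1:] - recall[:-1]) * precision[1:]))

# mAP@0.5 is then the mean of the per-class AP values over the C = 7 classes.
```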
4.2. Experiments and Results
4.2.1. Experiment 1: Perceptual Realism Test
4.2.2. Experiment 2: Performance Improvement Test
4.3. Qualitative Analysis
5. Conclusions
Author Contributions
Funding
Institutional Review Board Statement
Informed Consent Statement
Data Availability Statement
Acknowledgments
Conflicts of Interest
References
- Yu, B.; Yuan, C.; Wan, Z.; Tang, J.; Kurdahi, F.; Liu, S. ADDT—A Digital Twin Framework for Proactive Safety Validation in Autonomous Driving Systems. arXiv 2025, arXiv:2504.09461. [Google Scholar]
- Jung, S.; Cho, Y.; Kim, D.; Chang, M. Moving object detection from moving camera image sequences using an inertial measurement unit sensor. Appl. Sci. 2019, 10, 268. [Google Scholar] [CrossRef]
- Jung, S.; Cho, Y.; Lee, K.; Chang, M. Moving object detection with single moving camera and IMU sensor using mask R-CNN instance image segmentation. Int. J. Precis. Eng. Manuf. 2021, 22, 1049–1059. [Google Scholar] [CrossRef]
- Jung, S.; Park, S.; Lee, K. Pose Tracking of Moving Sensor using Monocular Camera and IMU Sensor. KSII Trans. Internet Inf. Syst. 2021, 15, 3011–3302. [Google Scholar] [CrossRef]
- Wang, Z.; Huang, X.; Hu, Z. Attention-Based LiDAR–Camera Fusion for 3D Object Detection in Autonomous Driving. World Electr. Veh. J. 2025, 16, 306. [Google Scholar] [CrossRef]
- Wang, C.Y.; Bochkovskiy, A.; Liao, H.Y.M. YOLOv7: Trainable bag-of-freebies sets new state-of-the-art for real-time object detectors. arXiv 2022, arXiv:2207.02696. [Google Scholar]
- Jocher, G.; Chaurasia, A.; Qiu, J. Ultralytics YOLOv8, 2023. Available online: https://docs.ultralytics.com/models/yolov8/ (accessed on 23 December 2025).
- Jocher, G.; Qiu, J. Ultralytics YOLO11, 2024. Available online: https://docs.ultralytics.com/models/yolo11/ (accessed on 23 December 2025).
- Wang, A.; Liu, L.; Chen, H.; Lin, Z.; Han, J.; Ding, G. YOLOE: Real-Time Seeing Anything. arXiv 2025, arXiv:2503.07465. [Google Scholar] [CrossRef]
- Wang, Y.; Zhao, N.; Lee, G.H. Syn-to-Real Unsupervised Domain Adaptation for Indoor 3D Object Detection. In Proceedings of the 35th British Machine Vision Conference (BMVC), Glasgow, UK, 25–28 November 2024. [Google Scholar]
- Kim, J.; Lee, J.; Han, G.; Lee, D.; Jeong, M.; Kim, J. SynAD: Enhancing Real-World End-to-End Autonomous Driving Models through Synthetic Data Integration. arXiv 2025, arXiv:2510.24052. [Google Scholar]
- Goedicke, D.; Bremers, A.W.; Lee, S.; Bu, F.; Yasuda, H.; Ju, W. XR-OOM: MiXed Reality driving simulation with real cars for research and design. In Proceedings of the 2022 CHI Conference on Human Factors in Computing Systems, New Orleans, LA, USA, 30 April–5 May 2022. [Google Scholar] [CrossRef]
- Reway, F.; Huber, W.; Ribeiro, E.P. Test Methodology for Vision-Based ADAS Algorithms with an Automotive Camera-in-the-Loop. In Proceedings of the 2018 IEEE International Conference on Vehicular Electronics and Safety (ICVES), Madrid, Spain, 1–12 September 2018; pp. 1–7. [Google Scholar] [CrossRef]
- Perez-Segui, R.; Arias-Perez, P.; Melero-Deza, J.; Fernandez-Cortizas, M.; Perez-Saura, D.; Campoy, P. Bridging the Gap between Simulation and Real Autonomous UAV Flights in Industrial Applications. Aerospace 2023, 10, 814. [Google Scholar] [CrossRef]
- Cao, X.; Chen, H.; Gelbal, S.Y.; Aksun-Guvenc, B.; Guvenc, L. Vehicle-in-Virtual-Environment (VVE) Method for Autonomous Driving System Development, Evaluation and Demonstration. Sensors 2023, 23, 5088. [Google Scholar] [CrossRef]
- Han, J.; Liu, K.; Li, W.; Zhang, F.; Xia, X.G. Generating Inverse Feature Space for Class Imbalance in Point Cloud Semantic Segmentation. IEEE Trans. Pattern Anal. Mach. Intell. 2025, 47, 5778–5793. [Google Scholar] [CrossRef]
- Jing, G.; Wang, W.T.; Zeng, Y.C.; Chen, Z.Y. Improved Semantic Segmentation With Large-Scale Attention-Based Self-Supervised Few-Shot Learning. IEEE Trans. Consum. Electron. 2025, 71, 9188–9192. [Google Scholar] [CrossRef]
- He, Y.; Chen, H.; Shi, W. An Advanced Framework for Ultra-Realistic Simulation and Digital Twinning for Autonomous Vehicles. arXiv 2024, arXiv:2405.01328. [Google Scholar] [CrossRef]
- Barabás, I.; Iclodean, C.; Beles, H.; Antonya, C.; Molea, A.; Scurt, F.B. An Approach to Modeling and Developing Virtual Sensors Used in the Simulation of Autonomous Vehicles. Sensors 2025, 25, 3338. [Google Scholar] [CrossRef] [PubMed]
- Liang, Z.; Wang, J.; Zhang, T.; Yong, X. DTTF-Sim: A Digital Twin-Based Simulation System for Continuous Autonomous Driving Testing. Sensors 2025, 25, 3447. [Google Scholar] [CrossRef]
- Zhang, T.; Liu, H.; Wang, W.; Wang, X. Virtual Tools for Testing Autonomous Driving: A Survey and Benchmark of Simulators, Datasets, and Competitions. Electronics 2024, 13, 3486. [Google Scholar] [CrossRef]
- Xu, M.; Niyato, D.; Chen, J.; Zhang, H.; Kang, J.; Xiong, Z.; Mao, S.; Han, Z. Generative AI-empowered Simulation for Autonomous Driving in Vehicular Mixed Reality Metaverses. arXiv 2023, arXiv:2302.08418. [Google Scholar] [CrossRef]
- Niemeyer, M.; Geiger, A. GIRAFFE: Representing Scenes as Compositional Generative Neural Feature Fields. arXiv 2021, arXiv:2011.12100. [Google Scholar] [CrossRef]
- Li, L.; Lian, Q.; Wang, L.; Ma, N.; Chen, Y.C. Lift3D: Synthesize 3D Training Data by Lifting 2D GAN to 3D Generative Radiance Field. arXiv 2023, arXiv:2304.03526. [Google Scholar] [CrossRef]
- Imane, A.; Guériau, M.; Ainouz, S. Towards a Mixed-Reality framework for autonomous driving. In Proceedings of the IEEE International Conference on Intelligent Transportation Systems (ITSC), Macau, China, 8–12 October 2022. [Google Scholar]
- Baruffa, R.; Pereira, J.; Romet, P.; Gechter, F.; Weiss, T. Mixed Reality Autonomous Vehicle Simulation: Implementation of a Hardware-In-the-Loop Architecture at a Miniature Scale. In Proceedings of the IEEE Intelligent Vehicles Symposium (IV), Las Vegas, NV, USA, 19 October–13 November 2020. [Google Scholar]
- Nam, S.M.; Park, J.; Sagong, C.; Lee, Y.; Kim, H.J. A Vehicle Crash Simulator Using Digital Twin Technology for Synthesizing Simulation and Graphical Models. Vehicles 2023, 5, 1046–1059. [Google Scholar] [CrossRef]
- MORAI. MORAI, Autonomous Vehicle Driving Simulator. 2023. Available online: https://www.morai.ai/ (accessed on 23 June 2025).
- Reinhard, E.; Adhikhmin, M.; Gooch, B.; Shirley, P. Color transfer between images. IEEE Comput. Graph. Appl. 2001, 21, 34–41. [Google Scholar] [CrossRef]
- Yu, F.; Chen, H.; Wang, X.; Xian, W.; Chen, Y.; Liu, F.; Madhavan, V.; Darrell, T. BDD100K: A Diverse Driving Dataset for Heterogeneous Multitask Learning. arXiv 2020, arXiv:1805.04687. [Google Scholar] [CrossRef]
- Liu, H.; Wu, C.; Wang, H. Real time object detection using LiDAR and camera fusion for autonomous driving. Sci. Rep. 2023, 13, 8056. [Google Scholar] [CrossRef]
- Jung, S.; Lee, Y.S.; Lee, Y.; Lee, K. 3D reconstruction using 3D registration-based ToF-stereo fusion. Sensors 2022, 22, 8369. [Google Scholar] [CrossRef]
- Fang, J.; Zuo, X.; Zhou, D.; Jin, S.; Wang, S.; Zhang, L. LiDAR-Aug: A General Rendering-based Augmentation Framework for 3D Object Detection. In Proceedings of the 2021 IEEE/CVF Conference on Computer Vision and Pattern Recognition (CVPR), Nashville, TN, USA, 20–25 June 2021; pp. 4708–4718. [Google Scholar] [CrossRef]
- Jung, S.; Yun, H.; Lee, K. Accurate Pose Estimation Method using ToF-Stereo Camera Data Fusion. In Proceedings of the 2023 14th International Conference on Information and Communication Technology Convergence (ICTC), Jeju Island, Republic of Korea, 11–13 October 2023; pp. 644–645. [Google Scholar]




| Class | mAP@0.5 | P | R | F1-Score |
|---|---|---|---|---|
| Person | 78.5% | 85.2% | 75.1% | 0.798 |
| Car | 91.2% | 93.5% | 90.1% | 0.918 |
| Bicycle | 75.8% | 81.0% | 72.5% | 0.765 |
| Truck | 83.1% | 87.6% | 80.0% | 0.836 |
| Bus | 85.5% | 89.1% | 83.0% | 0.859 |
| Motorcycle | 71.9% | 77.2% | 68.5% | 0.726 |
| Traffic Light | 79.0% | 84.5% | 75.5% | 0.798 |
| Overall mAP@0.5 | 80.7% | - | - | - |
| Class | Detected Count | Avg. Confidence |
|---|---|---|
| Person | 227 | 0.85 |
| Bicycle | 89 | 0.66 |
| Car | 498 | 0.56 |
| Motorcycle | 150 | 0.75 |
| Bus | 87 | 0.84 |
| Truck | 83 | 0.83 |
| Traffic Light | 373 | 0.54 |
| Average | - | 0.72 |
| Class | YOLOv7 Base | YOLOv7 Ours | YOLOv8 Base | YOLOv8 Ours | YOLOv11 Base | YOLOv11 Ours | Avg. Gain |
|---|---|---|---|---|---|---|---|
| Person | 59.3 | 63.1 | 51.2 | 57.4 | 58.0 | 64.7 | +5.6 |
| Bicycle | 38.0 | 50.8 | 33.7 | 51.1 | 40.4 | 49.8 | +13.2 |
| Car | 66.2 | 69.5 | 56.2 | 70.5 | 66.2 | 70.7 | +7.4 |
| Motorcycle | 37.3 | 51.9 | 38.6 | 55.9 | 38.5 | 52.2 | +15.2 |
| Bus | 47.0 | 58.4 | 49.3 | 60.1 | 46.8 | 51.6 | +9.0 |
| Truck | 40.1 | 51.2 | 54.6 | 59.2 | 40.7 | 52.9 | +9.3 |
| Traffic Light | 32.2 | 39.9 | 26.6 | 48.9 | 28.9 | 49.6 | +16.9 |
| Overall mAP | 45.7 | 56.4 | 44.3 | 57.5 | 45.6 | 55.9 | +11.4 |
| Class | YOLOv7_aug mAP | YOLOv7_aug P | YOLOv7_aug R | YOLOv7_ours mAP | YOLOv7_ours P | YOLOv7_ours R | mAP Gain (Ours − Aug) |
|---|---|---|---|---|---|---|---|
| Person | 60.5% | 74.0% | 54.2% | 63.1% | 78.5% | 56.8% | +2.6% |
| Bicycle | 44.2% | 66.8% | 40.5% | 50.8% | 72.4% | 48.1% | +6.6% |
| Car | 67.8% | 80.5% | 59.1% | 69.5% | 83.2% | 62.4% | +1.7% |
| Motorcycle | 45.6% | 56.2% | 48.3% | 51.9% | 65.7% | 55.2% | +6.3% |
| Bus | 52.1% | 48.5% | 55.4% | 58.4% | 62.0% | 61.5% | +6.3% |
| Truck | 45.3% | 35.4% | 60.2% | 51.2% | 48.6% | 65.3% | +5.9% |
| Traffic Light | 35.8% | 58.2% | 34.5% | 39.9% | 64.1% | 38.2% | +4.1% |
| Overall | 50.5% | - | - | 56.4% | - | - | +5.9% |
Disclaimer/Publisher’s Note: The statements, opinions and data contained in all publications are solely those of the individual author(s) and contributor(s) and not of MDPI and/or the editor(s). MDPI and/or the editor(s) disclaim responsibility for any injury to people or property resulting from any ideas, methods, instructions or products referred to in the content. |
© 2026 by the authors. Licensee MDPI, Basel, Switzerland. This article is an open access article distributed under the terms and conditions of the Creative Commons Attribution (CC BY) license.

