1. Introduction
Software-Defined Vehicles (SDVs) control and improve vehicle functions through software updates and extensibility rather than through traditional hardware-centric approaches. This shift has transformed automobiles into software-centric smart platforms, driven by changes in the industrial environment such as the widespread adoption of smartphones and the proliferation of electric vehicles [1,2]. SDVs require a massive codebase and continuous feature upgrades; a rapid and sustainable verification system is therefore essential to reliably manage short update cycles and large-scale code changes [3].
However, the simulation-based testing methods currently in use have several limitations. A persistent “gap” has been pointed out, in which performance obtained in a virtual environment does not generalize to actual road driving [4]. Scenario-based testing, manually constructed by experts, suffers from limited scope and fails to adequately cover rare situations or complex traffic interactions. Furthermore, creating realistic scenarios requires combining diverse variables, including road conditions and surrounding objects, which incurs significant time and expense [5].
Consequently, automating and diversifying test cases have emerged as key challenges in SDV verification. Recently, large language models (LLMs) have been applied to various software engineering tasks on the strength of their code generation capabilities, and the automated generation of autonomous driving simulation code is also attracting attention [6,7,8]. LLMs can generate simulation code or test scenarios from natural language input, enabling faster verification of a wider range of situations than traditional manually constructed simulations.
Ye et al. [9] demonstrated that automatically optimizing prompts can improve code generation quality, while Shin et al. [10] revealed that prompt structure directly impacts LLM output performance. Ma et al. [11] proposed LaMPilot, a benchmark for autonomous driving scenario code generation and evaluation, systematically verifying the feasibility and accuracy of LLMs. The LLM4AD project [12] presented the applicability of LLMs in the autonomous driving domain and a simulation-based verification framework.
Previous studies have mostly focused on generating code in general-purpose languages such as Python or C; little research has quantitatively evaluated the executability of generated code or the accuracy of control simulations in MATLAB/Simulink environments. Systematic comparative studies on how prompt engineering strategies tailored to each LLM’s characteristics affect code quality are also scarce, even though the differing performance characteristics of the models call for exactly such tailored strategies.
Accordingly, this study designed and applied model-specific, characteristic-based prompts to five representative LLMs: GPT-4, Gemini 2.5 Pro, Claude Sonnet 4.0, CodeLlama-13B-Instruct, and StarCoder2, and compared their code generation performance [3,4,5]. Using official MATLAB examples of programmatic driving scenarios, synthetic sensor data generation from IMUs, GPS, and wheel encoders, and parking maneuver simulations as reference baselines, we compared the runtime behavior and accuracy of the code generated by each LLM. The evaluation assessed runtime success dichotomously, quantified the consistency between the generated and reference code using BLEU-4, ROUGE-L F1, chrF, Token Jaccard, Identifier F1, API Overlap F1, and the Spec-Compliance Score, and then compared the models comprehensively using a Composite Score.
The contributions of this paper are as follows:
While previous studies merely evaluated code executability in Python-based simulator environments, this study directly executed SDV control code generated by LLMs in the MATLAB/Simulink environment. This enabled verification and evaluation of functional consistency and executability against reference code.
Unlike previous studies that compared models using a single prompt, this study designed customized prompts considering each LLM’s learning characteristics and code generation tendencies, quantitatively comparing their effectiveness.
It proposes a composite evaluation system that integrates practical performance metrics such as executability and specification compliance with linguistic similarity indicators, enabling multidimensional code quality assessment and compensating for the limitations of existing text-based metrics.
This study systematically compares the executability and quality of LLM-based SDV control code generation in the MATLAB environment. It can serve as foundational data for future research exploring the potential and limitations of LLM utilization in the autonomous driving domain.
This paper is structured as follows.
Section 2 reviews related work in three categories: prompt engineering, code learning and syntax understanding in LLMs, and SDV control code and benchmarks.
Section 3 presents the methodology and details the experimental design, including the research questions, prompt engineering, evaluation metric construction, and the LLMs compared.
Section 4 synthesizes the experimental results to answer the research questions.
Section 5 discusses the findings.
Section 6 presents the conclusions and proposes directions for future research.
2. Related Work
Ye et al. [13] explored a framework that automatically optimizes the prompts themselves, moving beyond simply issuing commands to the LLM. Their work demonstrated that prompt quality directly impacts performance in complex code generation. Marvin et al. [14] analyzed the impact of prompt design on response quality in order to maximize the performance of large language models. Moving beyond simple questions, they conducted experiments with different prompt types, including role, context, and instructions. This comparison of response quality across large language models highlighted the importance of prompt design and demonstrated the potential of LLMs for solving diverse domain-specific problems. Recently, various benchmark studies have also been proposed to verify the generalizability of code generation prompt optimization, presenting frameworks that comprehensively measure language-to-code conversion ability [15].
Research on code learning and syntax understanding in LLMs is also actively underway. Hussain et al. [16] proposed a technique for training large language models to learn command syntax. Using a dedicated tokenizer and learning strategy, the model was able to understand and generate command-based, domain-specific languages. This is crucial for ensuring code quality in domains with strict syntactic structures, such as control systems. Petrovic et al. [17] attempted to automate the automotive software development process by combining model-based engineering with large language models. They implemented a workflow from requirements to code generation, targeting autonomous braking scenarios within the CARLA simulation environment, and subsequently extended the framework into an integrated workflow spanning requirements analysis, control algorithm implementation, and vehicle driving. While the proposed framework demonstrated the potential of generative AI to facilitate automation in software development, no comparative analysis using quantitative performance metrics was conducted [18]. Furthermore, recent research has analyzed the degradation of LLM performance on complex code structures by evaluating code generation accuracy at the class level, beyond the function level [19]. Such studies underscore the need for evaluation methodologies that analyze consistency at both the code structure and identifier levels.
Nouri et al. [20] presented a methodology for automatically verifying and improving autonomous driving control code generated by LLMs using a simulation-based feedback loop. Using an adaptive cruise control (ACC) case, they compared and evaluated various models, including CodeLlama, DeepSeek, CodeGemma, Mistral, and GPT-4, focusing on improving the quality of LLM-based control code in a simulation environment. Ma et al. [11] proposed an LLM-based framework for converting natural language commands into executable autonomous driving scenario code and built LaMPilot, a large-scale benchmark dataset, to evaluate it. Specifically, they designed a simulation-based evaluation system to systematically verify model performance in terms of feasibility and accuracy. Additionally, recent research [21] has comprehensively reviewed code generation benchmarks and evaluation metrics, while the CodeJudge framework [22], which evaluates generated code without test cases, has been proposed, thereby strengthening the foundation for automated quantitative code quality comparisons.
As summarized in Table 1, we designed prompts to automatically generate executable SDV control algorithm code in MATLAB using various LLMs and compared the quality of the resulting code. We extended the two LLMs used in previous studies with three additional LLMs and quantitatively evaluated code quality according to the prompt design configuration.
3. Methods
Figure 1 illustrates the overall research workflow. Starting from the research question, this study systematically constructed 13 categorized prompts to generate MATLAB (R2025b)-based SDV control code using five representative large language models (LLMs). Subsequently, the code generated by each model was evaluated using eight objective metrics encompassing syntactic similarity, semantic fidelity, and execution validity. Normalized scores were aggregated into a composite score, enabling comparisons between models and analysis of error patterns.
3.1. Research Questions
RQ1. Can LLMs automatically generate executable MATLAB code for a given SDV control scenario?
RQ2. What differences exist in the accuracy and completeness of the SDV control algorithm code generated by each LLM?
RQ3. How does the difficulty profile vary across tasks when applying the same prompt design principles?
RQ4. How effectively do various automated evaluation metrics and composite scores explain the differences in SDV control code generation performance between LLMs, and what correlations exist between the metrics?
RQ5. What were the primary causes of code execution failures, and in which models and tasks did they occur?
3.2. Prompt Engineering
In this study, we used prompt engineering with LLMs to generate MATLAB-based SDV control algorithm code. LLM outputs can vary with the model structure, training data, and context processing method, even for identical prompt inputs. We therefore designed customized prompts for each model, considering its characteristics and strengths. All models shared a common problem definition, scenario specification, and output format requirements, and code generation was performed under the same conditions; however, we applied differentiated design approaches based on the characteristics and code generation tendencies of each model. A total of 13 prompts were designed for the experiments, grouped into three categories: (1) 5 programmatic driving scenarios, (2) 4 sensor simulations, and (3) 4 parking simulations. This configuration covers problems of varying difficulty, from simple trajectory generation to sensor data processing and complex scenario simulation, enabling a multifaceted evaluation of each model’s code generation capabilities.
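To make the task types concrete, the following is a minimal sketch of the kind of MATLAB output expected for a simple programmatic driving scenario prompt. It assumes the Automated Driving Toolbox, and the road and waypoint values are illustrative; it is not one of the reference examples or a model output from this study.

```matlab
% Minimal programmatic driving scenario sketch (assumes the Automated
% Driving Toolbox); road and waypoint values are illustrative only.
scenario = drivingScenario;                       % empty scenario container
roadCenters = [0 0 0; 50 0 0; 100 20 0];          % road centerline [x y z] in meters
road(scenario, roadCenters, 'Lanes', lanespec(2));

egoVehicle = vehicle(scenario, 'ClassID', 1);     % ego vehicle actor
waypoints  = [0 -2 0; 50 -2 0; 100 18 0];         % ego trajectory waypoints
trajectory(egoVehicle, waypoints, 15);            % follow waypoints at 15 m/s

plot(scenario);                                   % top-down scenario view
while advance(scenario)                           % step the simulation to completion
    pause(scenario.SampleTime);
end
```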
As shown in Table 2, this study differentiated prompts based on model characteristics to maximize the potential of each LLM. Furthermore, by comparing code generation trends across models under identical scenario conditions, this study enabled more realistic and sophisticated performance evaluations than a single-prompt design approach.
3.3. Performance Metrics
In this study, nine indicators were used to comprehensively evaluate the quality of the MATLAB-based SDV control code generated by the LLMs. The metrics cover code correctness, syntactic similarity, specification compliance, and executability, and a composite score was calculated by integrating them.
First, BLEU-4, ROUGE-L, and chrF are widely used metrics for evaluating syntactic and textual similarity in natural language processing, and recent studies have reported that they retain a certain level of validity for basic consistency evaluation of code generation models [23]. It has also been shown that the BLEU score alone is insufficient for judging code generation quality but, combined with other metrics, can serve as an auxiliary indicator of syntactic matching and quality. Therefore, this study used BLEU-4 in conjunction with ROUGE-L and chrF.
Second, Token Jaccard and Identifier F1 are used as complementary indicators that partially reflect the structural similarity and semantic consistency of the code. Prior work has noted that token-based similarity alone cannot sufficiently capture the logical identity of code, highlighting the need for structural comparison based on abstract syntax trees [24]. Building on this, the present study evaluates semantic consistency and structural coherence not only through token-level overlap but also through the degree of matching at the identifier level.
Third, API Overlap F1 serves as a core metric for evaluating the accuracy and consistency of API calls within the code. It has been empirically demonstrated that the presence and accuracy of API calls in LLM-generated code are key determinants of executability and functional suitability [25]. Consequently, this study also incorporates API usage accuracy as an independent quality metric.
Finally, Runtime Sanity directly verifies the executability of the code. The CodeScore study experimentally demonstrated that evaluations based on code execution reflect functional consistency and quality better than simple text similarity [26]. Therefore, this study also included a binary assessment of whether code execution succeeded, quantitatively measuring whether a model generates genuinely functional MATLAB code.
Thus, the evaluation framework of this study is grounded in the theoretical foundations of existing code quality assessment research, comprehensively reflecting syntactic conformity, semantic structural similarity, functional suitability, and executability.
3.3.1. BLEU-4 (Token-Level Precision)
BLEU is a widely used metric for evaluating machine translation performance; here it is calculated from the n-gram token matching rate (up to 4-grams) between the generated code and the reference code:

$$\mathrm{BLEU\text{-}4} = \mathrm{BP} \cdot \exp\!\left(\sum_{n=1}^{4} w_n \log p_n\right), \qquad w_n = \tfrac{1}{4},$$

where $p_n$ denotes the n-gram precision and BP stands for the brevity penalty.
3.3.2. ROUGE-L (Sequence Overlap)
An evaluation metric based on the longest common subsequence (LCS) that measures the sequential similarity between the generated code and the reference code. With $L$ denoting the length of the LCS of the two token sequences, precision and recall are defined as

$$P_{\mathrm{lcs}} = \frac{L}{|C_{\mathrm{gen}}|}, \qquad R_{\mathrm{lcs}} = \frac{L}{|C_{\mathrm{ref}}|},$$

and the final score is their harmonic mean,

$$\mathrm{ROUGE\text{-}L} = \frac{2\,P_{\mathrm{lcs}}\,R_{\mathrm{lcs}}}{P_{\mathrm{lcs}} + R_{\mathrm{lcs}}}.$$
3.3.3. ChrF (Character-Level F1)
After whitespace is removed, character $n$-gram precision $P_n$ and recall $R_n$ are computed for each $n$. Following the implementation used in this study, the values are averaged over $n \in \{2, 3\}$ to obtain $\bar{P}$ and $\bar{R}$, after which the score is computed as

$$\mathrm{chrF} = \frac{2\,\bar{P}\,\bar{R}}{\bar{P} + \bar{R}}.$$
3.3.4. Token Jaccard Similarity
Given the token sets $T_{\mathrm{gen}}$ and $T_{\mathrm{ref}}$ extracted from the generated code and the reference code, respectively, the score is the ratio of the intersection to the union:

$$J(T_{\mathrm{gen}}, T_{\mathrm{ref}}) = \frac{|T_{\mathrm{gen}} \cap T_{\mathrm{ref}}|}{|T_{\mathrm{gen}} \cup T_{\mathrm{ref}}|}.$$
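A minimal MATLAB sketch of this computation is given below; the regular-expression tokenizer (identifiers, numbers, single symbols) is an illustrative assumption and may differ from the tokenizer used in our evaluation scripts.

```matlab
% Token Jaccard similarity between generated and reference MATLAB code.
% The tokenizer here (identifiers, numbers, single symbols) is illustrative.
function j = token_jaccard(genCode, refCode)
    tokenize = @(s) unique(regexp(s, '[A-Za-z_]\w*|\d+\.?\d*|\S', 'match'));
    Tgen = tokenize(genCode);
    Tref = tokenize(refCode);
    j = numel(intersect(Tgen, Tref)) / max(numel(union(Tgen, Tref)), 1);
end
```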
3.3.5. Identifier F1
After extracting the identifiers from both codes, the F1 score is computed over the identifier sets $I_{\mathrm{gen}}$ and $I_{\mathrm{ref}}$ based on whether the identifiers match:

$$P = \frac{|I_{\mathrm{gen}} \cap I_{\mathrm{ref}}|}{|I_{\mathrm{gen}}|}, \qquad R = \frac{|I_{\mathrm{gen}} \cap I_{\mathrm{ref}}|}{|I_{\mathrm{ref}}|}, \qquad F_1 = \frac{2PR}{P + R}.$$
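The sketch below illustrates this at the set level in MATLAB; the identifier pattern and the small keyword exclusion list are simplifying assumptions rather than the exact rules used in the study.

```matlab
% Identifier-level F1 between generated and reference code.
% Keyword filtering is simplified; a real extractor may also exclude built-ins.
function f1 = identifier_f1(genCode, refCode)
    ids = @(s) unique(regexp(s, '[A-Za-z_]\w*', 'match'));
    kw  = {'if','else','elseif','end','for','while','function','return'};
    Igen = setdiff(ids(genCode), kw);
    Iref = setdiff(ids(refCode), kw);
    hit  = numel(intersect(Igen, Iref));
    p  = hit / max(numel(Igen), 1);
    r  = hit / max(numel(Iref), 1);
    f1 = 2 * p * r / max(p + r, eps);   % guarded harmonic mean
end
```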
3.3.6. API Overlap F1
For a fixed API list A = {plot3, scatter3, …}, the F1 score is calculated based on whether each API a ∈ A occurs in the generated code and in the reference code.
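A possible MATLAB realization is sketched below; the API list shown is a hypothetical subset for illustration, since the full list used in the study is task-specific.

```matlab
% API Overlap F1 over a fixed API list (hypothetical subset shown here).
function f1 = api_overlap_f1(genCode, refCode)
    apis = {'plot3','scatter3','drivingScenario','trajectory','legend'};
    used = @(code) cellfun(@(a) ~isempty(regexp(code, ['\<' a '\>'], 'once')), apis);
    g  = used(genCode);                 % APIs present in generated code
    r  = used(refCode);                 % APIs present in reference code
    tp = sum(g & r);
    p  = tp / max(sum(g), 1);
    rc = tp / max(sum(r), 1);
    f1 = 2 * p * rc / max(p + rc, eps);
end
```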
3.3.7. Spec-Compliance Score
Each required visualization specification item is checked, and the weighted sum of the satisfied items is computed on a [0, 100] scale.
3.3.8. Runtime Sanity
It checks whether the code executes without syntax or runtime errors: the score is 1 (binary) if execution completes successfully and 0 otherwise.
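A minimal sketch of this check in MATLAB is shown below; the script-file interface and figure cleanup are simplifying assumptions about the evaluation harness.

```matlab
% Runtime Sanity: run a generated script and record 1 on success, 0 on error.
function ok = runtime_sanity(scriptFile)
    try
        run(scriptFile);     % execute the generated MATLAB script
        close all;           % clean up any figures the script opened
        ok = 1;
    catch err
        fprintf('Execution failed: %s\n', err.message);
        ok = 0;
    end
end
```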
3.3.9. Composite Score
Each indicator is normalized to the range [0,1] and integrated as a weighted sum.
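As a concrete illustration, the snippet below combines the eight normalized metric values into a composite score; the weight vector is a placeholder assumption, not the exact weighting used in the study.

```matlab
% Composite Score as a weighted sum of normalized metrics.
% Assumes the eight metric values have already been computed in [0, 1];
% the weights below are illustrative placeholders.
metrics = [bleu4, rougeL, chrf, tokenJaccard, idF1, apiF1, spec/100, runtimeOk];
weights = [0.05, 0.05, 0.10, 0.10, 0.10, 0.20, 0.20, 0.20];   % sums to 1
composite = dot(weights, metrics);
```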
3.4. LLM Models
In this study, five large language models were selected for comparison in generating autonomous driving code from prompts. In addition to CodeLlama-13B-Instruct and GPT-4, which were used in previous studies, the latest commercial models Gemini 2.5 Pro and Claude Sonnet 4.0, as well as the open-source StarCoder2, were added. GPT-4 [27], a representative high-performance language model developed by OpenAI, demonstrates superior performance in both natural language processing and code generation and is notable for its proven reliability and wide applicability in real-world industrial environments. CodeLlama-13B-Instruct [28] is based on LLaMA 2, released by Meta, and is optimized for code generation in various programming languages; its relatively low memory footprint and instruction-based code generation make it well suited to research environments. Gemini 2.5 Pro [29], a multimodal LLM developed by Google DeepMind, offers enhanced reasoning capabilities not only for text but also for code generation, and it can handle complex programming tasks on the basis of large parameter counts and state-of-the-art training data. Claude Sonnet 4.0 (https://www.anthropic.com/claude/sonnet, accessed on 22 May 2025), developed by Anthropic, emphasizes safety and intuitive language understanding; it provides high consistency even in large-scale code generation tasks and is particularly strong in user-friendly, interactive code creation. StarCoder2 [30] is an open-source, code-specific language model jointly developed by Hugging Face and ServiceNow Research; it is trained on extensive public code repositories and comprehensively supports various programming languages. By comparing the characteristics and strengths of the commercial models (GPT-4, Gemini 2.5 Pro, Claude Sonnet 4.0) and the open-source models (CodeLlama-13B-Instruct, StarCoder2), we analyzed the performance and applicability of LLMs for SDV control code generation.
4. Experimental Results
4.1. RQ1: Can LLMs Automatically Generate Executable MATLAB Code for a Given SDV Control Scenario?
Our experimental results showed that all five LLMs generated executable MATLAB code to some degree, but with widely varying success rates. As illustrated in Figure 2, the execution performance gap among models was substantial. Gemini 2.5 Pro achieved the highest runtime success rate of 53.8% (7 of 13 runs), followed by GPT-4 and Claude Sonnet 4.0, both at 46.2% (6 of 13 runs). In contrast, CodeLlama-13B-Instruct and StarCoder2 failed to execute successfully in any case (0%), underscoring their limited alignment with MATLAB syntax and toolboxes. These results indicate that commercially trained LLMs handle MATLAB-specific APIs and runtime dependencies more effectively than the open-source code models.
4.2. RQ2: What Differences Exist in the Accuracy and Completeness of the SDV Control Algorithm Code Generated by Each LLM?
In the experiment, code quality was evaluated using the Composite Score. According to Table 3, GPT-4 achieved the highest score with an average of 0.276, followed by Gemini 2.5 Pro at 0.263 and Claude Sonnet 4.0 at 0.229. In contrast, StarCoder2 and CodeLlama-13B-Instruct performed significantly worse, with scores of 0.146 and 0.088, respectively. Notably, GPT-4 and Gemini demonstrated stable results not only in execution success rate but also in spec compliance and API call accuracy; GPT-4 achieved an average API utilization score of 0.73 on the driving scenario generation prompts, producing executable code that reflected the requirements. In contrast, StarCoder2 and CodeLlama are optimized for general-purpose languages such as Python and C++, and their code completeness dropped markedly in the MATLAB environment owing to typos, array dimension mismatches, and incorrect function calls.
4.3. RQ3: How Does the Difficulty Profile Vary Across Tasks When Applying the Same Prompt Design Principles?
When comparing the prompts grouped into three categories—programmatic driving, sensor simulation, and parking simulation—GPT-4 and Gemini exhibited stable results in the first two categories but a sharp decline in the parking scenario (Figure 3). Specifically, GPT-4’s mean composite score dropped from 0.29 (programmatic) to 0.18 (parking), while Gemini 2.5 Pro decreased from 0.28 to 0.17 under identical prompt conditions. Claude Sonnet 4.0 maintained moderate performance around 0.22 on the simpler tasks but also fell below 0.17 in the parking scenario. CodeLlama and StarCoder2 consistently remained under 0.15 across all task types. These results indicate that the parking scenario presents the highest task difficulty, primarily because of its compound requirements—multi-view environment setup, sensor detection and visualization, and strict time-synchronization constraints—which stress a model’s reasoning and code-integration capacity.
4.4. RQ4: How Effectively Do Various Automated Evaluation Metrics and Composite Scores Explain the Differences in SDV Control Code Generation Performance Between LLMs, and What Correlations Exist Between the Metrics?
Correlation analysis between the evaluation metrics was performed on a total of 65 samples. Pearson’s correlation coefficient was used to calculate linear relationships between metrics; statistical significance testing and multiple comparison correction were not performed. According to the heatmap in Figure 4, text-based similarity metrics such as BLEU, ROUGE-L, and chrF showed low correlations (r = 0.2–0.3) with the Composite Score. Conversely, Spec-Compliance and Runtime Sanity exhibited high positive correlations, with r = 0.68 and r = 0.75, respectively. This indicates that executability and requirement fulfillment explain the quality of LLM code generation better than syntactic similarity to the reference code. Consequently, future code generation evaluations should place greater emphasis on execution-based and specification-based metrics than on traditional text-based indicators such as BLEU.
4.5. RQ5: What Were the Primary Causes of Code Execution Failures, and in Which Models and Tasks Did They Occur?
A comprehensive analysis of the failure cases revealed three primary causes: (1) API signature mismatches, (2) unmet requirements, and (3) array dimension and coordinate system handling errors. In the parking simulation in particular, GPT-4, Gemini 2.5 Pro, and Claude Sonnet 4.0 repeatedly failed, while CodeLlama-13B-Instruct and StarCoder2 showed weaknesses across all tasks. Scatterplot analysis revealed that lower Spec-Compliance scores were also associated with lower Composite Scores. Runtime failures are marked with an X in Figure 5, contrasting sharply with the success cases marked with an O. This visualization shows that the failures are not coincidental but stem from clear structural causes, namely API and specification mismatches.
5. Discussion
Figure 6, Figure 7 and Figure 8 show representative results generated by the GPT-4, Gemini, and Claude models, respectively. All models were prompted and executed with the same scenarios, but the main text presents, as examples, different detailed scenario scenes generated by each model: GPT-4 is illustrated with road boundary recognition, Gemini with sensor data visualization, and Claude with a parking simulation scene. The accompanying code results consist of the MATLAB code generated by the CodeLlama-13B-Instruct and StarCoder2 models.
GPT-4 demonstrated the highest overall code structural completeness. Notably, in the programming-style example, it fully implemented the road boundary calculation and ego coordinate transformation logic, achieving the best results in chrF (0.45–0.59) and SpecScore (55). This example demonstrates GPT-4’s ability to generate structurally consistent code by accurately utilizing mathematical transformations and MATLAB API calls (plot3, cos, sin).
Gemini 2.5 Pro exhibited specialized performance in sensor-based simulations. For the SDV sensor example, it generated highly complete code for constructing a dashboard visualizing IMU, wheel encoder, and GPS data. It simultaneously achieved SpecScore (70) and API F1 (1), demonstrating exceptional specification fidelity and graphical representation accuracy. This clearly highlights Gemini 2.5 Pro’s strength in numerically intensive, data-processing-centric MATLAB code.
Claude Sonnet 4.0 excelled most in code readability and comment structure. For Parking Example 1, it visualized the vehicle’s parking trajectory while outputting each stage’s status in log format. Although its execution stability was lower than that of GPT-4 or Gemini, it received high marks for logical step-by-step explanations and visual completeness. This demonstrates Claude’s balanced combination of linguistic reasoning and structural explanatory capability during code generation.
CodeLlama-13B-Instruct exhibited the highest incidence of MATLAB syntax recognition errors. Programming Example 3 produced an ‘Invalid expression’ error during execution, revealing limitations in parsing and indexing. This indicates that while CodeLlama excels at general-purpose languages such as Python and C++, it is not optimized for MATLAB’s vector operations and function call syntax.
StarCoder2 exhibited unstable code execution but showed consistent potential in reproducing code structure. In the parking example results, “g” undefined and “car model” errors occurred, yet the graphical elements and the simulation flow were partially reproduced. This suggests that StarCoder2 tends to capture code structure but lacks prior training on MATLAB’s syntax and API conventions.
Overall, these results demonstrate that the code generation characteristics of each model are distinctly different. GPT-4 and Gemini 2.5 Pro showed strengths in specification fidelity and execution stability, while Claude Sonnet 4.0 excelled in readability and narrative structure. CodeLlama-13B-Instruct and StarCoder2, as open-source models, revealed limitations in domain adaptation to MATLAB. This comparison empirically demonstrates that performance analysis of LLM-based SDV control code generation requires evaluation centered on execution and specification fulfillment rather than syntactic similarity.
6. Conclusions and Future Work
This study compared the performance of five representative LLMs (GPT-4, Gemini 2.5 Pro, Claude Sonnet 4.0, StarCoder2, and CodeLlama-13B-Instruct) in automatically generating MATLAB code for autonomous vehicle control scenarios. A total of 13 prompts were designed and nine evaluation metrics were applied. While some models generated executable code within a limited scope, overall execution success rates remained below 55%. GPT-4 and Gemini 2.5 Pro demonstrated superior performance, with relatively high composite scores and execution success rates; Claude showed moderate performance, while StarCoder2 and CodeLlama-13B-Instruct were weak in executability. The programmatic driving and sensor simulation prompt types yielded acceptable results, but all models consistently underperformed on complex tasks requiring multiple views, sensor detection, and time synchronization, such as the parking simulation.
Metric analysis revealed that text similarity metrics do not adequately explain code executability. Instead, execution- and requirement-satisfaction metrics, such as Spec Compliance, API Overlap F1, and Runtime Sanity, were more effective in explaining performance differences. Failures fell primarily into API signature and parameter mismatches, non-compliance with requirements, and coordinate system and dimension errors. These findings suggest that LLM code generation evaluations should place greater emphasis on executability and specification compliance rather than relying solely on syntactic similarity.
The prompts and evaluation scripts used in this study can be provided by the corresponding author upon request to ensure the reproducibility of the research. This is expected to enable subsequent researchers to verify the results in the same experimental environment and to extend the prompt design strategies.
Future research should comprehensively evaluate the code generation capabilities of LLMs across a wider range of autonomous driving scenarios. Specifically, we plan to expand the dataset for the tasks that underperformed in this study and to explore prompt optimization strategies to improve performance. Furthermore, by comparing model performance across autonomous driving software environments such as Python, C++, and ROS, we will ensure compatibility with various development frameworks and enhance the reliability of simulation verification results.
Author Contributions
Conceptualization, H.Y. and H.K.; methodology, H.Y. and H.K.; software, H.Y. and H.K.; validation, H.Y. and H.K.; formal analysis, H.Y. and H.K.; investigation, H.Y. and H.K.; resources, J.K.; data curation, H.Y. and H.K.; writing—original draft preparation, H.Y. and H.K.; writing—review and editing, H.Y. and H.K.; visualization, H.Y. and H.K.; supervision, J.K.; project administration, J.K.; funding acquisition, J.K. All authors have read and agreed to the published version of the manuscript.
Funding
This work was supported by the Sungshin Women’s University Research Grant of 2025.
Institutional Review Board Statement
Not applicable.
Informed Consent Statement
Not applicable.
Data Availability Statement
The original contributions presented in this study are included in the article. Further inquiries can be directed to the corresponding author.
Conflicts of Interest
The authors declare no conflicts of interest.
Abbreviations
The following abbreviations are used in this manuscript:
| LLM | Large Language Model |
| SDV | Software Defined Vehicle |
| API | Application Programming Interface |
References
- Liu, Z.; Zhang, W.; Zhao, F. Impact, challenges and prospect of software-defined vehicles. Automot. Innov. 2022, 5, 180–194. [Google Scholar] [CrossRef]
- Kang, J. Software Practice and Experience on Smart Mobility Digital Twin in Transportation and Automotive Industry: Toward SDV-Empowered Digital Twin Through EV Edge-Cloud and AutoML. J. Web Eng. 2024, 23, 1155–1180. [Google Scholar] [CrossRef]
- Bhattacharjee, A.; Mahmood, H.; Lu, S.; Ammar, N.; Ganlath, A.; Shi, W. Edge-assisted over-the-air software updates. In Proceedings of the 2023 IEEE 9th International Conference on Collaboration and Internet Computing (CIC), Atlanta, GA, USA, 1–4 November 2023; IEEE: New York, NY, USA, 2023; pp. 18–27. [Google Scholar]
- Menzel, T.; Bagschik, G.; Maurer, M. Scenarios for development, test and validation of automated vehicles. In Proceedings of the 2018 IEEE Intelligent Vehicles Symposium (IV), Suzhou, China, 26–30 June 2018; IEEE: New York, NY, USA, 2018; pp. 1821–1827. [Google Scholar]
- Fremont, D.J.; Kim, E.; Pant, Y.V.; Seshia, S.A.; Acharya, A.; Bruso, X.; Wells, P.; Lemke, S.; Lu, Q.; Mehta, S. Formal scenario-based testing of autonomous vehicles: From simulation to the real world. In Proceedings of the 2020 IEEE 23rd International Conference on Intelligent Transportation Systems (ITSC), Rhodes, Greece, 20–23 September 2020; IEEE: New York, NY, USA, 2020; pp. 1–8. [Google Scholar]
- Koziolek, H.; Grüner, S.; Hark, R.; Ashiwal, V.; Linsbauer, S.; Eskandani, N. LLM-based and Retrieval-Augmented Control Code Generation. In Proceedings of the 1st International Workshop on Large Language Models for Code (LLM4Code’24), Lisbon, Portugal, 20 April 2024. [Google Scholar]
- Aasi, E.; Nguyen, P.; Sreeram, S.; Rosman, G.; Karaman, S.; Rus, D. Generating Out-Of-Distribution Scenarios Using Language Models. arXiv 2024, arXiv:2411.16554. [Google Scholar] [CrossRef]
- Zhang, J.; Xu, C.; Li, B. ChatScene: Knowledge-Enabled Safety-Critical Scenario Generation for Autonomous Vehicles. In Proceedings of the 2024 IEEE/CVF Conference on Computer Vision and Pattern Recognition (CVPR), Seattle, WA, USA, 16–22 June 2024; pp. 15459–15469. [Google Scholar] [CrossRef]
- Ye, S.; Sun, Z.; Wang, G.; Guo, L.; Liang, Q.; Li, Z.; Liu, Y. Prompt Alchemy: Automatic Prompt Refinement for Enhancing Code Generation. arXiv 2025, arXiv:2503.11085. [Google Scholar] [CrossRef]
- Shin, J.; Tang, C.; Mohati, T.; Nayebi, M.; Wang, S.; Hemmati, H. Prompt Engineering or Fine-Tuning: An Empirical Assessment of LLMs for Code. arXiv 2025, arXiv:2310.10508. [Google Scholar]
- Ma, Y.; Cui, C.; Cao, X.; Ye, W.; Liu, P.; Lu, J.; Abdelraouf, A.; Gupta, R.; Han, K.; Bera, A.; et al. Lampilot: An open benchmark dataset for autonomous driving with language model programs. In Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition, Seattle, WA, USA, 16–22 June 2024; pp. 15141–15151. [Google Scholar]
- Cui, C.; Ma, Y.; Yang, Z.; Zhou, Y.; Liu, P.; Lu, J.; Li, L.; Chen, Y.; Panchal, J.H.; Abdelraouf, A.; et al. Large Language Models for Autonomous Driving (LLM4AD): Concept, Benchmark, Experiments, and Challenges. arXiv 2024, arXiv:2410.15281. [Google Scholar]
- Ye, Q.; Axmed, M.; Pryzant, R.; Khani, F. Prompt engineering a prompt engineer. arXiv 2023, arXiv:2311.05661. [Google Scholar]
- Marvin, G.; Hellen, N.; Jjingo, D.; Nakatumba-Nabende, J. Prompt engineering in large language models. In Proceedings of the International Conference on data Intelligence and Cognitive Informatics, Tirunelveli, India, 27–28 June 2023; Springer: Berlin/Heidelberg, Germany, 2023; pp. 387–402. [Google Scholar]
- Ni, A.; Yin, P.; Zhao, Y.; Riddell, M.; Feng, T.; Shen, R.; Cohan, A. L2CEval: Evaluating language-to-code generation capabilities of large language models. Trans. Assoc. Comput. Linguist. 2024, 12, 1311–1329. [Google Scholar] [CrossRef]
- Hussain, Z.; Nurminen, J.K.; Ranta-aho, P. Training a language model to learn the syntax of commands. Array 2024, 23, 100355. [Google Scholar] [CrossRef]
- Petrovic, N.; Pan, F.; Lebioda, K.; Zolfaghari, V.; Kirchner, S.; Purschke, N.; Khan, M.A.; Vorobev, V.; Knoll, A. Synergy of large language model and model driven engineering for automated development of centralized vehicular systems. arXiv 2024, arXiv:2404.05508. [Google Scholar] [CrossRef]
- Petrovic, N.; Pan, F.; Zolfaghari, V.; Lebioda, K.; Schamschurko, A.; Knoll, A. GenAI for Automotive Software Development: From Requirements to Wheels. arXiv 2025, arXiv:2507.18223. [Google Scholar] [CrossRef]
- Du, X.; Liu, M.; Wang, K.; Wang, H.; Liu, J.; Chen, Y.; Lou, Y. Evaluating large language models in class-level code generation. In Proceedings of the IEEE/ACM 46th International Conference on Software Engineering (ICSE), Lisbon, Portugal, 14–20 April 2024; IEEE: New York, NY, USA, 2024; pp. 1–13. [Google Scholar]
- Nouri, A.; Andersson, J.; Hornig, K.D.J.; Fei, Z.; Knabe, E.; Sivencrona, H.; Cabrero-Daniel, B.; Berger, C. On Simulation-Guided LLM-based Code Generation for Safe Autonomous Driving Software. arXiv 2025, arXiv:2504.02141. [Google Scholar]
- Paul, D.G.; Zhu, H.; Bayley, I. Benchmarks and metrics for evaluations of code generation: A critical review. In Proceedings of the 2024 IEEE International Conference on Artificial Intelligence Testing (AITest), Shanghai, China, 15–18 July 2024; IEEE: New York, NY, USA, 2024; pp. 87–94. [Google Scholar]
- Tong, W.; Zhang, T. CodeJudge: Evaluating code generation with large language models. arXiv 2024, arXiv:2410.02184. [Google Scholar] [CrossRef]
- Evtikhiev, M.; Bogomolov, E.; Sokolov, Y.; Bryksin, T. Out of the BLEU: How should we assess quality of the code generation models? J. Syst. Softw. 2023, 203, 111741. [Google Scholar] [CrossRef]
- Song, Y.; Lothritz, C.; Tang, D.; Bissyandé, T.F.; Klein, J. Revisiting code similarity evaluation with abstract syntax tree edit distance. arXiv 2024, arXiv:2404.08817. [Google Scholar] [CrossRef]
- Wu, Y.; He, P.; Wang, Z.; Wang, S.; Tian, Y.; Chen, T.H. A comprehensive framework for evaluating API-oriented code generation in large language models. arXiv 2024, arXiv:2409.15228. [Google Scholar] [CrossRef]
- Dong, Y.; Ding, J.; Jiang, X.; Li, G.; Li, Z.; Jin, Z. CodeScore: Evaluating code generation by learning code execution. ACM Trans. Softw. Eng. Methodol. 2025, 34, 1–22. [Google Scholar] [CrossRef]
- Ságodi, Z.; Antal, G.; Bogenfürst, B.; Isztin, M.; Hegedűs, P.; Ferenc, R. Reality check: Assessing GPT-4 in fixing real-world software vulnerabilities. In Proceedings of the 28th International Conference on Evaluation and Assessment in Software Engineering, Salerno, Italy, 18–21 June 2024; pp. 252–261. [Google Scholar]
- Roziere, B.; Gehring, J.; Gloeckle, F.; Sootla, S.; Gat, I.; Tan, X.E.; Adi, Y.; Liu, J.; Sauvestre, R.; Remez, T.; et al. Code llama: Open foundation models for code. arXiv 2023, arXiv:2308.12950. [Google Scholar]
- Comanici, G.; Bieber, E.; Schaekermann, M.; Pasupat, I.; Sachdeva, N.; Dhillon, I.; Blistein, M.; Ram, O.; Zhang, D.; Rosen, E.; et al. Gemini 2.5: Pushing the frontier with advanced reasoning, multimodality, long context, and next generation agentic capabilities. arXiv 2025, arXiv:2507.06261. [Google Scholar] [CrossRef]
- Li, R.; Allal, L.B.; Zi, Y.; Muennighoff, N.; Kocetkov, D.; Mou, C.; Marone, M.; Akiki, C.; Li, J.; Chim, J.; et al. Starcoder: May the source be with you! arXiv 2023, arXiv:2305.06161. [Google Scholar] [CrossRef]
© 2025 by the authors. Licensee MDPI, Basel, Switzerland. This article is an open access article distributed under the terms and conditions of the Creative Commons Attribution (CC BY) license (https://creativecommons.org/licenses/by/4.0/).