Evaluating the Performance of Large Language Models for Geometry and Simulation File Generation in Physics-Based Simulations

Ossama Shafiq; Amin Rahmat; Alessio Alexiadis; Bahman Ghiassi

doi:10.3390/app152212114

,

and

¹

School of Engineering, University of Birmingham, Edgbaston, Birmingham B15 2TT, UK

²

School of Chemical Engineering, University of Birmingham, Edgbaston, Birmingham B15 2TT, UK

^*

Author to whom correspondence should be addressed.

Appl. Sci.2025, 15(22), 12114;https://doi.org/10.3390/app152212114

Version Notes

Order Reprints

Abstract

Finite-element simulations and computer-aided design workflows require complex preprocessing, with geometry creation and simulation setup traditionally demanding significant manual expertise. The question emerges: can machine learning, namely large language models, help automate these processes? This study evaluates how well nine large language models can automate finite-element simulations starting from natural language prompts, generating both the geometry files for meshing (using Gmsh, an open-source geometry and mesh generator) and the input files needed for the solver (using Elmer, an open-source multiphysics simulation tool). Two standard test cases, a simple bar and a wheel and axle assembly, are used to evaluate and compare their performance. A set of criteria and a scoring system is introduced to assess performance across geometry and simulation setup, covering aspects such as file completeness, Boolean operations, shape fidelity, and displacement error. Results show that most LLMs excel at generating solver input files, achieving 78-88% success rate with <1% displacement error when executed. Geometry generation proves more challenging, with 70% success for simple shapes but only 56% for assemblies. Critically, no model successfully implemented Boolean operations required for merging components; GPT-4o uniquely attempted these operations but failed due to volume reuse errors. This 0% success rate for Boolean operations represents the primary bottleneck for assembly automation. Notable findings include extreme performance variability in the smallest model (PHI-3 Mini, varying 0–97% between similar tasks) and complete elimination of unit errors when explicitly prompted for SI units. The results reveal a clear capability gap: while LLMs reliably generate physics solver inputs, they cannot produce ready-to-mesh assemblies, requiring manual intervention for Boolean operations. While the study focuses on a Gmsh–Elmer pipeline, the results likely generalise to other simulation software.

Keywords:

large language models; physics-based simulations; geometry; Gmsh; GPT-4; LLAMA; artificial intelligence in engineering design

Evaluating the Performance of Large Language Models for Geometry and Simulation File Generation in Physics-Based Simulations

Abstract

Article Metrics

Citations

Article Access Statistics