Abstract
Finite-element simulations and computer-aided design workflows require complex preprocessing, with geometry creation and simulation setup traditionally demanding significant manual expertise. This raises the question: can machine learning, specifically large language models, help automate these processes? This study evaluates how well nine large language models can automate finite-element simulations starting from natural-language prompts, generating both the geometry files for meshing (using Gmsh, an open-source geometry and mesh generator) and the input files needed for the solver (using Elmer, an open-source multiphysics simulation tool). Two standard test cases, a simple bar and a wheel-and-axle assembly, are used to evaluate and compare their performance. A set of criteria and a scoring system are introduced to assess performance across geometry and simulation setup, covering aspects such as file completeness, Boolean operations, shape fidelity, and displacement error. Results show that most LLMs excel at generating solver input files, achieving success rates of 78–88% with less than 1% displacement error when executed. Geometry generation proves more challenging, with 70% success for simple shapes but only 56% for assemblies. Critically, no model successfully implemented the Boolean operations required for merging components; GPT-4o was the only model to attempt them, but it failed due to volume-reuse errors. This 0% success rate for Boolean operations represents the primary bottleneck for assembly automation. Notable findings include extreme performance variability in the smallest model (Phi-3 Mini, whose scores varied from 0% to 97% between similar tasks) and the complete elimination of unit errors when models were explicitly prompted to use SI units. The results reveal a clear capability gap: while LLMs reliably generate physics solver inputs, they cannot produce ready-to-mesh assemblies, and manual intervention remains necessary for Boolean operations. While the study focuses on a Gmsh–Elmer pipeline, the results likely generalise to other simulation software.