- Article
Investigating Reproducibility Challenges in LLM Bugfixing on the HumanEvalFix Benchmark
Balázs Szalontai, Balázs Márton, Balázs Pintér and Tibor Gregorics
Benchmark results for large language models (LLMs) are often inconsistent across studies. This paper investigates the challenges of reproducing such results for LLM-based automatic bugfixing on the HumanEvalFix benchmark. To determine the ca...