- Article
Comparative Analysis of AI Models for Python Code Generation: A HumanEval Benchmark Study
- Ali Bayram, Gonca Gokce Menekse Dalveren and Mohammad Derawi
This study compares six contemporary artificial intelligence models on Python code generation using the HumanEval benchmark. The evaluated models include GPT-3.5 Turbo, GPT-4 Omni, Claude 3.5 Sonnet, Claude 3....
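For readers unfamiliar with how a HumanEval-style benchmark scores a model, the following is a minimal sketch of the functional-correctness idea: a generated completion is appended to the task prompt, executed, and then checked by the task's `check(candidate)` test function. It is not the study's evaluation harness. The helper name `passes_tests` and the toy `add` task are hypothetical and introduced only for illustration; a real harness would also run each candidate in a sandbox with a timeout, which this sketch omits.

```python
from typing import Any, Dict


def passes_tests(prompt: str, completion: str, test: str, entry_point: str) -> bool:
    """Return True if prompt + completion satisfies the task's check() function.

    Hypothetical helper: runs code with exec() in-process, so it must only be
    used on trusted toy inputs, never on raw model output.
    """
    namespace: Dict[str, Any] = {}
    try:
        # Define the candidate function from the prompt plus the generated body.
        exec(prompt + completion, namespace)
        # Define the task's check(candidate) function, then run it.
        exec(test, namespace)
        namespace["check"](namespace[entry_point])
    except Exception:
        # Any syntax error, runtime error, or failed assertion counts as a failure.
        return False
    return True


# Toy task in the HumanEval layout (prompt, test, entry point). Assumed example,
# not a problem taken from the benchmark.
toy_prompt = 'def add(a, b):\n    """Return the sum of a and b."""\n'
toy_test = (
    "def check(candidate):\n"
    "    assert candidate(2, 3) == 5\n"
    "    assert candidate(-1, 1) == 0\n"
)

# Two hypothetical completions: one correct, one buggy.
good_completion = "    return a + b\n"
bad_completion = "    return a - b\n"

results = [
    passes_tests(toy_prompt, completion, toy_test, "add")
    for completion in (good_completion, bad_completion)
]

# Fraction of completions that pass all tests for this single toy task.
pass_rate = sum(results) / len(results)
print(results, pass_rate)
```

Here the correct completion passes and the buggy one fails, so the pass rate for this toy task is 0.5. Averaging such per-task outcomes across all benchmark problems gives the kind of success rate on which models are typically compared.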