Article

Comparative Evaluation of GPT-4o, GPT-OSS-120B and Llama-3.1-8B-Instruct Language Models in a Reproducible CV-to-JSON Extraction Pipeline

1 Department of Artificial Intelligence, Institute of Information Technology, Warsaw University of Life Sciences, ul. Nowoursynowska 159, 02-776 Warsaw, Poland
2 Avenga IT Professionals sp. z o.o., ul. Gwiaździsta 66, 53-413 Wroclaw, Poland
* Author to whom correspondence should be addressed.
Appl. Sci. 2026, 16(1), 217; https://doi.org/10.3390/app16010217
Submission received: 27 October 2025 / Revised: 8 December 2025 / Accepted: 23 December 2025 / Published: 24 December 2025

Abstract

Recruitment automation increasingly relies on Large Language Models (LLMs) to extract structured information from unstructured CVs and job postings. However, production data often arrive as heterogeneous, privacy-sensitive PDFs, which limits reproducibility and compliance. This study introduces a deterministic, GDPR-aligned pipeline that converts recruitment documents into structured, anonymized Markdown and subsequently into validated JSON ready for downstream AI processing. The workflow combines the Docling PDF-to-Markdown converter with a two-pass anonymization protocol and evaluates three LLM back-ends (GPT-4o (Azure, frozen proprietary), GPT-OSS-120B and Llama-3.1-8B-Instruct) using identical prompts and schema constraints under near-zero-temperature decoding. Each model's output was assessed across 2280 multilingual CVs using two complementary metrics: reference-based completeness and content similarity. The proprietary GPT-4o achieved perfect schema coverage and served as the reproducibility baseline, while the open-weight models reached 73–79% completeness and 59–72% content similarity depending on section complexity. Llama-3.1-8B-Instruct performed strongly on standardized sections such as contact and legal, whereas GPT-OSS-120B handled less frequent narrative fields better. The results demonstrate that fully deterministic, auditable document extraction is achievable with both proprietary and open LLMs when guided by strong schema validation and anonymization. The proposed pipeline bridges the gap between document ingestion and reliable, bias-aware data preparation for AI-driven recruitment systems.
Keywords: large language models; GPT-4o; Llama-3.1-8B; GPT-OSS-120B; PDF-to-Markdown conversion; information extraction; structured JSON; recruitment automation; semantic matching
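
The abstract names two metrics, reference-based completeness and content similarity, without defining them. The following is a minimal sketch of how such metrics could be computed against a reference JSON (here, the baseline model's output), not the authors' implementation. The function names, the field-flattening approach, the use of `difflib.SequenceMatcher` as the similarity measure, and the choice to average similarity only over fields present in both outputs are all assumptions made for illustration.

```python
from difflib import SequenceMatcher


def flatten(obj, prefix=""):
    """Flatten nested dicts/lists into {dotted.path: string value} pairs.

    Empty strings and None are treated as missing (an assumption).
    """
    items = {}
    if isinstance(obj, dict):
        for key, value in obj.items():
            items.update(flatten(value, f"{prefix}{key}."))
    elif isinstance(obj, list):
        for index, value in enumerate(obj):
            items.update(flatten(value, f"{prefix}{index}."))
    elif obj not in (None, ""):
        items[prefix.rstrip(".")] = str(obj)
    return items


def completeness(reference, candidate):
    """Share of non-empty reference fields that the candidate also fills."""
    ref, cand = flatten(reference), flatten(candidate)
    if not ref:
        return 1.0
    return sum(1 for path in ref if path in cand) / len(ref)


def content_similarity(reference, candidate):
    """Mean case-insensitive string similarity over fields present in both."""
    ref, cand = flatten(reference), flatten(candidate)
    shared = [path for path in ref if path in cand]
    if not shared:
        return 0.0
    ratios = [
        SequenceMatcher(None, ref[p].lower(), cand[p].lower()).ratio()
        for p in shared
    ]
    return sum(ratios) / len(ratios)


# Hypothetical example records; the field names are illustrative only.
reference = {
    "contact": {"email": "a@example.com", "city": "Warsaw"},
    "skills": ["Python", "SQL"],
}
candidate = {
    "contact": {"email": "a@example.com", "city": ""},
    "skills": ["Python", "sql"],
}

print(completeness(reference, candidate))        # 0.75
print(content_similarity(reference, candidate))  # 1.0
```

In this sketch the candidate leaves one of four reference fields empty, so completeness is 0.75, while the three shared fields match exactly after lowercasing. Separating the two scores keeps missing fields from being conflated with fields that are present but differently worded.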

Share and Cite

MDPI and ACS Style

Nawalny, M.; Łępicki, M.; Latkowski, T.; Bujak, S.; Bukowski, M.; Świderski, B.; Baranik, G.; Nowak, B.; Zakowicz, R.; Dobrakowski, Ł.; et al. Comparative Evaluation of GPT-4o, GPT-OSS-120B and Llama-3.1-8B-Instruct Language Models in a Reproducible CV-to-JSON Extraction Pipeline. Appl. Sci. 2026, 16, 217. https://doi.org/10.3390/app16010217

AMA Style

Nawalny M, Łępicki M, Latkowski T, Bujak S, Bukowski M, Świderski B, Baranik G, Nowak B, Zakowicz R, Dobrakowski Ł, et al. Comparative Evaluation of GPT-4o, GPT-OSS-120B and Llama-3.1-8B-Instruct Language Models in a Reproducible CV-to-JSON Extraction Pipeline. Applied Sciences. 2026; 16(1):217. https://doi.org/10.3390/app16010217

Chicago/Turabian Style

Nawalny, Marcin, Mateusz Łępicki, Tomasz Latkowski, Sebastian Bujak, Michał Bukowski, Bartosz Świderski, Grzegorz Baranik, Bogusz Nowak, Robert Zakowicz, Łukasz Dobrakowski, et al. 2026. "Comparative Evaluation of GPT-4o, GPT-OSS-120B and Llama-3.1-8B-Instruct Language Models in a Reproducible CV-to-JSON Extraction Pipeline" Applied Sciences 16, no. 1: 217. https://doi.org/10.3390/app16010217

APA Style

Nawalny, M., Łępicki, M., Latkowski, T., Bujak, S., Bukowski, M., Świderski, B., Baranik, G., Nowak, B., Zakowicz, R., Dobrakowski, Ł., Oczeretko, A., Sadowski, P., Szlaga, K., Kubica, B., & Kurek, J. (2026). Comparative Evaluation of GPT-4o, GPT-OSS-120B and Llama-3.1-8B-Instruct Language Models in a Reproducible CV-to-JSON Extraction Pipeline. Applied Sciences, 16(1), 217. https://doi.org/10.3390/app16010217

