Large language models (LLMs) such as GPT-4 are increasingly integrated into research, industry, and enterprise workflows, yet little is known about how input file formats shape their outputs. While prior work has shown that formats can influence response time, their effects on readability, complexity, and semantic stability remain underexplored. This study systematically evaluates GPT-4’s responses to 100 queries drawn from 50 academic papers, each tested across four formats (TXT, DOCX, PDF, and XML), yielding 400 question–answer pairs. We assessed two aspects of the responses: efficiency, quantified by response time and answer length, and linguistic style, measured by readability indices, sentence length, word length, and lexical diversity; semantic similarity was also measured to confirm that semantic content was preserved. Results show that readability and semantic content remain stable across formats, with no significant differences in Flesch–Kincaid or Dale–Chall scores, but response time is sensitive to document encoding: in the initial experiments, conducted in February 2025, XML consistently outperformed PDF, DOCX, and TXT. Verbosity, rather than input size, emerged as the main driver of latency. However, follow-up replications conducted several months later (October 2025) under the updated Microsoft Copilot Studio (GPT-4) environment showed that these latency differences had largely converged, indicating that backend improvements, particularly in GPT-4o’s document-ingestion and parsing pipelines, have reduced the earlier disparities. These findings suggest that file format affects how quickly LLMs respond, although its influence may diminish as enterprise-level AI systems continue to evolve. Overall, the content and semantics of the responses are consistent across file formats, demonstrating that LLMs can handle diverse encodings without compromising response quality.
For large-scale applications, adopting structured formats such as XML or semantically tagged HTML can still yield measurable throughput gains in earlier system versions, whereas in more optimized environments, such differences may become minimal.
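Two of the style metrics named above have simple, well-known definitions that can be sketched directly: the Flesch–Kincaid grade level, computed as 0.39 × (words/sentences) + 11.8 × (syllables/words) − 15.59, and lexical diversity as the type–token ratio. The sketch below is illustrative only, not the study's actual pipeline; in particular, the vowel-group syllable counter is a crude assumption, and dedicated libraries implement more refined versions of these indices.

```python
import re

def count_syllables(word: str) -> int:
    # Crude approximation: count contiguous vowel groups, minimum one.
    groups = re.findall(r"[aeiouy]+", word.lower())
    return max(1, len(groups))

def flesch_kincaid_grade(text: str) -> float:
    # Standard FK grade-level formula over naive sentence/word splits.
    sentences = [s for s in re.split(r"[.!?]+", text) if s.strip()]
    words = re.findall(r"[A-Za-z']+", text)
    syllables = sum(count_syllables(w) for w in words)
    return (0.39 * (len(words) / len(sentences))
            + 11.8 * (syllables / len(words))
            - 15.59)

def lexical_diversity(text: str) -> float:
    # Type-token ratio: unique word forms over total words.
    words = [w.lower() for w in re.findall(r"[A-Za-z']+", text)]
    return len(set(words)) / len(words)
```

Comparing these scores for the same query answered from a TXT, DOCX, PDF, and XML source would reproduce, in miniature, the format-stability comparison the study reports.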