Article

Evaluating Translation Quality: A Qualitative and Quantitative Assessment of Machine and LLM-Driven Arabic–English Translations

by Tawffeek A. S. Mohammed
Department of Foreign Languages, University of the Western Cape, Bellville 7535, South Africa
Information 2025, 16(6), 440; https://doi.org/10.3390/info16060440
Submission received: 11 April 2025 / Revised: 18 May 2025 / Accepted: 21 May 2025 / Published: 26 May 2025
(This article belongs to the Special Issue Machine Translation for Conquering Language Barriers)

Abstract

This study investigates translation quality between Arabic and English, comparing traditional rule-based machine translation systems, modern neural machine translation tools such as Google Translate, and large language models like ChatGPT. The research adopts both qualitative and quantitative approaches to assess the efficacy, accuracy, and contextual fidelity of translations. It particularly focuses on the translation of idiomatic and colloquial expressions as well as technical texts and genres. Using well-established evaluation metrics such as bilingual evaluation understudy (BLEU), translation error rate (TER), and character n-gram F-score (chrF), alongside the qualitative translation quality assessment model proposed by Juliane House, the study examines the linguistic and semantic nuances of translations generated by different systems. It concludes that although metric-based evaluations like BLEU and TER are useful, they often fail to fully capture the semantic and contextual accuracy of idiomatic and expressive translations. Large language models, particularly ChatGPT, show promise in addressing this gap by offering more coherent and culturally aligned translations. However, both machine translation tools and large language models demonstrate limitations that necessitate human post-editing for high-stakes content. The findings support a hybrid approach, combining machine translation tools with human oversight for optimal translation quality, especially in languages with complex morphology and culturally embedded expressions such as Arabic.
Keywords: machine translation (MT); large language models (LLMs); neural-based; rule-based; translation quality assessment (TQA); Google Translate; ChatGPT; Arabic; English; metrics
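As a rough illustration of the automatic metrics named in the abstract (BLEU, TER, and chrF), the sketch below scores candidate translations against references with the sacreBLEU library. The example sentences and variable names are hypothetical and are not drawn from the study's data.

```python
# Minimal sketch: corpus-level BLEU, chrF, and TER with sacreBLEU.
# The sentences below are hypothetical placeholders, not data from the study.
import sacrebleu

# System outputs (e.g., from an MT tool or an LLM) and one parallel reference set.
hypotheses = [
    "The early bird catches the worm.",
    "He spilled the beans during the meeting.",
]
references = [[
    "The early bird gets the worm.",
    "He revealed the secret during the meeting.",
]]

bleu = sacrebleu.corpus_bleu(hypotheses, references)  # n-gram precision with brevity penalty
chrf = sacrebleu.corpus_chrf(hypotheses, references)  # character n-gram F-score
ter = sacrebleu.corpus_ter(hypotheses, references)    # edit-distance-based error rate (lower is better)

print(f"BLEU: {bleu.score:.2f}  chrF: {chrf.score:.2f}  TER: {ter.score:.2f}")
```

Scores of this kind measure surface overlap with the reference, which is precisely the limitation the study highlights for idiomatic Arabic–English translation and why it pairs them with House's qualitative assessment model.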

Share and Cite

MDPI and ACS Style

Mohammed, T.A.S. Evaluating Translation Quality: A Qualitative and Quantitative Assessment of Machine and LLM-Driven Arabic–English Translations. Information 2025, 16, 440. https://doi.org/10.3390/info16060440

AMA Style

Mohammed TAS. Evaluating Translation Quality: A Qualitative and Quantitative Assessment of Machine and LLM-Driven Arabic–English Translations. Information. 2025; 16(6):440. https://doi.org/10.3390/info16060440

Chicago/Turabian Style

Mohammed, Tawffeek A. S. 2025. "Evaluating Translation Quality: A Qualitative and Quantitative Assessment of Machine and LLM-Driven Arabic–English Translations." Information 16, no. 6: 440. https://doi.org/10.3390/info16060440

APA Style

Mohammed, T. A. S. (2025). Evaluating Translation Quality: A Qualitative and Quantitative Assessment of Machine and LLM-Driven Arabic–English Translations. Information, 16(6), 440. https://doi.org/10.3390/info16060440


