Machine Translation for Conquering Language Barriers

A special issue of Information (ISSN 2078-2489). This special issue belongs to the section "Artificial Intelligence".

Deadline for manuscript submissions: 31 July 2025 | Viewed by 27380

Special Issue Editor


Dr. Ivan Dunđer
Guest Editor
Department of Information and Communication Sciences, Faculty of Humanities and Social Sciences, University of Zagreb, 10000 Zagreb, Croatia
Interests: natural language processing; machine translation; machine learning; data science

Special Issue Information

Dear colleagues,

The MDPI journal Information invites submissions to a Special Issue on “Machine Translation for Conquering Language Barriers”.

Machine translation is an increasingly hot research topic owing to its potential to become one of today's most significant disruptive technologies. It tackles various aspects of existing language barriers, striving to enable effective communication and the transfer of meaning across different languages by applying different approaches, technologies, and solutions. The idea behind machine translation is to automate translation within the contemporary translation workflow in order to cope with the overwhelming quantity of data that needs to be translated, with special emphasis on speed and quality. However, since machine translation systems depend on large data sets and computing power, numerous issues remain open, especially for less widely spoken and under-resourced languages.

This Special Issue welcomes original and unpublished works with results related in any way to machine translation and linked areas, especially those covering experimental and methodological aspects of novel solutions, system implementation approaches, new data sets and resources, natural language processing techniques and tools, hybrid solutions, technology combination and integration, computer-assisted translation and its impact on quality and productivity, various types of user studies, the incorporation of linguistic knowledge and other digital resources, translation quality evaluation and estimation, post-editing efforts and strategies, ethical and legal issues, as well as other concerns related to tackling existing language barriers and problems within the field of machine translation. Submissions with a strong theoretical contribution are also desirable.

Topics of interest include, but are not limited to:

  • Machine translation systems and deployment;
  • Analysis of machine translation models and approaches;
  • Evaluation of machine translation quality;
  • Machine translation quality estimation;
  • Corpora and other resources for machine translation;
  • Natural language processing for machine translation;
  • Language technologies for machine translation;
  • Linguistic knowledge in machine translation;
  • Machine translation application in various fields.

Dr. Ivan Dunđer
Guest Editor

Manuscript Submission Information

Manuscripts should be submitted online at www.mdpi.com by registering and logging in to the website. Once registered, go to the submission form. Manuscripts can be submitted until the deadline. All submissions that pass pre-check are peer-reviewed. Accepted papers will be published continuously in the journal (as soon as accepted) and will be listed together on the Special Issue website. Research articles, review articles, and short communications are invited. For planned papers, a title and short abstract (about 100 words) can be sent to the Editorial Office for announcement on this website.

Submitted manuscripts should not have been published previously, nor be under consideration for publication elsewhere (except conference proceedings papers). All manuscripts are thoroughly refereed through a single-blind peer-review process. A guide for authors and other relevant information for submission of manuscripts is available on the Instructions for Authors page. Information is an international peer-reviewed open access monthly journal published by MDPI.

Please visit the Instructions for Authors page before submitting a manuscript. The Article Processing Charge (APC) for publication in this open access journal is 1600 CHF (Swiss Francs). Submitted papers should be well formatted and use good English. Authors may use MDPI's English editing service prior to publication or during author revisions.

Keywords

  • Machine translation systems and deployment
  • Analysis of machine translation models and approaches
  • Evaluation of machine translation quality
  • Machine translation quality estimation
  • Corpora and other resources for machine translation
  • Language technologies for machine translation

Benefits of Publishing in a Special Issue

  • Ease of navigation: Grouping papers by topic helps scholars navigate broad scope journals more efficiently.
  • Greater discoverability: Special Issues support the reach and impact of scientific research. Articles in Special Issues are more discoverable and cited more frequently.
  • Expansion of research network: Special Issues facilitate connections among authors, fostering scientific collaborations.
  • External promotion: Articles in Special Issues are often promoted through the journal's social media, increasing their visibility.
  • e-Book format: Special Issues with more than 10 articles can be published as dedicated e-books, ensuring wide and rapid dissemination.

Further information on MDPI's Special Issue policies can be found here.

Published Papers (9 papers)


Research

16 pages, 886 KiB  
Article
Exploring the Potential of Neural Machine Translation for Cross-Language Clinical Natural Language Processing (NLP) Resource Generation through Annotation Projection
by Jan Rodríguez-Miret, Eulàlia Farré-Maduell, Salvador Lima-López, Laura Vigil, Vicent Briva-Iglesias and Martin Krallinger
Information 2024, 15(10), 585; https://doi.org/10.3390/info15100585 - 25 Sep 2024
Viewed by 173
Abstract
Recent advancements in neural machine translation (NMT) offer promising potential for generating cross-language clinical natural language processing (NLP) resources. There is a pressing need to be able to foster the development of clinical NLP tools that extract key clinical entities in a comparable way for a multitude of medical application scenarios that are hindered by lack of multilingual annotated data. This study explores the efficacy of using NMT and annotation projection techniques with expert-in-the-loop validation to develop named entity recognition (NER) systems for an under-resourced target language (Catalan) by leveraging Spanish clinical corpora annotated by domain experts. We employed a state-of-the-art NMT system to translate three clinical case corpora. The translated annotations were then projected onto the target language texts and subsequently validated and corrected by clinical domain experts. The efficacy of the resulting NER systems was evaluated against manually annotated test sets in the target language. Our findings indicate that this approach not only facilitates the generation of high-quality training data for the target language (Catalan) but also demonstrates the potential to extend this methodology to other languages, thereby enhancing multilingual clinical NLP resource development. The generated corpora and components are publicly accessible, potentially providing a valuable resource for further research and application in multilingual clinical settings. Full article
(This article belongs to the Special Issue Machine Translation for Conquering Language Barriers)
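To illustrate the annotation projection step described in this abstract: the sketch below is a simplified, hypothetical rendering of the general technique (invented entity spans, labels, and alignment dictionary), not the authors' actual pipeline, in which alignments would come from an automatic aligner and the projected spans would subsequently be validated by clinical experts.

    # Hypothetical sketch of annotation projection: source-side entity spans are
    # mapped onto the machine-translated target sentence via a token alignment.
    def project_entities(entities, alignment):
        # `entities` holds (start, end, label) token spans on the source side;
        # `alignment` maps source token indices to target token indices.
        projected = []
        for start, end, label in entities:
            targets = [alignment[i] for i in range(start, end) if i in alignment]
            if targets:
                projected.append((min(targets), max(targets) + 1, label))
        return projected

    # Invented example: a DISEASE span covering source tokens 2-3 lands on target tokens 3-4.
    src_entities = [(2, 4, "DISEASE")]
    alignment = {0: 0, 1: 1, 2: 3, 3: 4}
    print(project_entities(src_entities, alignment))  # [(3, 5, 'DISEASE')]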

18 pages, 1625 KiB  
Article
Human versus Neural Machine Translation Creativity: A Study on Manipulated MWEs in Literature
by Gloria Corpas Pastor and Laura Noriega-Santiáñez
Information 2024, 15(9), 530; https://doi.org/10.3390/info15090530 - 2 Sep 2024
Viewed by 601
Abstract
In the digital era, the (r)evolution of neural machine translation (NMT) has reshaped both the market and translators’ workflow. However, the adoption of this technology has not fully reached the creative field of literary translation. Against this background, this study aims to explore to what extent NMT systems can be used to translate the creative challenges posed by idioms, specifically manipulated multiword expressions (MWEs) found in literary texts. To carry out this pilot study, five manipulated MWEs were selected from a fantasy novel and machine-translated (English > Spanish) by four NMT systems (DeepL, Google Translate, Bing Translator, and Reverso). Then, each NMT output as well as a human translation are assessed by six professional literary translators by using a human evaluation sheet. Based on these results, the creativity obtained in each translation method was calculated. Despite the satisfactory performance of both DeepL and Google Translate, HT creativity was highly superior in almost all manipulated MWEs. To the best of our knowledge, this paper not only contributes to the ongoing study of NMT applied to literature, but it is also one of the few studies that delve into the almost unexplored field of assessing creativity in neural machine-translated MWEs. Full article
(This article belongs to the Special Issue Machine Translation for Conquering Language Barriers)

19 pages, 2313 KiB  
Article
Machine Translation for Open Scholarly Communication: Examining the Relationship between Translation Quality and Reading Effort
by Lieve Macken, Vanessa De Wilde and Arda Tezcan
Information 2024, 15(8), 427; https://doi.org/10.3390/info15080427 - 23 Jul 2024
Viewed by 845
Abstract
This study assesses the usability of machine-translated texts in scholarly communication, using self-paced reading experiments with texts from three scientific disciplines, translated from French into English and vice versa. Thirty-two participants, proficient in the target language, participated. This study uses three machine translation engines (DeepL, ModernMT, OpenNMT), which vary in translation quality. The experiments aim to determine the relationship between translation quality and readers’ reception effort, measured by reading times. The results show that for two disciplines, manual and automatic translation quality measures are significant predictors of reading time. For the most technical discipline, this study could not build models that outperformed the baseline models, which only included participant and text ID as random factors. This study acknowledges the need to include reader-specific features, such as prior knowledge, in future research. Full article
(This article belongs to the Special Issue Machine Translation for Conquering Language Barriers)
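To make the modelling setup concrete: a baseline containing only participant and text ID as random factors can be compared against a model that adds a translation-quality predictor. The sketch below uses synthetic data and illustrative variable names; it is not the authors' analysis, only one common way to set up such a comparison in Python.

    # Synthetic illustration of a baseline-vs-quality mixed-effects comparison.
    import numpy as np
    import pandas as pd
    import statsmodels.formula.api as smf

    rng = np.random.default_rng(0)
    rows = []
    for participant in range(32):
        p_offset = rng.normal(0, 5)                    # participant random effect
        for text_id in range(6):
            quality = rng.uniform(60, 95)              # e.g., an automatic quality score
            reading_time = 120 - 0.5 * quality + p_offset + rng.normal(0, 4)
            rows.append({"participant": participant, "text_id": text_id,
                         "quality": quality, "reading_time": reading_time})
    data = pd.DataFrame(rows)

    vc = {"text": "0 + C(text_id)"}                    # text ID as a variance component
    baseline = smf.mixedlm("reading_time ~ 1", data,
                           groups="participant", vc_formula=vc).fit()
    quality_model = smf.mixedlm("reading_time ~ quality", data,
                                groups="participant", vc_formula=vc).fit()
    print(baseline.llf, quality_model.llf)             # higher log-likelihood = better fit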

24 pages, 1246 KiB  
Article
adaptMLLM: Fine-Tuning Multilingual Language Models on Low-Resource Languages with Integrated LLM Playgrounds
by Séamus Lankford, Haithem Afli and Andy Way
Information 2023, 14(12), 638; https://doi.org/10.3390/info14120638 - 29 Nov 2023
Cited by 5 | Viewed by 7625
Abstract
The advent of Multilingual Language Models (MLLMs) and Large Language Models (LLMs) has spawned innovation in many areas of natural language processing. Despite the exciting potential of this technology, its impact on developing high-quality Machine Translation (MT) outputs for low-resource languages remains relatively under-explored. Furthermore, an open-source application, dedicated to both fine-tuning MLLMs and managing the complete MT workflow for low-resource languages, remains unavailable. We aim to address these imbalances through the development of adaptMLLM, which streamlines all processes involved in the fine-tuning of MLLMs for MT. This open-source application is tailored for developers, translators, and users who are engaged in MT. It is particularly useful for newcomers to the field, as it significantly streamlines the configuration of the development environment. An intuitive interface allows for easy customisation of hyperparameters, and the application offers a range of metrics for model evaluation and the capability to deploy models as a translation service directly within the application. As a multilingual tool, we used adaptMLLM to fine-tune models for two low-resource language pairs: English to Irish (EN↔GA) and English to Marathi (EN↔MR). Compared with baselines from the LoResMT2021 Shared Task, the adaptMLLM system demonstrated significant improvements. In the EN→GA direction, an improvement of 5.2 BLEU points was observed, and an increase of 40.5 BLEU points was recorded in the GA→EN direction, representing relative improvements of 14% and 117%, respectively. Significant improvements in the translation performance of the EN↔MR pair were also observed, notably in the MR→EN direction, with an increase of 21.3 BLEU points, which corresponds to a relative improvement of 68%. Finally, a fine-grained human evaluation of the MLLM output on the EN→GA pair was conducted using the Multidimensional Quality Metrics and Scalar Quality Metrics error taxonomies. The application and models are freely available. Full article
(This article belongs to the Special Issue Machine Translation for Conquering Language Barriers)
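As a quick arithmetic aid to the BLEU figures quoted in the abstract: a relative improvement is the absolute gain divided by the baseline score, so the (unstated) baselines can be backed out from the reported numbers. The snippet below does exactly that; the resulting baselines are derived approximations, not values given by the authors.

    # Back out the implied baseline BLEU from each absolute gain and relative gain.
    def implied_baseline(delta_bleu, relative_pct):
        # relative = delta / baseline  =>  baseline = delta / (relative / 100)
        return delta_bleu / (relative_pct / 100.0)

    for direction, delta, pct in [("EN->GA", 5.2, 14), ("GA->EN", 40.5, 117), ("MR->EN", 21.3, 68)]:
        print(f"{direction}: +{delta} BLEU at +{pct}% -> implied baseline ~ {implied_baseline(delta, pct):.1f} BLEU")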

18 pages, 973 KiB  
Article
Four Million Segments and Counting: Building an English-Croatian Parallel Corpus through Crowdsourcing Using a Novel Gamification-Based Platform
by Rafał Jaworski, Sanja Seljan and Ivan Dunđer
Information 2023, 14(4), 226; https://doi.org/10.3390/info14040226 - 6 Apr 2023
Viewed by 2240
Abstract
Parallel corpora have been widely used in the fields of natural language processing and translation as they provide crucial multilingual information. They are used to train machine translation systems, compile dictionaries, or generate inter-language word embeddings. There are many corpora available publicly; however, support for some languages is still limited. In this paper, the authors present a framework for collecting, organizing, and storing corpora. The solution was originally designed to obtain data for less-resourced languages, but it proved to work very well for the collection of high-value domain-specific corpora. The scenario is based on the collective work of a group of people who are motivated by the means of gamification. The rules of the game motivate the participants to submit large resources, and a peer-review process ensures quality. More than four million translated segments have been collected so far. Full article
(This article belongs to the Special Issue Machine Translation for Conquering Language Barriers)

21 pages, 2963 KiB  
Article
LiST: A Lightweight Framework for Continuous Indian Sign Language Translation
by Amrutha K, Prabu P and Ramesh Chandra Poonia
Information 2023, 14(2), 79; https://doi.org/10.3390/info14020079 - 29 Jan 2023
Cited by 9 | Viewed by 3100
Abstract
Sign language is a natural, structured, and complete form of communication to exchange information. Non-verbal communicators, also referred to as hearing impaired and hard of hearing (HI&HH), consider sign language an elemental mode of communication to convey information. As this language is less familiar among a large percentage of the human population, an automatic sign language translator that can act as an interpreter and remove the language barrier is mandatory. The advent of deep learning has resulted in the availability of several sign language translation (SLT) models. However, SLT models are complex, resulting in increased latency in language translation. Furthermore, SLT models consider only hand gestures for further processing, which might lead to the misinterpretation of ambiguous sign language words. In this paper, we propose a lightweight SLT framework, LiST (Lightweight Sign language Translation), that simultaneously considers multiple modalities, such as hand gestures, facial expressions, and hand orientation, from an Indian sign video. The Inception V3 architecture handles the features associated with different signer modalities, resulting in the generation of a feature map, which is processed by a two-layered long short-term memory (LSTM) architecture. This sequence helps in sentence-by-sentence recognition and in the translation of sign language into text and audio. The model was tested with continuous Indian Sign Language (ISL) sentences taken from the INCLUDE dataset. The experimental results show that the LiST framework achieved a high translation accuracy of 91.2% and a prediction accuracy of 95.9% while maintaining a low word-level translation error compared to other existing models. Full article
(This article belongs to the Special Issue Machine Translation for Conquering Language Barriers)
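The pipeline sketched in the abstract (per-frame Inception V3 features feeding a two-layer LSTM) can be laid out in a few lines of Keras. The snippet below is a hypothetical, minimal layout with invented frame counts, hidden sizes, and class counts; it is not the published LiST model and omits the facial-expression and hand-orientation streams.

    # Minimal sketch: InceptionV3 frame features -> two-layer LSTM -> sentence-level label.
    import tensorflow as tf

    num_frames, num_classes = 64, 50                   # illustrative placeholders
    frames = tf.keras.Input(shape=(num_frames, 299, 299, 3))
    backbone = tf.keras.applications.InceptionV3(include_top=False, weights=None, pooling="avg")
    features = tf.keras.layers.TimeDistributed(backbone)(frames)   # (batch, frames, 2048)
    x = tf.keras.layers.LSTM(256, return_sequences=True)(features)
    x = tf.keras.layers.LSTM(256)(x)
    outputs = tf.keras.layers.Dense(num_classes, activation="softmax")(x)
    model = tf.keras.Model(frames, outputs)
    model.summary()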

22 pages, 314 KiB  
Article
Post-Editese in Literary Translations
by Sheila Castilho and Natália Resende
Information 2022, 13(2), 66; https://doi.org/10.3390/info13020066 - 28 Jan 2022
Cited by 15 | Viewed by 3893
Abstract
In the present study, we investigated the post-editese phenomenon, i.e., the unique features that set machine translated post-edited texts apart from human-translated texts. We used two literary texts, namely, the English children’s novel by Lewis Carroll Alice’s Adventures in Wonderland (AW) and Paula Hawkins’ popular book The Girl on the Train (TGOTT). Both literary texts were Google translated from English into Brazilian Portuguese to investigate whether the post-editese features can be found on the surface of the post-edited (PE) texts. In addition, we examined how the features found in the PE texts differ from the features encountered in the human-translated (HT) and machine translation (MT) versions of the same source text. Results revealed evidence for post-editese for TGOTT only with PE versions being more similar to the MT output than to the HT texts. Full article
(This article belongs to the Special Issue Machine Translation for Conquering Language Barriers)
14 pages, 1208 KiB  
Article
Measuring Terminology Consistency in Translated Corpora: Implementation of the Herfindahl-Hirshman Index
by Angelina Gašpar, Sanja Seljan and Vlasta Kučiš
Information 2022, 13(2), 43; https://doi.org/10.3390/info13020043 - 18 Jan 2022
Cited by 4 | Viewed by 4032
Abstract
Consistent terminology can positively influence communication, information transfer, and proper understanding. In multilingual written communication processes, challenges are augmented due to translation variants. The main aim of this study was to implement the Herfindahl-Hirshman Index (HHI) for the assessment of translated terminology in parallel corpora. This research was conducted on three types of legal domain subcorpora, dating from different periods: the Croatian-English parallel corpus (1991–2009), Latin-English and Latin-Croatian versions of the Code of Canon Law (1983), and English and Croatian versions of the EU legislation (2013). After the terminology extraction process, validation of term candidates was performed, followed by an evaluation. Terminology consistency was measured using the HHI, a commonly accepted measurement of market concentration. Results show that the HHI can be used for measuring terminology consistency to improve information transfer and message understanding. In translation settings, the process shows the need for quality management solutions. Full article
(This article belongs to the Special Issue Machine Translation for Conquering Language Barriers)
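For reference, the Herfindahl-Hirshman Index is simply the sum of squared shares; applied to terminology, each share is the proportion of one translation variant of a given source term, so a value of 1.0 indicates a perfectly consistent translation. A minimal sketch with an invented Croatian example (not the study's data):

    # HHI over the observed translation variants of a single source term.
    from collections import Counter

    def hhi(variants):
        counts = Counter(variants)
        total = sum(counts.values())
        return sum((c / total) ** 2 for c in counts.values())

    # Invented example: one English term rendered three different ways in Croatian.
    observed = ["ugovor", "ugovor", "ugovor", "sporazum", "sporazum", "pogodba"]
    print(round(hhi(observed), 3))  # (3/6)^2 + (2/6)^2 + (1/6)^2 ~ 0.389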

33 pages, 1378 KiB  
Article
Evaluating the Impact of Integrating Similar Translations into Neural Machine Translation
by Arda Tezcan and Bram Bulté
Information 2022, 13(1), 19; https://doi.org/10.3390/info13010019 - 4 Jan 2022
Cited by 3 | Viewed by 2976
Abstract
Previous research has shown that simple methods of augmenting machine translation training data and input sentences with translations of similar sentences (or fuzzy matches), retrieved from a translation memory or bilingual corpus, lead to considerable improvements in translation quality, as assessed by a limited set of automatic evaluation metrics. In this study, we extend this evaluation by calculating a wider range of automated quality metrics that tap into different aspects of translation quality and by performing manual MT error analysis. Moreover, we investigate in more detail how fuzzy matches influence translations and where potential quality improvements could still be made by carrying out a series of quantitative analyses that focus on different characteristics of the retrieved fuzzy matches. The automated evaluation shows that the quality of NFR translations is higher than the NMT baseline in terms of all metrics. However, the manual error analysis did not reveal a difference between the two systems in terms of total number of translation errors; yet, different profiles emerged when considering the types of errors made. Finally, in our analysis of how fuzzy matches influence NFR translations, we identified a number of features that could be used to improve the selection of fuzzy matches for NFR data augmentation. Full article
(This article belongs to the Special Issue Machine Translation for Conquering Language Barriers)
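To make the fuzzy-match augmentation idea concrete: the retrieved match's target-side translation is attached to the source sentence so that the NMT system can copy from it where appropriate. The sketch below is a generic illustration with invented sentence pairs and a made-up <FM> separator token, not the authors' NFR implementation, and it uses a simple character-level similarity in place of the fuzzy-matching metrics examined in the paper.

    # Generic sketch of fuzzy-match augmentation for NMT input sentences.
    from difflib import SequenceMatcher

    translation_memory = [  # invented (source, target) pairs
        ("the committee approved the annual report", "le comité a approuvé le rapport annuel"),
        ("the committee rejected the proposal", "le comité a rejeté la proposition"),
    ]

    def best_fuzzy_match(source, tm, threshold=0.5):
        # Return the target side of the most similar TM source sentence, if similar enough.
        score, target = max((SequenceMatcher(None, source, s).ratio(), t) for s, t in tm)
        return target if score >= threshold else None

    def augment(source, tm):
        match = best_fuzzy_match(source, tm)
        return f"{source} <FM> {match}" if match else source

    print(augment("the committee approved the proposal", translation_memory))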
