Machine Translation for Conquering Language Barriers

A special issue of Information (ISSN 2078-2489). This special issue belongs to the section "Artificial Intelligence".

Deadline for manuscript submissions: 31 July 2025 | Viewed by 27380

Special Issue Editor


Dr. Ivan Dunđer
Guest Editor
Department of Information and Communication Sciences, Faculty of Humanities and Social Sciences, University of Zagreb, 10000 Zagreb, Croatia
Interests: natural language processing; machine translation; machine learning; data science

Special Issue Information

Dear colleagues,

The MDPI journal Information invites submissions to a Special Issue on “Machine Translation for Conquering Language Barriers”.

Machine translation is an increasingly hot research topic owing to its potential to become one of today's most significant disruptive technologies. It tackles various aspects of existing language barriers, striving to enable effective communication and the transfer of meaning across different languages by applying different approaches, technologies, and solutions. The idea behind machine translation is to automate translation within the contemporary translation workflow in order to cope with the overwhelming quantity of data that needs to be translated, with special emphasis on speed and quality. However, since machine translation systems depend on large data sets and computing power, numerous issues remain open, especially for less widely spoken and under-resourced languages.

This Special Issue welcomes original and unpublished works with results related in any way to machine translation and linked areas, especially those covering experimental and methodological aspects of novel solutions, system implementation approaches, new data sets and resources, natural language processing techniques and tools, hybrid solutions, technology combination and integration, computer-assisted translation and its impact on quality and productivity, various types of user studies, the incorporation of linguistic knowledge and other digital resources, translation quality evaluation and estimation, post-editing efforts and strategies, ethical and legal issues, as well as other concerns related to tackling existing language barriers and problems within the field of machine translation. Submissions with a strong theoretical contribution are also desirable.

Topics of interest include, but are not limited to:

  • Machine translation systems and deployment;
  • Analysis of machine translation models and approaches;
  • Evaluation of machine translation quality;
  • Machine translation quality estimation;
  • Corpora and other resources for machine translation;
  • Natural language processing for machine translation;
  • Language technologies for machine translation;
  • Linguistic knowledge in machine translation;
  • Machine translation application in various fields.

Dr. Ivan Dunđer
Guest Editor

Manuscript Submission Information

Manuscripts should be submitted online at www.mdpi.com by registering and logging in to the website. Once registered, go to the submission form. Manuscripts can be submitted until the deadline. All submissions that pass pre-check are peer-reviewed. Accepted papers will be published continuously in the journal (as soon as accepted) and will be listed together on the Special Issue website. Research articles, review articles, and short communications are invited. For planned papers, a title and short abstract (about 100 words) can be sent to the Editorial Office for announcement on this website.

Submitted manuscripts should not have been published previously, nor be under consideration for publication elsewhere (except conference proceedings papers). All manuscripts are thoroughly refereed through a single-blind peer-review process. A guide for authors and other relevant information for submission of manuscripts is available on the Instructions for Authors page. Information is an international peer-reviewed open access monthly journal published by MDPI.

Please visit the Instructions for Authors page before submitting a manuscript. The Article Processing Charge (APC) for publication in this open access journal is 1600 CHF (Swiss Francs). Submitted papers should be well formatted and use good English. Authors may use MDPI's English editing service prior to publication or during author revisions.

Keywords

  • Machine translation systems and deployment
  • Analysis of machine translation models and approaches
  • Evaluation of machine translation quality
  • Machine translation quality estimation
  • Corpora and other resources for machine translation
  • Language technologies for machine translation

Benefits of Publishing in a Special Issue

  • Ease of navigation: Grouping papers by topic helps scholars navigate broad scope journals more efficiently.
  • Greater discoverability: Special Issues support the reach and impact of scientific research. Articles in Special Issues are more discoverable and cited more frequently.
  • Expansion of research network: Special Issues facilitate connections among authors, fostering scientific collaborations.
  • External promotion: Articles in Special Issues are often promoted through the journal's social media, increasing their visibility.
  • e-Book format: Special Issues with more than 10 articles can be published as dedicated e-books, ensuring wide and rapid dissemination.

Further information on MDPI's Special Issue policies can be found here.

Published Papers (9 papers)


Research

16 pages, 886 KiB  
Article
Exploring the Potential of Neural Machine Translation for Cross-Language Clinical Natural Language Processing (NLP) Resource Generation through Annotation Projection
by Jan Rodríguez-Miret, Eulàlia Farré-Maduell, Salvador Lima-López, Laura Vigil, Vicent Briva-Iglesias and Martin Krallinger
Information 2024, 15(10), 585; https://doi.org/10.3390/info15100585 - 25 Sep 2024
Viewed by 173
Abstract
Recent advancements in neural machine translation (NMT) offer promising potential for generating cross-language clinical natural language processing (NLP) resources. There is a pressing need to be able to foster the development of clinical NLP tools that extract key clinical entities in a comparable way for a multitude of medical application scenarios that are hindered by lack of multilingual annotated data. This study explores the efficacy of using NMT and annotation projection techniques with expert-in-the-loop validation to develop named entity recognition (NER) systems for an under-resourced target language (Catalan) by leveraging Spanish clinical corpora annotated by domain experts. We employed a state-of-the-art NMT system to translate three clinical case corpora. The translated annotations were then projected onto the target language texts and subsequently validated and corrected by clinical domain experts. The efficacy of the resulting NER systems was evaluated against manually annotated test sets in the target language. Our findings indicate that this approach not only facilitates the generation of high-quality training data for the target language (Catalan) but also demonstrates the potential to extend this methodology to other languages, thereby enhancing multilingual clinical NLP resource development. The generated corpora and components are publicly accessible, potentially providing a valuable resource for further research and application in multilingual clinical settings. Full article
(This article belongs to the Special Issue Machine Translation for Conquering Language Barriers)
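To illustrate the annotation projection step described in this abstract: the sketch below is a simplified, hypothetical rendering of the general technique (invented entity spans, labels, and alignment dictionary), not the authors' actual pipeline, in which alignments would come from an automatic aligner and the projected spans would subsequently be validated by clinical experts.

    # Hypothetical sketch of annotation projection: source-side entity spans are
    # mapped onto the machine-translated target sentence via a token alignment.
    def project_entities(entities, alignment):
        # `entities` holds (start, end, label) token spans on the source side;
        # `alignment` maps source token indices to target token indices.
        projected = []
        for start, end, label in entities:
            targets = [alignment[i] for i in range(start, end) if i in alignment]
            if targets:
                projected.append((min(targets), max(targets) + 1, label))
        return projected

    # Invented example: a DISEASE span covering source tokens 2-3 lands on target tokens 3-4.
    src_entities = [(2, 4, "DISEASE")]
    alignment = {0: 0, 1: 1, 2: 3, 3: 4}
    print(project_entities(src_entities, alignment))  # [(3, 5, 'DISEASE')]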

18 pages, 1625 KiB  
Article
Human versus Neural Machine Translation Creativity: A Study on Manipulated MWEs in Literature
by Gloria Corpas Pastor and Laura Noriega-Santiáñez
Information 2024, 15(9), 530; https://doi.org/10.3390/info15090530 - 2 Sep 2024
Viewed by 601
Abstract
In the digital era, the (r)evolution of neural machine translation (NMT) has reshaped both the market and translators’ workflow. However, the adoption of this technology has not fully reached the creative field of literary translation. Against this background, this study aims to explore to what extent NMT systems can be used to translate the creative challenges posed by idioms, specifically manipulated multiword expressions (MWEs) found in literary texts. To carry out this pilot study, five manipulated MWEs were selected from a fantasy novel and machine-translated (English > Spanish) by four NMT systems (DeepL, Google Translate, Bing Translator, and Reverso). Then, each NMT output as well as a human translation are assessed by six professional literary translators by using a human evaluation sheet. Based on these results, the creativity obtained in each translation method was calculated. Despite the satisfactory performance of both DeepL and Google Translate, HT creativity was highly superior in almost all manipulated MWEs. To the best of our knowledge, this paper not only contributes to the ongoing study of NMT applied to literature, but it is also one of the few studies that delve into the almost unexplored field of assessing creativity in neural machine-translated MWEs. Full article
(This article belongs to the Special Issue Machine Translation for Conquering Language Barriers)

19 pages, 2313 KiB  
Article
Machine Translation for Open Scholarly Communication: Examining the Relationship between Translation Quality and Reading Effort
by Lieve Macken, Vanessa De Wilde and Arda Tezcan
Information 2024, 15(8), 427; https://doi.org/10.3390/info15080427 - 23 Jul 2024
Viewed by 845
Abstract
This study assesses the usability of machine-translated texts in scholarly communication, using self-paced reading experiments with texts from three scientific disciplines, translated from French into English and vice versa. Thirty-two participants, proficient in the target language, participated. This study uses three machine translation engines (DeepL, ModernMT, OpenNMT), which vary in translation quality. The experiments aim to determine the relationship between translation quality and readers’ reception effort, measured by reading times. The results show that for two disciplines, manual and automatic translation quality measures are significant predictors of reading time. For the most technical discipline, this study could not build models that outperformed the baseline models, which only included participant and text ID as random factors. This study acknowledges the need to include reader-specific features, such as prior knowledge, in future research. Full article
(This article belongs to the Special Issue Machine Translation for Conquering Language Barriers)
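To make the modelling setup concrete: a baseline containing only participant and text ID as random factors can be compared against a model that adds a translation-quality predictor. The sketch below uses synthetic data and illustrative variable names; it is not the authors' analysis, only one common way to set up such a comparison in Python.

    # Synthetic illustration of a baseline-vs-quality mixed-effects comparison.
    import numpy as np
    import pandas as pd
    import statsmodels.formula.api as smf

    rng = np.random.default_rng(0)
    rows = []
    for participant in range(32):
        p_offset = rng.normal(0, 5)                    # participant random effect
        for text_id in range(6):
            quality = rng.uniform(60, 95)              # e.g., an automatic quality score
            reading_time = 120 - 0.5 * quality + p_offset + rng.normal(0, 4)
            rows.append({"participant": participant, "text_id": text_id,
                         "quality": quality, "reading_time": reading_time})
    data = pd.DataFrame(rows)

    vc = {"text": "0 + C(text_id)"}                    # text ID as a variance component
    baseline = smf.mixedlm("reading_time ~ 1", data,
                           groups="participant", vc_formula=vc).fit()
    quality_model = smf.mixedlm("reading_time ~ quality", data,
                                groups="participant", vc_formula=vc).fit()
    print(baseline.llf, quality_model.llf)             # higher log-likelihood = better fit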

24 pages, 1246 KiB  
Article
adaptMLLM: Fine-Tuning Multilingual Language Models on Low-Resource Languages with Integrated LLM Playgrounds
by Séamus Lankford, Haithem Afli and Andy Way
Information 2023, 14(12), 638; https://doi.org/10.3390/info14120638 - 29 Nov 2023
Cited by 5 | Viewed by 7625
Abstract
The advent of Multilingual Language Models (MLLMs) and Large Language Models (LLMs) has spawned innovation in many areas of natural language processing. Despite the exciting potential of this technology, its impact on developing high-quality Machine Translation (MT) outputs for low-resource languages remains relatively under-explored. Furthermore, an open-source application, dedicated to both fine-tuning MLLMs and managing the complete MT workflow for low-resource languages, remains unavailable. We aim to address these imbalances through the development of adaptMLLM, which streamlines all processes involved in the fine-tuning of MLLMs for MT. This open-source application is tailored for developers, translators, and users who are engaged in MT. It is particularly useful for newcomers to the field, as it significantly streamlines the configuration of the development environment. An intuitive interface allows for easy customisation of hyperparameters, and the application offers a range of metrics for model evaluation and the capability to deploy models as a translation service directly within the application. As a multilingual tool, we used adaptMLLM to fine-tune models for two low-resource language pairs: English to Irish (EN↔GA) and English to Marathi (EN↔MR). Compared with baselines from the LoResMT2021 Shared Task, the adaptMLLM system demonstrated significant improvements. In the EN→GA direction, an improvement of 5.2 BLEU points was observed, and an increase of 40.5 BLEU points was recorded in the GA→EN direction, representing relative improvements of 14% and 117%, respectively. Significant improvements in the translation performance of the EN↔MR pair were also observed, notably in the MR→EN direction, with an increase of 21.3 BLEU points, which corresponds to a relative improvement of 68%. Finally, a fine-grained human evaluation of the MLLM output on the EN→GA pair was conducted using the Multidimensional Quality Metrics and Scalar Quality Metrics error taxonomies. The application and models are freely available. Full article
(This article belongs to the Special Issue Machine Translation for Conquering Language Barriers)
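As a quick arithmetic aid to the BLEU figures quoted in the abstract: a relative improvement is the absolute gain divided by the baseline score, so the (unstated) baselines can be backed out from the reported numbers. The snippet below does exactly that; the resulting baselines are derived approximations, not values given by the authors.

    # Back out the implied baseline BLEU from each absolute gain and relative gain.
    def implied_baseline(delta_bleu, relative_pct):
        # relative = delta / baseline  =>  baseline = delta / (relative / 100)
        return delta_bleu / (relative_pct / 100.0)

    for direction, delta, pct in [("EN->GA", 5.2, 14), ("GA->EN", 40.5, 117), ("MR->EN", 21.3, 68)]:
        print(f"{direction}: +{delta} BLEU at +{pct}% -> implied baseline ~ {implied_baseline(delta, pct):.1f} BLEU")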

18 pages, 973 KiB  
Article
Four Million Segments and Counting: Building an English-Croatian Parallel Corpus through Crowdsourcing Using a Novel Gamification-Based Platform
by Rafał Jaworski, Sanja Seljan and Ivan Dunđer
Information 2023, 14(4), 226; https://doi.org/10.3390/info14040226 - 6 Apr 2023
Viewed by 2240
Abstract
Parallel corpora have been widely used in the fields of natural language processing and translation as they provide crucial multilingual information. They are used to train machine translation systems, compile dictionaries, or generate inter-language word embeddings. There are many corpora available publicly; however, support for some languages is still limited. In this paper, the authors present a framework for collecting, organizing, and storing corpora. The solution was originally designed to obtain data for less-resourced languages, but it proved to work very well for the collection of high-value domain-specific corpora. The scenario is based on the collective work of a group of people who are motivated by the means of gamification. The rules of the game motivate the participants to submit large resources, and a peer-review process ensures quality. More than four million translated segments have been collected so far. Full article
(This article belongs to the Special Issue Machine Translation for Conquering Language Barriers)

21 pages, 2963 KiB  
Article
LiST: A Lightweight Framework for Continuous Indian Sign Language Translation
by Amrutha K, Prabu P and Ramesh Chandra Poonia
Information 2023, 14(2), 79; https://doi.org/10.3390/info14020079 - 29 Jan 2023
Cited by 9 | Viewed by 3100
Abstract
Sign language is a natural, structured, and complete form of communication to exchange information. Non-verbal communicators, also referred to as hearing impaired and hard of hearing (HI&HH), consider sign language an elemental mode of communication to convey information. As this language is less familiar among a large percentage of the human population, an automatic sign language translator that can act as an interpreter and remove the language barrier is mandatory. The advent of deep learning has resulted in the availability of several sign language translation (SLT) models. However, SLT models are complex, resulting in increased latency in language translation. Furthermore, SLT models consider only hand gestures for further processing, which might lead to the misinterpretation of ambiguous sign language words. In this paper, we propose a lightweight SLT framework, LiST (Lightweight Sign language Translation), that simultaneously considers multiple modalities, such as hand gestures, facial expressions, and hand orientation, from an Indian sign video. The Inception V3 architecture handles the features associated with different signer modalities, resulting in the generation of a feature map, which is processed by a two-layered long short-term memory (LSTM) architecture. This sequence helps in sentence-by-sentence recognition and in the translation of sign language into text and audio. The model was tested with continuous Indian Sign Language (ISL) sentences taken from the INCLUDE dataset. The experimental results show that the LiST framework achieved a high translation accuracy of 91.2% and a prediction accuracy of 95.9% while maintaining a low word-level translation error compared to other existing models. Full article
(This article belongs to the Special Issue Machine Translation for Conquering Language Barriers)
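The pipeline sketched in the abstract (per-frame Inception V3 features feeding a two-layer LSTM) can be laid out in a few lines of Keras. The snippet below is a hypothetical, minimal layout with invented frame counts, hidden sizes, and class counts; it is not the published LiST model and omits the facial-expression and hand-orientation streams.

    # Minimal sketch: InceptionV3 frame features -> two-layer LSTM -> sentence-level label.
    import tensorflow as tf

    num_frames, num_classes = 64, 50                   # illustrative placeholders
    frames = tf.keras.Input(shape=(num_frames, 299, 299, 3))
    backbone = tf.keras.applications.InceptionV3(include_top=False, weights=None, pooling="avg")
    features = tf.keras.layers.TimeDistributed(backbone)(frames)   # (batch, frames, 2048)
    x = tf.keras.layers.LSTM(256, return_sequences=True)(features)
    x = tf.keras.layers.LSTM(256)(x)
    outputs = tf.keras.layers.Dense(num_classes, activation="softmax")(x)
    model = tf.keras.Model(frames, outputs)
    model.summary()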

22 pages, 314 KiB  
Article
Post-Editese in Literary Translations
by Sheila Castilho and Natália Resende
Information 2022, 13(2), 66; https://doi.org/10.3390/info13020066 - 28 Jan 2022
Cited by 15 | Viewed by 3893
Abstract
In the present study, we investigated the post-editese phenomenon, i.e., the unique features that set machine translated post-edited texts apart from human-translated texts. We used two literary texts, namely, the English children’s novel by Lewis Carroll Alice’s Adventures in Wonderland (AW) and Paula Hawkins’ popular book The Girl on the Train (TGOTT). Both literary texts were Google translated from English into Brazilian Portuguese to investigate whether the post-editese features can be found on the surface of the post-edited (PE) texts. In addition, we examined how the features found in the PE texts differ from the features encountered in the human-translated (HT) and machine translation (MT) versions of the same source text. Results revealed evidence for post-editese for TGOTT only with PE versions being more similar to the MT output than to the HT texts. Full article
(This article belongs to the Special Issue Machine Translation for Conquering Language Barriers)
14 pages, 1208 KiB  
Article
Measuring Terminology Consistency in Translated Corpora: Implementation of the Herfindahl-Hirshman Index
by Angelina Gašpar, Sanja Seljan and Vlasta Kučiš
Information 2022, 13(2), 43; https://doi.org/10.3390/info13020043 - 18 Jan 2022
Cited by 4 | Viewed by 4032
Abstract
Consistent terminology can positively influence communication, information transfer, and proper understanding. In multilingual written communication processes, challenges are augmented due to translation variants. The main aim of this study was to implement the Herfindahl-Hirshman Index (HHI) for the assessment of translated terminology in parallel corpora. This research was conducted on three types of legal domain subcorpora, dating from different periods: the Croatian-English parallel corpus (1991–2009), Latin-English and Latin-Croatian versions of the Code of Canon Law (1983), and English and Croatian versions of the EU legislation (2013). After the terminology extraction process, validation of term candidates was performed, followed by an evaluation. Terminology consistency was measured using the HHI, a commonly accepted measurement of market concentration. Results show that the HHI can be used for measuring terminology consistency to improve information transfer and message understanding. In translation settings, the process shows the need for quality management solutions. Full article
(This article belongs to the Special Issue Machine Translation for Conquering Language Barriers)
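For reference, the Herfindahl-Hirshman Index is simply the sum of squared shares; applied to terminology, each share is the proportion of one translation variant of a given source term, so a value of 1.0 indicates a perfectly consistent translation. A minimal sketch with an invented Croatian example (not the study's data):

    # HHI over the observed translation variants of a single source term.
    from collections import Counter

    def hhi(variants):
        counts = Counter(variants)
        total = sum(counts.values())
        return sum((c / total) ** 2 for c in counts.values())

    # Invented example: one English term rendered three different ways in Croatian.
    observed = ["ugovor", "ugovor", "ugovor", "sporazum", "sporazum", "pogodba"]
    print(round(hhi(observed), 3))  # (3/6)^2 + (2/6)^2 + (1/6)^2 ~ 0.389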

33 pages, 1378 KiB  
Article
Evaluating the Impact of Integrating Similar Translations into Neural Machine Translation
by Arda Tezcan and Bram Bulté
Information 2022, 13(1), 19; https://doi.org/10.3390/info13010019 - 4 Jan 2022
Cited by 3 | Viewed by 2976
Abstract
Previous research has shown that simple methods of augmenting machine translation training data and input sentences with translations of similar sentences (or fuzzy matches), retrieved from a translation memory or bilingual corpus, lead to considerable improvements in translation quality, as assessed by a limited set of automatic evaluation metrics. In this study, we extend this evaluation by calculating a wider range of automated quality metrics that tap into different aspects of translation quality and by performing manual MT error analysis. Moreover, we investigate in more detail how fuzzy matches influence translations and where potential quality improvements could still be made by carrying out a series of quantitative analyses that focus on different characteristics of the retrieved fuzzy matches. The automated evaluation shows that the quality of NFR translations is higher than the NMT baseline in terms of all metrics. However, the manual error analysis did not reveal a difference between the two systems in terms of total number of translation errors; yet, different profiles emerged when considering the types of errors made. Finally, in our analysis of how fuzzy matches influence NFR translations, we identified a number of features that could be used to improve the selection of fuzzy matches for NFR data augmentation. Full article
(This article belongs to the Special Issue Machine Translation for Conquering Language Barriers)
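To make the fuzzy-match augmentation idea concrete: the retrieved match's target-side translation is attached to the source sentence so that the NMT system can copy from it where appropriate. The sketch below is a generic illustration with invented sentence pairs and a made-up <FM> separator token, not the authors' NFR implementation, and it uses a simple character-level similarity in place of the fuzzy-matching metrics examined in the paper.

    # Generic sketch of fuzzy-match augmentation for NMT input sentences.
    from difflib import SequenceMatcher

    translation_memory = [  # invented (source, target) pairs
        ("the committee approved the annual report", "le comité a approuvé le rapport annuel"),
        ("the committee rejected the proposal", "le comité a rejeté la proposition"),
    ]

    def best_fuzzy_match(source, tm, threshold=0.5):
        # Return the target side of the most similar TM source sentence, if similar enough.
        score, target = max((SequenceMatcher(None, source, s).ratio(), t) for s, t in tm)
        return target if score >= threshold else None

    def augment(source, tm):
        match = best_fuzzy_match(source, tm)
        return f"{source} <FM> {match}" if match else source

    print(augment("the committee approved the proposal", translation_memory))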
