Search Results (33)

Search Parameters:
Keywords = low-resource neural machine translation

17 pages, 1467 KiB  
Article
Confidence-Based Knowledge Distillation to Reduce Training Costs and Carbon Footprint for Low-Resource Neural Machine Translation
by Maria Zafar, Patrick J. Wall, Souhail Bakkali and Rejwanul Haque
Appl. Sci. 2025, 15(14), 8091; https://doi.org/10.3390/app15148091 - 21 Jul 2025
Viewed by 182
Abstract
The transformer-based deep learning approach represents the current state-of-the-art in machine translation (MT) research. Large-scale pretrained transformer models produce state-of-the-art performance across a wide range of MT tasks for many languages. However, such deep neural network (NN) models are often data-, compute-, space-, power-, and energy-hungry, typically requiring powerful GPUs or large-scale clusters to train and deploy. As a result, they are often regarded as “non-green” and “unsustainable” technologies. Distilling knowledge from large deep NN models (teachers) to smaller NN models (students) is a widely adopted sustainable development approach in MT as well as in broader areas of natural language processing (NLP), speech, and image processing. However, distilling large pretrained models presents several challenges. First, training time and cost increase with the volume of data used to train a student model, which can pose a challenge for translation service providers (TSPs) with limited training budgets. Moreover, the CO2 emissions generated during model training are typically proportional to the amount of data used, contributing to environmental harm. Second, when querying teacher models, including encoder–decoder models such as NLLB, the translations they produce for low-resource languages may be noisy or of low quality. This can undermine sequence-level knowledge distillation (SKD), as student models may inherit and reinforce errors from inaccurate labels. In this study, the teacher model’s confidence estimation is employed to filter from the distilled training data those instances for which the teacher exhibits low confidence. We tested our methods on a low-resource Urdu-to-English translation task operating within a constrained training budget in an industrial translation setting. Our findings show that confidence estimation-based filtering can significantly reduce the cost and CO2 emissions associated with training a student model without a drop in translation quality, making it a practical and environmentally sustainable solution for TSPs.
(This article belongs to the Special Issue Deep Learning and Its Applications in Natural Language Processing)
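The filtering step can be pictured with a minimal sketch. Assuming the teacher's decoder exposes per-token log-probabilities for each translation it generates, a sequence-level confidence score can be computed and low-confidence pairs dropped before student training; the threshold and field names below are illustrative, not the authors' implementation.

```python
import math

def sequence_confidence(token_logprobs):
    """Average per-token log-probability, mapped to a [0, 1] confidence score."""
    avg_logprob = sum(token_logprobs) / max(len(token_logprobs), 1)
    return math.exp(avg_logprob)  # geometric mean of token probabilities

def filter_distilled_corpus(examples, threshold=0.5):
    """Keep only source/teacher-translation pairs the teacher is confident about.

    `examples` is an iterable of dicts with the (assumed) keys:
      'src'            - source sentence (e.g., Urdu)
      'teacher_hyp'    - teacher translation (e.g., English)
      'token_logprobs' - per-token log-probs emitted while decoding 'teacher_hyp'
    """
    kept = []
    for ex in examples:
        if sequence_confidence(ex["token_logprobs"]) >= threshold:
            kept.append((ex["src"], ex["teacher_hyp"]))
    return kept
```

Because the student is then trained only on the retained pairs, training time, cost, and the associated CO2 emissions shrink roughly in proportion to the fraction filtered out.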

35 pages, 2649 KiB  
Review
Integrating Radiogenomics and Machine Learning in Musculoskeletal Oncology Care
by Rahul Kumar, Kyle Sporn, Akshay Khanna, Phani Paladugu, Chirag Gowda, Alex Ngo, Ram Jagadeesan, Nasif Zaman and Alireza Tavakkoli
Diagnostics 2025, 15(11), 1377; https://doi.org/10.3390/diagnostics15111377 - 29 May 2025
Cited by 2 | Viewed by 879
Abstract
Musculoskeletal tumors present a diagnostic challenge due to their rarity, histological diversity, and overlapping imaging features. Accurate characterization is essential for effective treatment planning and prognosis, yet current diagnostic workflows rely heavily on invasive biopsy and subjective radiologic interpretation. This review explores the evolving role of radiogenomics and machine learning in improving diagnostic accuracy for bone and soft tissue tumors. We examine the integration of quantitative imaging features from MRI, CT, and PET with genomic and transcriptomic data to enable non-invasive tumor profiling. AI-powered platforms employing convolutional neural networks (CNNs) and radiomic texture analysis show promising results in tumor grading, subtype differentiation (e.g., osteosarcoma vs. Ewing sarcoma), and the prediction of mutation signatures (e.g., TP53, RB1). Moreover, we highlight the use of liquid biopsy and circulating tumor DNA (ctDNA) as emerging diagnostic biomarkers, coupled with point-of-care molecular assays, to enable early and accurate detection in low-resource settings. The review concludes by discussing translational barriers, including data harmonization, regulatory challenges, and the need for multi-institutional datasets to validate AI-based diagnostic frameworks. This article synthesizes current advancements and provides a forward-looking view of precision diagnostics in musculoskeletal oncology.
(This article belongs to the Special Issue Advances in Musculoskeletal Imaging: From Diagnosis to Treatment)

30 pages, 6387 KiB  
Article
Transformer-Based Re-Ranking Model for Enhancing Contextual and Syntactic Translation in Low-Resource Neural Machine Translation
by Arifa Javed, Hongying Zan, Orken Mamyrbayev, Muhammad Abdullah, Kanwal Ahmed, Dina Oralbekova, Kassymova Dinara and Ainur Akhmediyarova
Electronics 2025, 14(2), 243; https://doi.org/10.3390/electronics14020243 - 8 Jan 2025
Cited by 1 | Viewed by 2894
Abstract
Neural machine translation (NMT) plays a vital role in modern communication by bridging language barriers and enabling effective information exchange across diverse linguistic communities. Due to the limited availability of data in low-resource languages, NMT faces significant translation challenges. Data sparsity limits NMT models’ ability to learn, generalize, and produce accurate translations, which leads to low coherence and poor context awareness. This paper proposes a transformer-based approach incorporating an encoder–decoder structure, bilingual curriculum learning, and contrastive re-ranking mechanisms. Our approach enriches the training dataset using back-translation and enhances the model’s contextual learning through BERT embeddings. An incomplete-trust (in-trust) loss function is introduced to replace the traditional cross-entropy loss during training. The proposed model effectively handles out-of-vocabulary words and integrates named entity recognition techniques to maintain semantic accuracy. Additionally, the self-attention layers in the transformer architecture enhance the model’s syntactic analysis capabilities, which enables better context awareness and more accurate translations. Extensive experiments are performed on a diverse Chinese–Urdu parallel corpus, developed using human effort and publicly available datasets such as OPUS, WMT, and WiLi. The proposed model demonstrates a BLEU score improvement of 1.80% for Zh→Ur and 2.22% for Ur→Zh compared to the highest-performing comparative model. This significant enhancement indicates better translation quality and accuracy.
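As a rough illustration of the re-ranking idea, the sketch below scores an n-best list from beam search with an external contextual scorer (for example, cosine similarity between BERT embeddings of the source and the hypothesis) and interpolates it with the decoder's own log-probability. The interpolation weight and the scoring function are assumptions for illustration, not the paper's exact formulation.

```python
def rerank(src, candidates, score_fn, lam=0.5):
    """Pick the best hypothesis from an n-best list.

    src        : source sentence
    candidates : list of (hypothesis, decoder_logprob) pairs from beam search
    score_fn   : callable (src, hyp) -> contextual adequacy score, e.g. cosine
                 similarity between BERT embeddings of source and hypothesis
    lam        : weight between decoder confidence and the re-ranker score
    """
    def combined(cand):
        hyp, decoder_logprob = cand
        return lam * decoder_logprob + (1.0 - lam) * score_fn(src, hyp)

    return max(candidates, key=combined)[0]
```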

15 pages, 4255 KiB  
Article
Enhancing Neural Machine Translation Quality for Kannada–Tulu Language Pairs through Transformer Architecture: A Linguistic Feature Integration
by Musica Supriya, U Dinesh Acharya and Ashalatha Nayak
Designs 2024, 8(5), 100; https://doi.org/10.3390/designs8050100 - 12 Oct 2024
Cited by 1 | Viewed by 1831
Abstract
The rise of intelligent systems demands good machine translation models that are less data-hungry and more efficient, especially for low- and extremely-low-resource languages with little or no data available. By integrating a linguistic feature to enhance the quality of translation, we have developed a generic Neural Machine Translation (NMT) model for Kannada–Tulu language pairs. The NMT model uses the Transformer architecture, a state-of-the-art approach, to translate text from Kannada to Tulu and learns from parallel data. Kannada and Tulu are both low-resource Dravidian languages, with Tulu recognised as an extremely-low-resource language. Dravidian languages are morphologically rich and highly agglutinative, and only a few NMT models exist for Kannada–Tulu language pairs; these exhibit poor translation scores as they fail to capture the linguistic features of the languages. The proposed generic approach can benefit other low-resource Indic languages that have smaller parallel corpora for NMT tasks. Evaluation metrics such as Bilingual Evaluation Understudy (BLEU), character-level F-score (chrF), and Word Error Rate (WER) are used to assess the improved translation quality of the linguistic-feature-embedded NMT model. These results hold promise for further experimentation with other low- and extremely-low-resource language pairs.
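The three reported metrics can be computed with standard open-source tooling. The snippet below is a minimal sketch assuming sacreBLEU and jiwer, which are not necessarily the packages the authors used, and the toy sentences are placeholders.

```python
import sacrebleu  # pip install sacrebleu
import jiwer      # pip install jiwer

hyps = ["the cat sat on the mat"]          # model outputs (placeholder sentences)
refs = ["the cat is sitting on the mat"]   # reference translations

bleu = sacrebleu.corpus_bleu(hyps, [refs]).score   # corpus-level BLEU
chrf = sacrebleu.corpus_chrf(hyps, [refs]).score   # character-level F-score
wer = jiwer.wer(refs, hyps)                        # word error rate (0.0 = perfect)

print(f"BLEU {bleu:.2f}  chrF {chrf:.2f}  WER {wer:.2%}")
```

Note that BLEU and chrF are higher-is-better while WER is lower-is-better, so improvements show up in opposite directions.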

11 pages, 1288 KiB  
Article
Efficient Adaptation: Enhancing Multilingual Models for Low-Resource Language Translation
by Ilhami Sel and Davut Hanbay
Mathematics 2024, 12(19), 3149; https://doi.org/10.3390/math12193149 - 8 Oct 2024
Cited by 4 | Viewed by 2586
Abstract
This study focuses on the neural machine translation task for the TR-EN language pair, which is considered a low-resource language pair. We investigated fine-tuning strategies for pre-trained language models, specifically the effectiveness of parameter-efficient adapter methods for fine-tuning multilingual pre-trained language models, experimenting with various combinations of LoRA and bottleneck adapters. The combination of LoRA and bottleneck adapters demonstrated superior performance compared to the other methods and required fine-tuning only 5% of the pre-trained language model’s parameters. The proposed method enhances parameter efficiency and reduces computational costs. Compared to full fine-tuning of the multilingual pre-trained language model, it showed only a 3% difference in the BLEU score; thus, nearly the same performance was achieved at a significantly lower cost. Additionally, models using only bottleneck adapters performed worse despite having a higher parameter count. Although adding LoRA alone to pre-trained language models did not yield sufficient performance, the proposed combination improved machine translation. The results obtained are promising, particularly for low-resource language pairs, as the proposed method requires less memory and computational load while maintaining translation quality.
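As a minimal sketch of the parameter-efficient setup, the snippet below attaches LoRA adapters to a multilingual sequence-to-sequence model with the Hugging Face peft library. The checkpoint name, rank, and target modules are illustrative assumptions, and the bottleneck adapters that the paper combines with LoRA are omitted here.

```python
from transformers import AutoModelForSeq2SeqLM
from peft import LoraConfig, get_peft_model, TaskType

# Multilingual pre-trained model; the checkpoint is illustrative, not necessarily the one used.
model = AutoModelForSeq2SeqLM.from_pretrained("facebook/mbart-large-50-many-to-many-mmt")

lora_config = LoraConfig(
    task_type=TaskType.SEQ_2_SEQ_LM,
    r=8, lora_alpha=16, lora_dropout=0.1,
    target_modules=["q_proj", "v_proj"],  # inject low-rank updates into attention projections
)
model = get_peft_model(model, lora_config)
model.print_trainable_parameters()  # only a small share of weights is trainable
```

With such a configuration, only the injected low-rank matrices are updated during fine-tuning, which is what keeps the trainable-parameter share in the low single digits.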

13 pages, 772 KiB  
Article
A Mongolian–Chinese Neural Machine Translation Method Based on Semantic-Context Data Augmentation
by Huinuan Zhang, Yatu Ji, Nier Wu and Min Lu
Appl. Sci. 2024, 14(8), 3442; https://doi.org/10.3390/app14083442 - 19 Apr 2024
Cited by 1 | Viewed by 1660
Abstract
Neural machine translation (NMT) typically relies on large bilingual parallel corpora for effective training. Mongolian, as a low-resource language, has relatively few parallel corpora, resulting in poor translation performance. Data augmentation (DA) is a practical and promising way to address data sparsity and limited semantic diversity by expanding the size and structure of the available data. In order to address the issues of data sparsity and semantic inconsistency in Mongolian–Chinese NMT, this paper proposes a new semantic-context DA method. This method adds an additional semantic encoder to the original translation model, which utilizes both source and target sentences to generate different semantic vectors that enhance each training instance. The results show that this method significantly improves the quality of Mongolian–Chinese NMT, with an increase of approximately 2.5 BLEU points compared to the basic Transformer model. Compared to the basic model, this method can achieve the same translation results with about half of the data, greatly improving translation efficiency.
(This article belongs to the Special Issue Natural Language Processing: Theory, Methods and Applications)
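One simplified reading of the semantic-context idea is sketched below: a small auxiliary encoder pools a sentence into a single semantic vector, which is then broadcast onto the main encoder's token-level states. The module sizes and the fusion-by-addition choice are illustrative assumptions rather than the paper's exact architecture.

```python
import torch
import torch.nn as nn

class SemanticContextEncoder(nn.Module):
    """Toy semantic encoder: mean-pools token embeddings into one context vector."""
    def __init__(self, vocab_size, d_model=512):
        super().__init__()
        self.embed = nn.Embedding(vocab_size, d_model)
        self.proj = nn.Linear(d_model, d_model)

    def forward(self, token_ids):                     # (batch, seq_len)
        pooled = self.embed(token_ids).mean(dim=1)    # (batch, d_model)
        return torch.tanh(self.proj(pooled))          # semantic context vector

def augment_encoder_states(encoder_states, semantic_vec):
    """Add the semantic vector to every encoder time step (broadcast over seq_len)."""
    return encoder_states + semantic_vec.unsqueeze(1)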

28 pages, 933 KiB  
Article
A Systematic Evaluation of Recurrent Neural Network Models for Edge Intelligence and Human Activity Recognition Applications
by Varsha S. Lalapura, Veerender Reddy Bhimavarapu, J. Amudha and Hariram Selvamurugan Satheesh
Algorithms 2024, 17(3), 104; https://doi.org/10.3390/a17030104 - 28 Feb 2024
Cited by 10 | Viewed by 2827
Abstract
Recurrent Neural Networks (RNNs) are an essential class of supervised learning algorithms. Complex tasks like speech recognition, machine translation, sentiment classification, weather prediction, etc., are now performed by well-trained RNNs. Local or cloud-based GPU machines are used to train them. However, inference is now shifting to miniature, mobile, and IoT devices, and even micro-controllers. Due to their colossal memory and computing requirements, mapping RNNs directly onto resource-constrained platforms is challenging. Edge-intelligent RNNs (EI-RNNs) must satisfy performance and memory-footprint requirements at the same time, without compromising one for the other. This study’s aim was to provide an empirical evaluation and optimization of historic as well as recent RNN architectures for high-performance and low-memory-footprint goals. We focused on Human Activity Recognition (HAR) tasks based on wearable sensor data for embedded healthcare applications. We evaluated and optimized six different recurrent units, namely Vanilla RNNs, Long Short-Term Memory (LSTM) units, Gated Recurrent Units (GRUs), Fast Gated Recurrent Neural Networks (FGRNNs), Fast Recurrent Neural Networks (FRNNs), and Unitary Gated Recurrent Neural Networks (UGRNNs), on eight publicly available time-series HAR datasets. We used the hold-out and cross-validation protocols for training the RNNs, and applied low-rank parameterization, iterative hard thresholding, and sparse retraining for compression. We found that efficient training (i.e., dataset handling and preprocessing procedures, hyperparameter tuning, and so on) and suitable compression methods (such as low-rank parameterization and iterative pruning) are critical in optimizing RNNs for performance and memory efficiency. We implemented inference of the optimized models on a Raspberry Pi.
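The two compression ideas named above, low-rank parameterization and hard thresholding, can be sketched in a few lines of PyTorch. The rank and sparsity values are illustrative, and the functions operate on a single weight matrix rather than a full RNN as in the study.

```python
import torch

def low_rank_factorize(weight, rank):
    """Approximate a dense weight matrix W (out x in) by two thin factors U @ V."""
    U, S, Vh = torch.linalg.svd(weight, full_matrices=False)
    U_r = U[:, :rank] * S[:rank]      # (out, rank), singular values folded in
    V_r = Vh[:rank, :]                # (rank, in)
    return U_r, V_r                   # replaces one large matmul by two small ones

def hard_threshold(weight, sparsity=0.9):
    """Return a copy with all but the largest-magnitude (1 - sparsity) fraction zeroed."""
    k = int(weight.numel() * (1.0 - sparsity))            # number of weights to keep
    threshold = weight.abs().flatten().kthvalue(weight.numel() - k).values
    return weight * (weight.abs() > threshold)
```

In practice the factors or the sparse mask are refined over several retraining rounds rather than applied once.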

19 pages, 3185 KiB  
Article
The Task of Post-Editing Machine Translation for the Low-Resource Language
by Diana Rakhimova, Aidana Karibayeva and Assem Turarbek
Appl. Sci. 2024, 14(2), 486; https://doi.org/10.3390/app14020486 - 5 Jan 2024
Cited by 6 | Viewed by 3337
Abstract
In recent years, machine translation has made significant advancements; however, its effectiveness can vary widely depending on the language pair. Languages with limited resources, such as Kazakh, Uzbek, Kalmyk, Tatar, and others, often encounter challenges in achieving high-quality machine translations. Kazakh is an agglutinative language with complex morphology, making it a low-resource language. This article addresses the task of post-editing machine translation for the Kazakh language. The research begins by discussing the history and evolution of machine translation and how it has developed to meet the unique needs of languages with limited resources. The research resulted in the development of a machine translation post-editing system. The system utilizes modern machine learning methods, starting with neural machine translation using a bidirectional recurrent neural network (BRNN) model in the initial post-editing stage. Subsequently, a transformer model is applied to further edit the text; complex structural and grammatical forms are processed, and abbreviations are replaced. Practical experiments were conducted on various texts: news publications, legislative documents, IT texts, etc. This article serves as a valuable resource for researchers and practitioners in the field of machine translation, shedding light on effective post-editing strategies to enhance translation quality, particularly in scenarios involving languages with limited resources such as Kazakh and Uzbek. The obtained results were evaluated using specialized metrics: BLEU, TER, and WER.
(This article belongs to the Section Computing and Artificial Intelligence)

14 pages, 6671 KiB  
Article
Neural Machine Translation Research on Syntactic Information Fusion Based on the Field of Electrical Engineering
by Yanna Sang, Yuan Chen and Juwei Zhang
Appl. Sci. 2023, 13(23), 12905; https://doi.org/10.3390/app132312905 - 1 Dec 2023
Viewed by 1919
Abstract
Neural machine translation has achieved good translation results but needs further improvement in low-resource and domain-specific translation. To this end, this paper proposes incorporating source-language syntactic information into neural machine translation models. Two novel approaches, namely Contrastive Language–Image Pre-training (CLIP) and Cross-attention Fusion (CAF), were compared to a baseline Transformer model on EN–ZH and ZH–EN machine translation in the electrical engineering domain. In addition, an ablation study on the effect of both proposed methods is presented. The CLIP pre-training method improved significantly over the baseline system, with BLEU scores in the EN–ZH and ZH–EN tasks increasing by 3.37 and 3.18 percentage points, respectively.
(This article belongs to the Special Issue Natural Language Processing (NLP) and Applications)
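A CLIP-style pre-training objective pairs two views of the same sentence (here, plausibly a textual encoding and a syntactic encoding) with a symmetric contrastive loss. The sketch below shows a generic InfoNCE formulation under that assumption; the encoders themselves and the temperature are placeholders, not the paper's exact setup.

```python
import torch
import torch.nn.functional as F

def clip_style_loss(text_emb, syntax_emb, temperature=0.07):
    """Symmetric InfoNCE loss pairing each sentence with its own syntactic representation.

    text_emb, syntax_emb: (batch, dim) embeddings from the two encoders.
    """
    text_emb = F.normalize(text_emb, dim=-1)
    syntax_emb = F.normalize(syntax_emb, dim=-1)
    logits = text_emb @ syntax_emb.t() / temperature      # (batch, batch) similarity matrix
    targets = torch.arange(logits.size(0), device=logits.device)
    return 0.5 * (F.cross_entropy(logits, targets) + F.cross_entropy(logits.t(), targets))
```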

24 pages, 1246 KiB  
Article
adaptMLLM: Fine-Tuning Multilingual Language Models on Low-Resource Languages with Integrated LLM Playgrounds
by Séamus Lankford, Haithem Afli and Andy Way
Information 2023, 14(12), 638; https://doi.org/10.3390/info14120638 - 29 Nov 2023
Cited by 27 | Viewed by 13200
Abstract
The advent of Multilingual Language Models (MLLMs) and Large Language Models (LLMs) has spawned innovation in many areas of natural language processing. Despite the exciting potential of this technology, its impact on developing high-quality Machine Translation (MT) outputs for low-resource languages remains relatively under-explored. Furthermore, an open-source application dedicated to both fine-tuning MLLMs and managing the complete MT workflow for low-resource languages remains unavailable. We aim to address these imbalances through the development of adaptMLLM, which streamlines all processes involved in the fine-tuning of MLLMs for MT. This open-source application is tailored for developers, translators, and users who are engaged in MT. It is particularly useful for newcomers to the field, as it significantly streamlines the configuration of the development environment. An intuitive interface allows for easy customisation of hyperparameters, and the application offers a range of metrics for model evaluation and the capability to deploy models as a translation service directly within the application. As a multilingual tool, we used adaptMLLM to fine-tune models for two low-resource language pairs: English to Irish (EN↔GA) and English to Marathi (EN↔MR). Compared with baselines from the LoResMT2021 Shared Task, the adaptMLLM system demonstrated significant improvements: an improvement of 5.2 BLEU points was observed in the EN→GA direction, and an increase of 40.5 BLEU points was recorded in the GA→EN direction, representing relative improvements of 14% and 117%, respectively. Significant improvements in the translation performance of the EN↔MR pair were also observed, notably in the MR→EN direction, with an increase of 21.3 BLEU points, which corresponds to a relative improvement of 68%. Finally, a fine-grained human evaluation of the MLLM output on the EN↔GA pair was conducted using the Multidimensional Quality Metrics and Scalar Quality Metrics error taxonomies. The application and models are freely available.
(This article belongs to the Special Issue Machine Translation for Conquering Language Barriers)

13 pages, 2003 KiB  
Article
Neural Machine Translation of Electrical Engineering with Fusion of Memory Information
by Yuan Chen, Zikang Liu and Juwei Zhang
Appl. Sci. 2023, 13(18), 10279; https://doi.org/10.3390/app131810279 - 13 Sep 2023
Viewed by 1647
Abstract
This paper proposes a new neural machine translation model for electrical engineering that combines a transformer with gated recurrent unit (GRU) networks. By fusing global information and memory information, the model effectively improves the performance of low-resource neural machine translation. Unlike traditional transformers, our proposed model includes two different encoders: one is the global information encoder, which focuses on contextual information, and the other is the memory encoder, which is responsible for capturing recurrent memory information. The model with these two types of attention can encode both global and memory information and learn richer semantic knowledge. Because transformers require a global attention calculation for each word position, their time and space complexity both grow quadratically with the length of the source-language sequence; when the source sequence becomes very long, the transformer’s performance declines sharply. We therefore propose a GRU-based memory information encoder to mitigate this drawback. The model proposed in this paper achieves a maximum improvement of 2.04 BLEU points over the baseline model in the low-resource electrical engineering domain.
(This article belongs to the Special Issue Natural Language Processing (NLP) and Applications)

21 pages, 1620 KiB  
Article
Deep Models for Low-Resourced Speech Recognition: Livvi-Karelian Case
by Irina Kipyatkova and Ildar Kagirov
Mathematics 2023, 11(18), 3814; https://doi.org/10.3390/math11183814 - 5 Sep 2023
Cited by 3 | Viewed by 1831
Abstract
Recently, there has been a growth in the number of studies addressing the automatic processing of low-resource languages. The lack of speech and text data significantly hinders the development of speech technologies for such languages. This paper introduces an automatic speech recognition system for Livvi-Karelian. Acoustic models based on time-delay neural networks and hidden Markov models were trained using a limited speech dataset of 3.5 h. To augment the data, pitch and speech-rate perturbation, SpecAugment, and their combinations were employed. Language models based on 3-grams and neural networks were trained using written texts and transcripts. The achieved word error rate of 22.80% is comparable to that of other low-resource languages. To the best of our knowledge, this is the first speech recognition system for Livvi-Karelian. The results obtained can be of significance for the development of automatic speech recognition systems not only for Livvi-Karelian but also for other low-resource languages, as well as for related fields such as machine translation. Future work includes experiments with Karelian data using techniques such as transfer learning and DNN language models.
(This article belongs to the Special Issue Recent Advances in Neural Networks and Applications)
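The SpecAugment part of the augmentation pipeline can be sketched with torchaudio's masking transforms. The mel-spectrogram settings and mask widths below are illustrative assumptions, and the pitch and speech-rate perturbations used in the paper are not shown.

```python
import torch
import torchaudio.transforms as T

# SpecAugment-style masking applied to a log-mel spectrogram (illustrative parameters).
mel = T.MelSpectrogram(sample_rate=16000, n_mels=80)
freq_mask = T.FrequencyMasking(freq_mask_param=15)  # mask up to 15 mel channels
time_mask = T.TimeMasking(time_mask_param=35)       # mask up to 35 frames

waveform = torch.randn(1, 16000)        # stand-in for one second of 16 kHz speech
spec = mel(waveform).log1p()
augmented = time_mask(freq_mask(spec))  # one masked copy; repeat for more variants
```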

15 pages, 2746 KiB  
Article
Neural Machine Translation of Electrical Engineering Based on Integrated Convolutional Neural Networks
by Zikang Liu, Yuan Chen and Juwei Zhang
Electronics 2023, 12(17), 3604; https://doi.org/10.3390/electronics12173604 - 25 Aug 2023
Cited by 1 | Viewed by 2031
Abstract
Research has shown that neural machine translation performs poorly on low-resource and domain-specific parallel corpora. In this paper, we focus on the problem of neural machine translation in the field of electrical engineering. To address the mistranslation caused by the Transformer model’s limited ability to extract feature information from certain sentences, we propose two new models that integrate a convolutional neural network (CNN) as a feature extraction layer into the Transformer model. The feature information extracted by the CNN is fused separately in the source-side and target-side models, which enhances the Transformer model’s ability to extract feature information, optimizes model performance, and improves translation quality. On the electrical engineering dataset, the proposed source-side and target-side models improved BLEU scores by 1.63 and 1.12 percentage points, respectively, compared to the baseline model. In addition, the two models proposed in this paper can learn rich semantic knowledge without relying on auxiliary knowledge such as part-of-speech tagging and named entity recognition, which saves human effort and time.
(This article belongs to the Special Issue Natural Language Processing and Information Retrieval)
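A minimal way to picture a CNN feature-extraction layer in front of a Transformer encoder is shown below: a 1-D convolution captures local n-gram features of the token embeddings and is fused back with a residual connection. The layer sizes and the residual fusion are assumptions for illustration, not the paper's exact source-side or target-side design.

```python
import torch
import torch.nn as nn

class ConvFeatureLayer(nn.Module):
    """1-D convolution over token embeddings, fused back via a residual connection."""
    def __init__(self, d_model=512, kernel_size=3):
        super().__init__()
        self.conv = nn.Conv1d(d_model, d_model, kernel_size, padding=kernel_size // 2)

    def forward(self, x):                                     # x: (batch, seq_len, d_model)
        local = self.conv(x.transpose(1, 2)).transpose(1, 2)  # local n-gram features
        return x + torch.relu(local)                          # fuse with the original embeddings

encoder_layer = nn.TransformerEncoderLayer(d_model=512, nhead=8, batch_first=True)
encoder = nn.Sequential(ConvFeatureLayer(), nn.TransformerEncoder(encoder_layer, num_layers=6))
```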

17 pages, 892 KiB  
Article
Part-of-Speech Tags Guide Low-Resource Machine Translation
by Zaokere Kadeer, Nian Yi and Aishan Wumaier
Electronics 2023, 12(16), 3401; https://doi.org/10.3390/electronics12163401 - 10 Aug 2023
Cited by 3 | Viewed by 1930
Abstract
Neural machine translation models are guided by a loss function to select source-sentence features and generate results close to human annotation. When data resources are abundant, neural machine translation models can focus on the features needed to produce high-quality translations, including part-of-speech (POS) tags and other grammatical features. However, models cannot focus precisely on these features when data resources are limited, because the lack of samples makes the model overfit before it can attend to them. Previous works have enriched the features by integrating source POS tags or by using multitask methods; however, these methods either utilize only the source POS or produce translations by introducing generated target POS tags. We propose introducing POS information based on multitask methods and reconstructors. We obtain POS tags via an additional encoder and decoder and compute the corresponding loss functions, which are used together with the machine translation loss to optimize the parameters of the entire model and make the model pay attention to POS features. The POS features the model attends to guide the translation process and alleviate the problem that models cannot focus on POS features in low-resource settings. Experiments on multiple translation tasks show that the method improves BLEU by 0.4 to 1 points compared with the baseline model.
(This article belongs to the Special Issue Natural Language Processing and Information Retrieval)
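The multitask training signal described in the abstract amounts to a weighted sum of the translation loss and the auxiliary POS losses. A minimal sketch is below; the weights and the presence of both a source-side and a target-side (reconstructor) POS term are illustrative assumptions about how such an objective is typically combined, not the paper's exact formula.

```python
def joint_loss(mt_loss, src_pos_loss, tgt_pos_loss, alpha=0.3, beta=0.3):
    """Combine the translation loss with auxiliary POS-tagging losses.

    mt_loss      : cross-entropy of the translation decoder
    src_pos_loss : loss of the POS decoder attached to the encoder
    tgt_pos_loss : loss of the POS decoder attached to the reconstructor/decoder side
    alpha, beta  : illustrative weights; the optimum is task-dependent
    """
    return mt_loss + alpha * src_pos_loss + beta * tgt_pos_loss
```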

15 pages, 2108 KiB  
Article
A Scenario-Generic Neural Machine Translation Data Augmentation Method
by Xiner Liu, Jianshu He, Mingzhe Liu, Zhengtong Yin, Lirong Yin and Wenfeng Zheng
Electronics 2023, 12(10), 2320; https://doi.org/10.3390/electronics12102320 - 21 May 2023
Cited by 72 | Viewed by 3992
Abstract
Amid the rapid advancement of neural machine translation, the challenge of data sparsity has been a major obstacle. To address this issue, this study proposes a general data augmentation technique for various scenarios. It examines the difficulty of obtaining diverse, high-quality parallel corpora in both rich- and low-resource settings, and integrates the low-frequency word substitution method and the reverse translation (back-translation) approach for complementary benefits. Additionally, the method improves the pseudo-parallel corpus generated by reverse translation by substituting low-frequency words, and includes a grammar error correction module to reduce grammatical errors in low-resource scenarios. The experimental data are partitioned into rich- and low-resource scenarios at a 10:1 ratio, and the experiments verify the necessity of grammatical error correction for the pseudo-corpus in low-resource scenarios. Models and methods from the backbone network and related literature are chosen for comparative experiments. The experimental findings demonstrate that the data augmentation approach proposed in this study is suitable for both rich- and low-resource scenarios and is effective in enhancing the training corpus to improve the performance of translation tasks.
(This article belongs to the Special Issue Natural Language Processing and Information Retrieval)
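One simple reading of the low-frequency word substitution step is sketched below: words that fall under a frequency threshold in the (pseudo-)parallel corpus are swapped for alternatives drawn from a substitution table. The threshold, probability, and toy synonym dictionary are illustrative assumptions; in the paper the substitutions are applied to the corpus produced by reverse translation.

```python
import random
from collections import Counter

def low_frequency_substitute(sentences, synonyms, min_count=5, p=0.3):
    """Replace rare words with alternatives to diversify a (pseudo-)parallel corpus.

    `synonyms` maps a rare word to candidate replacements; here it is a toy dictionary,
    in practice it could come from embeddings or a thesaurus.
    """
    counts = Counter(tok for sent in sentences for tok in sent.split())
    augmented = []
    for sent in sentences:
        toks = [
            random.choice(synonyms[t])
            if counts[t] < min_count and t in synonyms and random.random() < p
            else t
            for t in sent.split()
        ]
        augmented.append(" ".join(toks))
    return augmented
```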
