Article

Decoding Strategies for Improving Low-Resource Machine Translation

by Chanjun Park, Yeongwook Yang, Kinam Park and Heuiseok Lim
1 Department of Computer Science and Engineering, Korea University, Seoul 02841, Korea
2 Center for Educational Technology, Institute of Education, the University of Tartu, 50090 Tartu, Estonia
3 Creative Information and Computer Institute, Korea University, Seoul 02841, Korea
* Author to whom correspondence should be addressed.
Electronics 2020, 9(10), 1562; https://doi.org/10.3390/electronics9101562
Received: 23 July 2020 / Revised: 19 August 2020 / Accepted: 22 September 2020 / Published: 24 September 2020
(This article belongs to the Special Issue Smart Processing for Systems under Uncertainty or Perturbation)
Pre-processing and post-processing are significant aspects of natural language processing (NLP) application software. In neural machine translation (NMT), pre-processing includes subword tokenization to alleviate the problem of unknown words, parallel corpus filtering that retains only data suitable for training, and data augmentation to ensure that the corpus contains sufficient content. Post-processing includes automatic post-editing and the application of various decoding strategies during translation. Most recent NLP research is based on the Pretrain-Finetuning Approach (PFA). However, when small and medium-sized organizations with insufficient hardware attempt to provide NLP services, throughput and memory problems often occur. These difficulties increase when PFA is used to process low-resource languages, as PFA requires large amounts of data, and the data available for low-resource languages are often insufficient. Building on the premise that NMT performance can be enhanced through various pre-processing and post-processing strategies without changing the model, we applied various decoding strategies to Korean–English NMT, which relies on a low-resource language pair. Through comparative experiments, we demonstrated that translation performance can be enhanced without changes to the model. We experimentally examined how performance changed in response to beam size changes and n-gram blocking, and whether performance was enhanced when a length penalty was applied. The results showed that various decoding strategies enhance performance and compare favorably with previous Korean–English NMT approaches. Therefore, the proposed methodology can improve the performance of NMT models without the use of PFA; this presents a new perspective for improving machine translation performance.
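The three decoding strategies examined in the paper (beam size, n-gram blocking, and a length penalty) are standard options in common NMT toolkits. Below is a minimal sketch of how they combine at inference time, assuming the Hugging Face transformers library and the public Helsinki-NLP/opus-mt-ko-en checkpoint as an illustrative Korean–English model; this is not the model trained in the paper.

```python
# Minimal sketch: beam search with n-gram blocking and a length penalty.
# Assumption: the Hugging Face `transformers` package is installed, and the
# public Helsinki-NLP/opus-mt-ko-en checkpoint stands in for the paper's model.
from transformers import AutoModelForSeq2SeqLM, AutoTokenizer

model_name = "Helsinki-NLP/opus-mt-ko-en"  # illustrative Korean-English model
tokenizer = AutoTokenizer.from_pretrained(model_name)
model = AutoModelForSeq2SeqLM.from_pretrained(model_name)

inputs = tokenizer("오늘 날씨가 정말 좋다.", return_tensors="pt")

outputs = model.generate(
    **inputs,
    num_beams=5,             # beam size: wider beams keep more hypotheses alive
    no_repeat_ngram_size=3,  # n-gram blocking: never emit the same 3-gram twice
    length_penalty=1.0,      # beam scores are normalized by length**length_penalty
    max_new_tokens=64,
)
print(tokenizer.decode(outputs[0], skip_special_tokens=True))
```

Sweeping num_beams, no_repeat_ngram_size, and length_penalty over a held-out set, as the comparative experiments here do, changes only the decoding configuration; the trained model itself is untouched, which is the paper's central point.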
Keywords: neural machine translation; Korean–English neural machine translation; transformer; efficiency processing; post-processing; decoding strategies

MDPI and ACS Style

Park, C.; Yang, Y.; Park, K.; Lim, H. Decoding Strategies for Improving Low-Resource Machine Translation. Electronics 2020, 9, 1562. https://doi.org/10.3390/electronics9101562

AMA Style

Park C, Yang Y, Park K, Lim H. Decoding Strategies for Improving Low-Resource Machine Translation. Electronics. 2020; 9(10):1562. https://doi.org/10.3390/electronics9101562

Chicago/Turabian Style

Park, Chanjun, Yeongwook Yang, Kinam Park, and Heuiseok Lim. 2020. "Decoding Strategies for Improving Low-Resource Machine Translation" Electronics 9, no. 10: 1562. https://doi.org/10.3390/electronics9101562

Note that from the first issue of 2016, MDPI journals use article numbers instead of page numbers.
