Open Access Article

Improving Neural Machine Translation by Filtering Synthetic Parallel Data

1 Department of Engineering, Computer Science, Sogang University, Seoul 04107, Korea
2 Applied Data Science, Sungkyunkwan University, Suwon 16419, Korea
* Author to whom correspondence should be addressed.
Entropy 2019, 21(12), 1213; https://doi.org/10.3390/e21121213
Received: 23 October 2019 / Revised: 3 December 2019 / Accepted: 10 December 2019 / Published: 11 December 2019
(This article belongs to the Section Multidisciplinary Applications)
Synthetic data has been shown to be effective in training state-of-the-art neural machine translation (NMT) systems. Because the synthetic data is often generated by back-translating monolingual data from the target language into the source language, it can contain considerable noise in the form of weakly paired sentences or translation errors. In this paper, we propose a novel approach to filter this noise from synthetic data. For each sentence pair in the synthetic data, we compute a semantic similarity score using bilingual word embeddings. By selecting sentence pairs according to these scores, we obtain better synthetic parallel data. Experimental results on the IWSLT 2017 Korean→English translation task show that, despite using much less data, our method outperforms the baseline NMT system with back-translation by up to 0.72 and 0.62 BLEU points on tst2016 and tst2017, respectively.
Keywords: neural machine translation; back translation; bilingual word embeddings; synthetic data filtering
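
The filtering step described in the abstract can be illustrated with a minimal sketch. The abstract does not say how the similarity score is computed, so this example assumes one simple choice: each sentence is represented by the average of its word vectors in a shared bilingual embedding space, and the score is the cosine similarity between the two averages. The function names, the toy embedding table, and the threshold value are all hypothetical and are not taken from the paper.

```python
import math

def sentence_vector(tokens, embeddings):
    """Average the vectors of the tokens that appear in the embedding table."""
    vectors = [embeddings[t] for t in tokens if t in embeddings]
    if not vectors:
        return None  # no known words, so no sentence representation
    dim = len(vectors[0])
    return [sum(v[i] for v in vectors) / len(vectors) for i in range(dim)]

def cosine(u, v):
    """Cosine similarity of two equal-length vectors (0.0 if either is zero)."""
    dot = sum(a * b for a, b in zip(u, v))
    norm_u = math.sqrt(sum(a * a for a in u))
    norm_v = math.sqrt(sum(b * b for b in v))
    if norm_u == 0.0 or norm_v == 0.0:
        return 0.0
    return dot / (norm_u * norm_v)

def similarity(src_tokens, tgt_tokens, embeddings):
    """Assumed scoring rule: cosine between averaged bilingual word vectors."""
    src_vec = sentence_vector(src_tokens, embeddings)
    tgt_vec = sentence_vector(tgt_tokens, embeddings)
    if src_vec is None or tgt_vec is None:
        return 0.0
    return cosine(src_vec, tgt_vec)

def filter_pairs(pairs, embeddings, threshold):
    """Keep only the synthetic pairs whose score reaches the threshold."""
    return [(s, t) for s, t in pairs
            if similarity(s, t, embeddings) >= threshold]

# Toy shared embedding space (hypothetical values, for illustration only).
toy_embeddings = {
    "고양이": [1.0, 0.0], "cat": [0.9, 0.1],
    "개": [0.0, 1.0], "dog": [0.1, 0.9],
}
synthetic_pairs = [
    (["고양이"], ["cat"]),  # well-paired back-translation
    (["고양이"], ["dog"]),  # noisy back-translation
]
kept = filter_pairs(synthetic_pairs, toy_embeddings, threshold=0.8)
```

With these toy vectors, only the first pair is kept. In practice the threshold, or the fraction of top-scoring pairs to retain, would have to be tuned on held-out data.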
MDPI and ACS Style

Xu, G.; Ko, Y.; Seo, J. Improving Neural Machine Translation by Filtering Synthetic Parallel Data. Entropy 2019, 21, 1213. https://doi.org/10.3390/e21121213

AMA Style

Xu G, Ko Y, Seo J. Improving Neural Machine Translation by Filtering Synthetic Parallel Data. Entropy. 2019; 21(12):1213. https://doi.org/10.3390/e21121213

Chicago/Turabian Style

Xu, Guanghao; Ko, Youngjoong; Seo, Jungyun. 2019. "Improving Neural Machine Translation by Filtering Synthetic Parallel Data" Entropy 21, no. 12: 1213. https://doi.org/10.3390/e21121213
