Open Access Article

Improving Neural Machine Translation by Filtering Synthetic Parallel Data

1 Department of Engineering, Computer Science, Sogang University, Seoul 04107, Korea
2 Applied Data Science, Sungkyunkwan University, Suwon 16419, Korea
* Author to whom correspondence should be addressed.
Entropy 2019, 21(12), 1213; https://doi.org/10.3390/e21121213
Received: 23 October 2019 / Revised: 3 December 2019 / Accepted: 10 December 2019 / Published: 11 December 2019
(This article belongs to the Section Multidisciplinary Applications)
Synthetic data has been shown to be effective in training state-of-the-art neural machine translation (NMT) systems. Because the synthetic data is often generated by back-translating monolingual data from the target language into the source language, it can contain considerable noise, such as weakly paired sentences or translation errors. In this paper, we propose a novel approach to filter this noise from synthetic data. For each sentence pair of the synthetic data, we compute a semantic similarity score using bilingual word embeddings. By selecting sentence pairs according to these scores, we obtain better synthetic parallel data. Experimental results on the IWSLT 2017 Korean→English translation task show that, despite using much less data, our method outperforms the baseline NMT system with back-translation by up to 0.72 and 0.62 BLEU points on tst2016 and tst2017, respectively.
Keywords: neural machine translation; back translation; bilingual word embeddings; synthetic data filtering
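
The filtering step described in the abstract can be illustrated with a minimal sketch. The abstract does not specify the scoring function, so everything below is an assumption made for illustration: sentence vectors are taken as the average of word vectors, the score is cosine similarity, and pairs are kept when the score reaches a fixed threshold. The names `sentence_vector`, `similarity`, `filter_pairs`, the toy embeddings, and the threshold value are all hypothetical and are not taken from the paper.

```python
import math

def sentence_vector(tokens, embeddings):
    """Average the vectors of in-vocabulary tokens; return None if none are known."""
    vectors = [embeddings[t] for t in tokens if t in embeddings]
    if not vectors:
        return None
    dim = len(vectors[0])
    return [sum(v[i] for v in vectors) / len(vectors) for i in range(dim)]

def cosine(u, v):
    """Cosine similarity of two equal-length vectors (0.0 if either has zero norm)."""
    dot = sum(a * b for a, b in zip(u, v))
    norm_u = math.sqrt(sum(a * a for a in u))
    norm_v = math.sqrt(sum(b * b for b in v))
    if norm_u == 0.0 or norm_v == 0.0:
        return 0.0
    return dot / (norm_u * norm_v)

def similarity(src_tokens, tgt_tokens, src_emb, tgt_emb):
    """Score a sentence pair; assumes both embedding tables share one bilingual space."""
    src_vec = sentence_vector(src_tokens, src_emb)
    tgt_vec = sentence_vector(tgt_tokens, tgt_emb)
    if src_vec is None or tgt_vec is None:
        return 0.0  # assumption: pairs with no known words are treated as dissimilar
    return cosine(src_vec, tgt_vec)

def filter_pairs(pairs, src_emb, tgt_emb, threshold):
    """Keep only the synthetic pairs whose similarity score reaches the threshold."""
    return [(s, t) for s, t in pairs
            if similarity(s, t, src_emb, tgt_emb) >= threshold]

# Toy two-dimensional bilingual embeddings (hypothetical values, romanized Korean tokens).
src_emb = {"goyangi": [1.0, 0.0], "gae": [0.0, 1.0]}
tgt_emb = {"cat": [0.9, 0.1], "dog": [0.1, 0.9]}

# The first pair is a plausible translation, the second a weakly paired one.
pairs = [(["goyangi"], ["cat"]), (["goyangi"], ["dog"])]
kept = filter_pairs(pairs, src_emb, tgt_emb, threshold=0.5)
```

With these toy vectors the well-matched pair scores close to 1 and the mismatched pair scores close to 0, so only the first survives the filter. In practice the threshold (or the fraction of data retained) would need tuning on held-out data.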
MDPI and ACS Style

Xu, G.; Ko, Y.; Seo, J. Improving Neural Machine Translation by Filtering Synthetic Parallel Data. Entropy 2019, 21, 1213.
