Improving Neural Machine Translation by Filtering Synthetic Parallel Data
Abstract
1. Introduction
2. Related Work
3. Proposed Method
3.1. Neural Machine Translation
3.2. Back-Translation for NMT
3.3. Synthetic Parallel Data Filtering with Bilingual Word Embeddings
4. Experimental Setup
4.1. Datasets and Data Preprocessing
4.2. Models and Hyperparameters
5. Experimental Results and Discussion
5.1. Quality of Bilingual Word Embeddings
5.2. Size of Synthetic Datasets
5.3. Quality of Filtered Synthetic Data
5.4. Performance of Proposed Method with a Combination of Real and Synthetic Data
6. Conclusions
Author Contributions
Funding
Conflicts of Interest
References
**Table 1.** Statistics of the datasets. IWSLT2017 is the real Korean–English parallel corpus; WMT2016 provides monolingual English data used to generate synthetic parallel data; tst2016 and tst2017 are the test sets.

| Dataset | Source | Target | Sentences |
|---|---|---|---|
| IWSLT2017 | Korean | English | 207 K |
| WMT2016 | - | English | 4.5 M |
| tst2016 | Korean | English | 1024 |
| tst2017 | Korean | English | 1036 |
**Table 2.** Word translation accuracy of the bilingual word embedding mappings.

| Bilingual Mappings | Accuracy |
|---|---|
| Supervised | 42.60% |
| Identical | 41.58% |
| Unsupervised | 40.16% |
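The supervised mapping in the table above is typically learned by solving the orthogonal Procrustes problem over a seed dictionary of aligned word pairs (as in Xing et al., 2015). A minimal sketch with toy random data, not the paper's actual Korean–English embeddings:

```python
import numpy as np

def learn_orthogonal_mapping(X, Y):
    """Find the orthogonal matrix W minimizing ||X W^T - Y||_F,
    where rows of X and Y are embeddings of aligned word pairs.
    Closed-form solution via SVD (orthogonal Procrustes)."""
    U, _, Vt = np.linalg.svd(Y.T @ X)
    return U @ Vt

# Toy check: generate a target side by rotating the source side,
# then verify the rotation is recovered.
rng = np.random.default_rng(0)
X = rng.standard_normal((10, 4))           # 10 aligned source word vectors
W_true, _ = np.linalg.qr(rng.standard_normal((4, 4)))
Y = X @ W_true.T                           # target side = rotated source side
W = learn_orthogonal_mapping(X, Y)
print(np.allclose(X @ W.T, Y))             # True: mapping recovered
```

Constraining the mapping to be orthogonal preserves dot products and vector norms, so monolingual embedding quality is unchanged after projection into the shared space.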
**Table 3.** BLEU scores by size of the synthetic data, at different real-to-synthetic ratios.

| Model | Size | tst2016 | tst2017 |
|---|---|---|---|
| Baseline | 207 K | 14.46 | 12.84 |
| + synthetic (1:1) | 414 K | 15.55 | 13.67 |
| + synthetic (1:5) | 1.2 M | 17.44 | 15.24 |
| + synthetic (1:10) | 2.2 M | 18.01 | 15.48 |
| + synthetic (1:20) | 4.2 M | 16.26 | 14.03 |
**Table 4.** BLEU scores of the two filtering methods on top-ranked synthetic data (gains of Sent-BiEMB over Sent-Bleu in parentheses).

| Model | Synthetic Data | tst2016 | tst2017 |
|---|---|---|---|
| Sent-Bleu | Top200k | 5.70 | 5.27 |
| Sent-BiEMB | Top200k | 8.34 (+2.64) | 7.33 (+2.06) |
| Sent-Bleu | Top400k | 8.41 | 7.38 |
| Sent-BiEMB | Top400k | 10.05 (+1.64) | 9.03 (+1.65) |
**Table 5.** BLEU scores when combining real data with synthetic data filtered at different score thresholds (gains over the unfiltered baseline in parentheses).

| Model | Threshold | tst2016 | tst2017 |
|---|---|---|---|
| Baseline | None | 18.01 | 15.48 |
| Sent-Bleu | 0.1 | 17.97 | 15.66 |
| Sent-Bleu | 0.2 | 17.67 | 15.26 |
| Sent-Bleu | 0.3 | 17.45 | 15.39 |
| Sent-Bleu | 0.4 | 16.93 | 14.65 |
| Sent-BiEMB | 0.1 | 18.03 | 15.73 |
| Sent-BiEMB | 0.2 | 18.11 | 15.70 |
| Sent-BiEMB | 0.3 | 18.73 (+0.72) | 16.10 (+0.62) |
| Sent-BiEMB | 0.4 | 18.20 | 15.97 |
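The threshold experiments above score each back-translated sentence pair by the similarity of its two sides in a shared bilingual embedding space and discard pairs below the cutoff. A minimal sketch of that filtering step, using averaged word vectors as sentence embeddings (the toy vectors and vocabulary below are illustrative assumptions, not the paper's trained embeddings):

```python
import numpy as np

def sentence_embedding(tokens, emb, dim=4):
    """Average the bilingual word vectors of a sentence (OOV tokens skipped)."""
    vecs = [emb[t] for t in tokens if t in emb]
    return np.mean(vecs, axis=0) if vecs else np.zeros(dim)

def cosine(a, b):
    denom = np.linalg.norm(a) * np.linalg.norm(b)
    return float(a @ b) / denom if denom else 0.0

def filter_pairs(pairs, src_emb, tgt_emb, threshold=0.3):
    """Keep synthetic (source, target) pairs whose sentence embeddings,
    already mapped into a shared bilingual space, are similar enough."""
    return [(s, t) for s, t in pairs
            if cosine(sentence_embedding(s.split(), src_emb),
                      sentence_embedding(t.split(), tgt_emb)) >= threshold]

# Hypothetical toy vectors, assumed pre-mapped into one shared space:
src_emb = {"고양이": np.array([1.0, 0.0, 0.0, 0.0])}
tgt_emb = {"cat": np.array([0.9, 0.1, 0.0, 0.0]),
           "sky": np.array([0.0, 0.0, 1.0, 0.0])}
pairs = [("고양이", "cat"), ("고양이", "sky")]
print(filter_pairs(pairs, src_emb, tgt_emb))  # [('고양이', 'cat')]
```

The mistranslated pair scores near zero similarity and is dropped, while the adequate pair survives; sweeping the threshold reproduces the trade-off shown in Table 5 between noise removal and data loss.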
© 2019 by the authors. Licensee MDPI, Basel, Switzerland. This article is an open access article distributed under the terms and conditions of the Creative Commons Attribution (CC BY) license (http://creativecommons.org/licenses/by/4.0/).
Share and Cite
Xu, G.; Ko, Y.; Seo, J. Improving Neural Machine Translation by Filtering Synthetic Parallel Data. Entropy 2019, 21, 1213. https://doi.org/10.3390/e21121213