Machine Translation in the Era of Large Language Models: A Survey of Historical and Emerging Problems
Abstract
1. Introduction
2. Machine Translation
2.1. Historical Approaches
2.2. Word-Based Statistical Machine Translation
2.3. Phrase-Based Machine Translation
2.4. Neural Machine Translation
2.5. Multilingual Machine Translation and Generative Language Modeling
2.6. Few-Shot Learning with Language Models
2.7. Inductive Learning and Hybrid Models
3. Evaluation
4. Data Curation Methods Used for Building MT Systems
5. Current and Emerging Problems
5.1. Applicability Across Languages
5.2. Evaluation
5.3. Biases and Hallucinations
6. LLMs and MT Together
7. Conclusions
Funding
Institutional Review Board Statement
Informed Consent Statement
Data Availability Statement
Conflicts of Interest
References
- Russell, S.J.; Norvig, P. Artificial Intelligence: A Modern Approach; Pearson: London, UK, 2016. [Google Scholar]
- Sutskever, I.; Vinyals, O.; Le, Q.V. Sequence to sequence learning with neural networks. In Proceedings of the Advances in Neural Information Processing Systems 27 (NIPS 2014), Montreal, QC, Canada, 8–13 December 2014; Volume 27. [Google Scholar]
- Bahdanau, D.; Cho, K.; Bengio, Y. Neural machine translation by jointly learning to align and translate. In Proceedings of the 3rd International Conference on Learning representations, ICLr 2015, San Diego, CA, USA, 7–9 May 2015. [Google Scholar]
- Wu, Y.; Schuster, M.; Chen, Z.; Le, Q.V.; Norouzi, M.; Macherey, W.; Krikun, M.; Cao, Y.; Gao, Q.; Macherey, K.; et al. Google’s neural machine translation system: Bridging the gap between human and machine translation. arXiv 2016, arXiv:1609.08144. [Google Scholar]
- Vaswani, A.; Shazeer, N.; Parmar, N.; Uszkoreit, J.; Jones, L.; Gomez, A.N.; Kaiser, Ł.; Polosukhin, I. Attention is all you need. In Proceedings of the Advances in Neural Information Processing Systems 30 (NIPS 2017), Long Beach, CA, USA, 4–9 December 2017; Volume 30. [Google Scholar]
- Luong, M.T.; Pham, H.; Manning, C.D. Effective Approaches to Attention-Based Neural Machine Translation. In Proceedings of the 2015 Conference on Empirical Methods in Natural Language Processing, Lisbon, Portugal, 17–21 September 2015; pp. 1412–1421. [Google Scholar]
- Artetxe, M.; Labaka, G.; Agirre, E.; Cho, K. Unsupervised neural machine translation. In Proceedings of the 6th International Conference on Learning Representations, ICLr 2018, Vancouver, BC, Canada, 30 April–3 May 2018. [Google Scholar]
- Devlin, J.; Chang, M.W.; Lee, K.; Toutanova, K. Bert: Pre-training of deep bidirectional transformers for language understanding. In Proceedings of the NAACL-HLT, New Orleans, LA, USA, 1–6 June 2018; pp. 4171–4186. [Google Scholar]
- Zhang, B.; Haddow, B.; Birch, A. Prompting large language model for machine translation: A case study. In Proceedings of the International Conference on Machine Learning, PMLr, Honolulu, HI, USA, 23–29 July 2023; pp. 41092–41110. [Google Scholar]
- Luong, M.-T.; Kayser, M.; Manning, C.D. Deep neural language models for machine translation. In Proceedings of the Nineteenth Conference on Computational Natural Language Learning, Beijing, China, 30–31 July 2015; pp. 305–309. [Google Scholar]
- Ha, T.-L.; Niehues, J.; Waibel, A. Toward multilingual neural machine translation with universal encoder and decoder. In Proceedings of the 13th International Conference on Spoken Language Translation, Seattle, WA, USA, 8–9 December 2016. [Google Scholar]
- Tu, Z.; Liu, Y.; Shang, L.; Liu, X.; Li, H. Neural machine translation with reconstruction. In Proceedings of the AAAI Conference on Artificial Intelligence, San Francisco, CA, USA, 4–9 February 2017; Volume 31. No. 1. [Google Scholar]
- Gehring, J.; Auli, M.; Grangier, D.; Dauphin, Y. A convolutional encoder model for neural machine translation. In Proceedings of the 55th Annual Meeting of the Association for Computational Linguistics (Volume 1: Long Papers), Vancouver, BC, Canada, 30 July–4 August 2017; pp. 123–135. [Google Scholar]
- Gu, J.; Bradbury, J.; Xiong, C.; Li, V.O.K.; Socher, R. Non-autoregressive neural machine translation. In Proceedings of the International Conference on Learning Representations (ICLR), Vancouver, BC, Canada, 30 April–3 May 2018. [Google Scholar]
- Ott, M.; Edunov, S.; Grangier, D.; Auli, M. Scaling neural machine translation. In Proceedings of the Third Conference on Machine Translation: Research Papers, Belgium, Brussels, 31 October–1 November 2018; pp. 1–9. [Google Scholar]
- Irie, K.; Zeyer, A.; Schlüter, R.; Ney, H. Language modeling with deep transformers. In Proceedings of the Interspeech 2019, Graz, Austria, 15–19 September 2019; pp. 3905–3909. [Google Scholar]
- Han, J.M.; Babuschkin, I.; Edwards, H.; Neelakantan, A.; Xu, T.; Polu, S.; Ray, A.; Shyam, P.; Ramesh, A.; Radford, A.; et al. Unsupervised neural machine translation with generative language models only. arXiv 2021, arXiv:2110.05448. [Google Scholar] [CrossRef]
- Zhang, B.; Ghorbani, B.; Bapna, A.; Cheng, Y.; Garcia, X.; Shen, J.; Firat, O. Examining scaling and transfer of language model architectures for machine translation. In Proceedings of the International Conference on Machine Learning, Baltimore, MD, USA, 17–23 July 2022; pp. 26176–26192. [Google Scholar]
- Guo, S.; Zhang, S.; Feng, Y. Decoder-only streaming transformer for simultaneous translation. In Proceedings of the 62nd Annual Meeting of the Association for Computational Linguistics (Volume 1: Long Papers), Bangkok, Thailand, 11–16 August 2024; pp. 8851–8864. [Google Scholar]
- Sun, Y.; Dong, L.; Zhu, Y.; Huang, S.; Wang, W.; Ma, S.; Zhang, Q.; Wang, J.; Wei, F. You only cache once: Decoder-decoder architectures for language models. Adv. Neural Inf. Process. Syst. 2024, 37, 7339–7361. [Google Scholar]
- Shiwen, Y.; Xiaojing, B. Rule-based machine translation. In Routledge Encyclopedia of Translation Technology; Routledge: Abingdon, UK, 2014; pp. 186–200. [Google Scholar]
- ALPAC. Language and Machines: Computers in Translation and Linguistics: A Report; Number 1416; National Academy of Sciences, National Research Council: Washington, DC, USA, 1966.
- Chomsky, N. Syntactic Structures; Mouton de Gruyter: Berlin, Germany, 2002. [Google Scholar]
- Tanaka, H. Multilingual Machine Translation Systems in the Future. In Progress in Machine Translation; IOS Press: Washington, DC, USA, 1993. [Google Scholar]
- Dong, D.; Wu, H.; He, W.; Yu, D.; Wang, H. Multi-task learning for multiple language translation. In Proceedings of the 53rd Annual Meeting of the Association for Computational Linguistics and the 7th International Joint Conference on Natural Language Processing, Beijing, China, 26–31 July 2015; Volume 1: Long Papers, pp. 1723–1732. [Google Scholar]
- Gasser, M. Minimal dependency translation: A framework for computer-assisted translation for under-resourced languages. In Proceedings of the Information and Communication Technology for Development for Africa: First International Conference, ICT4DA 2017, Bahir Dar, Ethiopia, 25–27 September 2017; Springer: Cham, Switzerland, 2018; pp. 209–218. [Google Scholar]
- Khanna, T.; Washington, J.N.; Tyers, F.M.; Bayatlı, S.; Swanson, D.G.; Pirinen, T.A.; Tang, I.; Alos i Font, H. recent advances in Apertium, a free/open-source rule-based machine translation platform for low-resource languages. Mach. Transl. 2021, 35, 475–502. [Google Scholar] [CrossRef]
- Haddow, B.; Bawden, R.; Barone, A.V.M.; Helcl, J.; Birch, A. Survey of low-resource machine translation. Comput. Linguist. 2022, 48, 673–732. [Google Scholar] [CrossRef]
- Nørstebø Moshagen, S.; Pirinen, F.; Antonsen, L.; Gaup, B.; Mikkelsen, I.L.S.; Trosterud, T.; Wiechetek, L.; Hiovain-Asikainen, K. The GiellaLT infrastructure: A multilingual infrastructure for rule-based NLP. In Rule-Based Language Technology; University of Tartu: Tartu, Estonia, 2023. [Google Scholar]
- Trieu, H.-L.; Tran, D.-V.; Nguyen, L.-M. Investigating phrase-based and neural-based machine translation on low-resource settings. In Proceedings of the 31st Pacific Asia Conference on Language, Information and Computation, Cebu City, Philippines, 16–18 November 2017; pp. 384–391. [Google Scholar]
- Sennrich, R.; Zhang, B. Revisiting low-resource neural machine translation: A case study. In Proceedings of the 57th Annual Meeting of the Association for Computational Linguistics, Florence, Italy, 28 July–2 August 2019; pp. 211–221. [Google Scholar]
- Haque, R.; Liu, C.-H.; Way, A. Recent advances of low-resource neural machine translation. Mach. Transl. 2021, 35, 451–474. [Google Scholar] [CrossRef]
- Wang, R.; Tan, X.; Luo, R.; Qin, T.; Liu, T.-Y. A survey on low-resource neural machine translation. In Proceedings of the Thirtieth International Joint Conference on Artificial Intelligence, Montreal, QC, Canada, 19–27 August 2021; pp. 4636–4643. [Google Scholar]
- Kumar, S.; Anastasopoulos, A.; Wintner, S.; Tsvetkov, Y. Machine translation into low-resource language varieties. In Proceedings of the 59th Annual Meeting of the Association for Computational Linguistics and the 11th International Joint Conference on Natural Language Processing (Volume 2: Short Papers), Virtual, 1–6 August 2021; pp. 110–121. [Google Scholar]
- Shi, S.; Wu, X.; Su, R.; Huang, H. Low-resource neural machine translation: Methods and trends. ACM Trans. Asian -Low-Resour. Lang. Inf. Process. 2022, 21, 1–22. [Google Scholar] [CrossRef]
- Ranathunga, S.; Lee, E.-S.A.; Prifti Skenduli, M.; Shekhar, R.; Alam, M.; Kaur, R. Neural machine translation for low-resource languages: A survey. ACM Comput. Surv. 2023, 55, 1–37. [Google Scholar] [CrossRef]
- Brown, P.F.; Cocke, J.; Della Pietra, S.A.; Della Pietra, V.J.; Jelinek, F.; Lafferty, J.; Mercer, R.L.; Roossin, P.S. A statistical approach to machine translation. Comput. Linguist. 1990, 16, 79–85. [Google Scholar]
- Moon, T.K. The expectation-maximization algorithm. IEEE Signal Process. Mag. 1996, 13, 47–60. [Google Scholar] [CrossRef]
- Koehn, P. Statistical Machine Translation, 1st ed.; Cambridge University Press: New York, NY, USA, 2010. [Google Scholar]
- Koehn, P.; Och, F.J.; Marcu, D. Statistical phrase-based translation. In Proceedings of the 2003 Conference of the North American Chapter of the Association for Computational Linguistics on Human Langauge Technology (HLT-NAACL 2003), Edmonton, AB, Canada, 27 May–1 June 2003; pp. 48–54. [Google Scholar]
- Zens, R.; Och, F.J.; Ney, H. Phrase-based statistical machine translation. In Proceedings of the Annual Conference on Artificial Intelligence, Canberra, Australia, 2–6 December 2002; pp. 18–32. [Google Scholar]
- Zens, R.; Ney, H. Improvements in phrase-based statistical machine translation. In Proceedings of the Human Language Technology Conference of the North American Chapter of the Association for Computational Linguistics: HLT-NAACL 2004, Boston, MA, USA, 6 May 2004; pp. 257–264. [Google Scholar]
- Costa-jussà, M.R. An Overview of the Phrase-based Statistical Machine Translation Techniques. Knowl. Eng. Rev. 2012, 27, 413–431. [Google Scholar] [CrossRef]
- Bisazza, A.; Federico, M. A Survey of Word Reordering in Statistical Machine Translation: Computational Models and Language Phenomena. Comput. Linguist. 2016, 42, 163–205. [Google Scholar] [CrossRef]
- Zens, R.; Ney, H. A Comparative Study on Reordering Constraints in Statistical Machine Translation. In Proceedings of the 41st Annual Meeting of the Association for Computational Linguistics, Sapporo, Japan, 7–12 July 2003; pp. 144–151. [Google Scholar]
- Kumar, S.; Byrne, B. Local Phrase Reordering Models for Statistical Machine Translation. In Proceedings of the Human Language Technology Conference and Conference on Empirical Methods in Natural Language Processing, Vancouver, BC, Canada, 6–8 October 2005; pp. 161–168. [Google Scholar]
- Kanthak, S.; Vilar, D.; Matusov, E.; Zens, R.; Ney, H. Novel Reordering Approaches in Phrase-Based Statistical Machine Translation. In Proceedings of the ACL Workshop on Building and Using Parallel Texts, Ann Arbor, MI, USA, 29 June 2005; pp. 167–174. [Google Scholar]
- Xiong, D.; Liu, Q.; Lin, S. Maximum Entropy Based Phrase Reordering Model for Statistical Machine Translation. In Proceedings of the 21st International Conference on Computational Linguistics and 44th Annual Meeting of the Association for Computational Linguistics, Sydney, Australia, 17–21 July 2006; pp. 521–528. [Google Scholar]
- Zens, R.; Ney, H. Discriminative Reordering Models for Statistical Machine Translation. In Proceedings of the Workshop on Statistical Machine Translation, New York, NY, USA, 8 June 2006; pp. 55–63. [Google Scholar]
- Li, C.; Li, M.; Zhang, D.; Li, M.; Zhou, M.; Guan, Y. A Probabilistic Approach to Syntax-Based Reordering for Statistical Machine Translation. In Proceedings of the 45th Annual Meeting of the Association for Computational Linguistics, Prague, Czech Republic, 23–30 June 2007; pp. 720–727. [Google Scholar]
- Zhao, C.; Walker, M.; Chaturvedi, S. Bridging the Structural Gap Between Encoding and Decoding for Data-to-Text Generation. In Proceedings of the 58th Annual Meeting of the Association for Computational Linguistics, Online, 5–10 July 2020; pp. 2481–2491. [Google Scholar]
- Bentivogli, L.; Bisazza, A.; Cettolo, M.; Federico, M. Neural Versus Phrase-Based Machine Translation Quality: A Case Study. In Proceedings of the 2016 Conference on Empirical Methods in Natural Language Processing, Austin, TX, USA, 1–5 November 2016; pp. 257–267. [Google Scholar]
- Kalchbrenner, N.; Blunsom, P. recurrent continuous translation models. In Proceedings of the 2013 Conference on Empirical Methods in Natural Language Processing, Seattle, WA, USA, 18–21 October 2013; pp. 1700–1709. [Google Scholar]
- Bottou, L. Large-scale machine learning with stochastic gradient descent. In COMPSTAT’2010, Proceedings of the 19th International Conference on Computational Statistics, Paris, France, 22–27 August 2010; Springer: Berlin/Heidelberg, Germany, 2010; pp. 177–186. [Google Scholar]
- Rumelhart, D.E.; Hinton, G.E.; Williams, R.J. Learning representations by back-propagating errors. Nature 1986, 323, 533. [Google Scholar] [CrossRef]
- Jean, S.; Cho, K.; Memisevic, R.; Bengio, Y. On using very large target vocabulary for neural machine translation. In Proceedings of the 53rd Annual Meeting of the Association for Computational Linguistics and the 7th International Joint Conference on Natural Language Processing of the Asian Federation of Natural Language Processing, ACL-IJCNLP 2015, Beijing, China, 26–31 July 2015; Association for Computational Linguistics (ACL): Kerrville, TX, USA, 2015; pp. 1–10. [Google Scholar]
- Luong, M.T.; Manning, C.D. Achieving Open Vocabulary Neural Machine Translation with Hybrid Word-Character Models. In Proceedings of the 54th Annual Meeting of the Association for Computational Linguistics, Berlin, Germany, 7–12 August 2016; Volume 1: Long Papers, pp. 1054–1063. [Google Scholar]
- Sennrich, R.; Haddow, B.; Birch, A. Neural Machine Translation of rare Words with Subword Units. In Proceedings of the 54th Annual Meeting of the Association for Computational Linguistics, Berlin, Germany, 7–12 August 2016; Volume 1: Long Papers, pp. 1715–1725. [Google Scholar]
- Kudo, T.; Richardson, J. SentencePiece: A simple and language independent subword tokenizer and detokenizer for Neural Text Processing. In Proceedings of the 2018 Conference on Empirical Methods in Natural Language Processing: System Demonstrations, Brussels, Belgium, 31 October–4 November 2018; pp. 66–71. [Google Scholar]
- Song, X.; Salcianu, A.; Song, Y.; Dopson, D.; Zhou, D. Fast WordPiece Tokenization. In Proceedings of the 2021 Conference on Empirical Methods in Natural Language Processing, Punta Cana, Dominican Republic, 7–11 November 2021; pp. 2089–2103. [Google Scholar]
- Xu, J.; Zhou, H.; Gan, C.; Zheng, Z.; Li, L. Vocabulary Learning via Optimal Transport for Neural Machine Translation. In Proceedings of the 59th Annual Meeting of the Association for Computational Linguistics and the 11th International Joint Conference on Natural Language Processing (Volume 1: Long Papers), Virtual Event, 1–6 August 2021; pp. 7361–7373. [Google Scholar]
- Wang, X.; Ruder, S.; Neubig, G. Multi-View Subword Regularization. In Proceedings of the 2021 Conference of the North American Chapter of the Association for Computational Linguistics: Human Language Technologies, Online, 6–11 June 2021; pp. 473–482. [Google Scholar]
- Ataman, D.; Negri, M.; Turchi, M.; Federico, M. Linguistically Motivated Vocabulary reduction for Neural Machine Translation from Turkish to English. Prague Bull. Math. Linguist. 2017, 331–342. [Google Scholar] [CrossRef]
- Araabi, A.; Monz, C.; Niculae, V. How Effective is Byte Pair Encoding for Out-Of-Vocabulary Words in Neural Machine Translation? In Proceedings of the 15th Biennial Conference of the Association for Machine Translation in the Americas (Volume 1: Research Track), Orlando, FL, USA, 12–16 September 2022; pp. 117–130. [Google Scholar]
- Costa-jussà, M.R.; Fonollosa, J.A.R. Character-Based Neural Machine Translation. In Proceedings of the 54th Annual Meeting of the Association for Computational Linguistics (Volume 2: Short Papers), Berlin, Germany, 7–12 August 2016; pp. 357–361. [Google Scholar]
- Ling, W.; Trancoso, I.; Dyer, C.; Black, A.W. Character-based neural machine translation. arXiv 2015, arXiv:1511.04586. [Google Scholar] [CrossRef]
- Lee, J.; Cho, K.; Hofmann, T. Fully character-level neural machine translation without explicit segmentation. Trans. Assoc. Comput. Linguist. 2017, 5, 365–378. [Google Scholar] [CrossRef]
- Ataman, D.; Aziz, W.; Birch, A. A Latent Morphology Model for Open-Vocabulary Neural Machine Translation. In Proceedings of the International Conference on Learning Representations, New Orleans, LA, USA, 6–9 May 2019. [Google Scholar]
- Gallé, M. Investigating the effectiveness of BPE: The power of shorter sequences. In Proceedings of the 2019 Conference on Empirical Methods in Natural Language Processing and the 9th International Joint Conference on Natural Language Processing (EMNLP-IJCNLP), Hong Kong, China, 3–7 November 2019; pp. 1375–1381. [Google Scholar]
- Wang, C.; Cho, K.; Gu, J. Neural Machine Translation with Byte-Level Subwords. In Proceedings of the AAAI Conference on Artificial Intelligence, New York, NY, USA, 7–12 February 2020; Volume 34, Number 05. pp. 9154–9160. [Google Scholar]
- Chakravarthi, B.R.; Rani, P.; Arcan, M.; McCrae, J.P. A Survey of Orthographic Information in Machine Translation. SN Comput. Sci. 2021, 2, 330. [Google Scholar] [CrossRef] [PubMed]
- Libovickỳ, J.; Schmid, H.; Fraser, A. Why Don’t People Use Character-Level Machine Translation? In Findings of the Association for Computational Linguistics: ACL 2022, Dublin, Ireland, 22–27 May 2022; Association for Computational Linguistics: Stroudsburg, PA, USA, 2022. [Google Scholar]
- Edman, L.; Sarti, G.; Toral, A.; van Noord, G.; Bisazza, A. Are Character-level Translations Worth the Wait? Comparing ByT5 and mT5 for Machine Translation. Trans. Assoc. Comput. Linguist. 2024, 12, 392–410. [Google Scholar] [CrossRef]
- Cherry, C.; Foster, G.; Bapna, A.; Firat, O.; Macherey, W. Revisiting Character-Based Neural Machine Translation with Capacity and Compression. In Proceedings of the 2018 Conference on Empirical Methods in Natural Language Processing, Brussels, Belgium, 31 October–4 November 2018; pp. 4295–4305. [Google Scholar]
- Xue, L.; Barua, A.; Constant, N.; Al-rfou, R.; Narang, S.; Kale, M.; Roberts, A.; Raffel, C. Byt5: Towards a token-free future with pre-trained byte-to-byte models. Trans. Assoc. Comput. Linguist. 2022, 10, 291–306. [Google Scholar] [CrossRef]
- Kaplan, J.; McCandlish, S.; Henighan, T.; Brown, T.B.; Chess, B.; Child, R.; Gray, S.; Radford, A.; Wu, J.; Amodei, D. Scaling laws for neural language models. arXiv 2020, arXiv:2001.08361. [Google Scholar] [CrossRef]
- Gulcehre, C.; Firat, O.; Xu, K.; Cho, K.; Bengio, Y. On integrating a language model into neural machine translation. Comput. Speech Lang. 2017, 45, 137–148. [Google Scholar] [CrossRef]
- Kannan, A.; Wu, Y.; Nguyen, P.; Sainath, T.N.; Chen, Z.; Prabhavalkar, R. An Analysis of Incorporating an External Language Model into a Sequence-to-Sequence Model. In Proceedings of the 2018 IEEE International Conference on Acoustics, Speech and Signal Processing (ICASSP), Calgary, AB, Canada, 15–20 April 2018; pp. 1–5828. [Google Scholar]
- Ueffing, N.; Haffari, G.; Sarkar, A. Transductive learning for statistical machine translation. In Proceedings of the 45th Annual Meeting of the Association of Computational Linguistics, Prague, Czech Republic, 23–30 June 2007; pp. 25–32. [Google Scholar]
- Bertoldi, N.; Federico, M. Domain Adaptation for Statistical Machine Translation with Monolingual Resources. In Proceedings of the Fourth Workshop on Statistical Machine Translation, Athens, Greece, 30–31 March 2009; pp. 182–189. [Google Scholar]
- Wu, H.; Wang, H. Revisiting pivot language approach for machine translation. In Proceedings of the Joint Conference of the 47th Annual Meeting of the ACL and the 4th International Joint Conference on Natural Language Processing of the AFNLP, Singapore, 2–7 August 2009; pp. 154–162. [Google Scholar]
- Sennrich, R.; Haddow, B.; Birch, A. Improving Neural Machine Translation Models with Monolingual Data. In Proceedings of the 54th Annual Meeting of the Association for Computational Linguistics, Berlin, Germany, 7–12 August 2016; Volume 1: Long Papers, pp. 86–96. [Google Scholar]
- Burlot, F.; Yvon, F. Using Monolingual Data in Neural Machine Translation: A Systematic Study. In Proceedings of the Third Conference on Machine Translation: Research Papers, Brussels, Belgium, 31 October–1 November 2018; pp. 144–155. [Google Scholar]
- Edunov, S.; Ott, M.; Auli, M.; Grangier, D. Understanding Back-Translation at Scale. In Proceedings of the 2018 Conference on Empirical Methods in Natural Language Processing, Brussels, Belgium, 31 October–4 November 2018; pp. 489–500. [Google Scholar]
- Wu, J.; Wang, X.; Wang, W.Y. Extract and Edit: An Alternative to Back-Translation for Unsupervised Neural Machine Translation. In Proceedings of the 2019 Conference of the North American Chapter of the Association for Computational Linguistics: Human Language Technologies, Minneapolis, MN, USA, 2–7 June 2019; Volume 1: Long and Short Papers, pp. 1173–1183. [Google Scholar]
- Graça, M.; Kim, Y.; Schamper, J.; Khadivi, S.; Ney, H. Generalizing Back-Translation in Neural Machine Translation. In Proceedings of the Fourth Conference on Machine Translation, Florence, Italy, 1–2 August 2019; Volume 1: Research Papers, pp. 45–52. [Google Scholar]
- Przystupa, M.; Abdul-Mageed, M. Neural machine translation of low-resource and similar languages with backtranslation. In Proceedings of the Fourth Conference on Machine Translation, Athens, Greece, 30–31 March 2009; pp. 224–235. [Google Scholar]
- Feldman, I.; Coto-Solano, R. Neural machine translation models with back-translation for the extremely low-resource indigenous language Bribri. In Proceedings of the 28th International Conference on Computational Linguistics, Barcelona, Spain, 8–13 December 2020; pp. 3965–3976. [Google Scholar]
- Mikolov, T.; Le, Q.V.; Sutskever, I. Exploiting similarities among languages for machine translation. arXiv 2013, arXiv:1309.4168. [Google Scholar] [CrossRef]
- Faruqui, M.; Dyer, C. Improving vector space word representations using multilingual correlation. In Proceedings of the 14th Conference of the European Chapter of the Association for Computational Linguistics, Gothenburg, Sweden, 26–30 April 2014; pp. 462–471. [Google Scholar]
- Xing, C.; Wang, D.; Liu, C.; Lin, Y. Normalized word embedding and orthogonal transform for bilingual word translation. In Proceedings of the 2015 Conference of the North American Chapter of the Association for Computational Linguistics: Human Language Technologies, Denver, CO, USA, 31 May–5 June 2015; pp. 1006–1011. [Google Scholar]
- Artetxe, M.; Labaka, G.; Agirre, E. Learning principled bilingual mappings of word embeddings while preserving monolingual invariance. In Proceedings of the 2016 Conference on Empirical Methods in Natural Language Processing, Austin, TX, USA, 1–5 November 2016; pp. 2289–2294. [Google Scholar]
- Artetxe, M.; Labaka, G.; Agirre, E. Learning bilingual word embeddings with (almost) no bilingual data. In Proceedings of the 55th Annual Meeting of the Association for Computational Linguistics, Vancouver, BC, Canada, 30 July–4 August 2017; Volume 1: Long Papers, pp. 451–462. [Google Scholar]
- Smith, S.L.; Turban, D.H.; Hamblin, S.; Hammerla, N.Y. Offline bilingual word vectors, orthogonal transformations and the inverted softmax. arXiv 2017, arXiv:1702.03859. [Google Scholar] [CrossRef]
- Ravi, S.; Knight, K. Deciphering foreign language. In Proceedings of the 49th Annual Meeting of the Association for Computational Linguistics: Human Language Technologies, Portland, OR, USA, 19–24 June 2011; pp. 12–21. [Google Scholar]
- Hermann, K.M.; Blunsom, P. Multilingual Distributed representations without Word Alignment. In Proceedings of the International Conference on Learning Representations, Banff, AB, Canada, 14–16 April 2014. [Google Scholar]
- Lample, G.; Conneau, A.; Denoyer, L.; Ranzato, M. Unsupervised Machine Translation Using Monolingual Corpora Only. In Proceedings of the International Conference on Learning Representations, Vancouver, BC, Canada, 30 April–3 May 2018. [Google Scholar]
- Yang, Z.; Chen, W.; Wang, F.; Xu, B. Unsupervised Neural Machine Translation with Weight Sharing. In Proceedings of the 56th Annual Meeting of the Association for Computational Linguistics, Melbourne, Australia, 15–20 July 2018; Volume 1: Long Papers, pp. 46–55. [Google Scholar]
- Graça, Y.K.M.; Ney, H. When and why is unsupervised neural machine translation useless. In Proceedings of the 22nd Annual Conference of the European Association for Machine Translation, Lisbon, Portugal, 3–5 November 2020; p. 35. [Google Scholar]
- Kauers, M.; Vogel, S.; Fügen, C.; Waibel, A. Interlingua based statistical machine translation. In Proceedings of the INTErSPEECH, Denver, CO, USA, 16–20 September 2002; pp. 1909–1912. [Google Scholar]
- De Gispert, A.; Marino, J.B. Catalan-English statistical machine translation without parallel corpus: Bridging through Spanish. In Proceedings of the 5th International Conference on Language Resources and Evaluation (LrEC), Genoa, Italy, 22–28 May 2006; pp. 65–68. [Google Scholar]
- Utiyama, M.; Isahara, H. A comparison of pivot methods for phrase-based statistical machine translation. In Proceedings of the Human Language Technologies 2007: The Conference of the North American Chapter of the Association for Computational Linguistics, Proceedings of the Main Conference, Rochester, NY, USA, 22–27 April 2007; pp. 484–491. [Google Scholar]
- Wu, H.; Wang, H. Pivot language approach for phrase-based statistical machine translation. Mach. Transl. 2007, 21, 165–181. [Google Scholar] [CrossRef]
- Bertoldi, N.; Barbaiani, M.; Federico, M.; Cattoni, R. Phrase-based statistical machine translation with pivot languages. In Proceedings of the 5th International Workshop on Spoken Language Translation: Papers, Waikiki, HI, USA, 20–21 October 2008; pp. 143–149. [Google Scholar]
- Chen, Y.; Liu, Y.; Cheng, Y.; Li, V.O. A Teacher-Student Framework for Zero-Resource Neural Machine Translation. In Proceedings of the 55th Annual Meeting of the Association for Computational Linguistics, Vancouver, BC, Canada, 30 July–4 August 2017; Volume 1: Long Papers, pp. 1925–1935. [Google Scholar]
- Leng, Y.; Tan, X.; Qin, T.; Li, X.Y.; Liu, T.Y. Unsupervised Pivot Translation for Distant Languages. In Proceedings of the 57th Annual Meeting of the Association for Computational Linguistics, Florence, Italy, 28 July–2 August 2019; pp. 175–183. [Google Scholar]
- Johnson, M.; Schuster, M.; Le, Q.V.; Krikun, M.; Wu, Y.; Chen, Z.; Thorat, N.; Viégas, F.; Wattenberg, M.; Corrado, G.; et al. Google’s multilingual neural machine translation system: Enabling zero-shot translation. Trans. Assoc. Comput. Linguist. 2017, 5, 339–351. [Google Scholar] [CrossRef]
- Lample, G.; Ott, M.; Conneau, A.; Denoyer, L.; Ranzato, M. Phrase-Based & Neural Unsupervised Machine Translation. In Proceedings of the 2018 Conference on Empirical Methods in Natural Language Processing, Brussels, Belgium, 31 October–4 November 2018; pp. 5039–5049. [Google Scholar]
- Lu, Y.; Keung, P.; Ladhak, F.; Bhardwaj, V.; Zhang, S.; Sun, J. A neural interlingua for multilingual machine translation. In Proceedings of the Third Conference on Machine Translation: Research Papers, Brussels, Belgium, 31 October–1 November 2018; pp. 84–92. [Google Scholar]
- Lakew, S.M.; Cettolo, M.; Federico, M. A Comparison of Transformer and Recurrent Neural Networks on Multilingual Neural Machine Translation. In Proceedings of the 27th International Conference on Computational Linguistics, Santa Fe, NM, USA, 20–26 August 2018; pp. 641–652. [Google Scholar]
- Conneau, A.; Khandelwal, K.; Goyal, N.; Chaudhary, V.; Wenzek, G.; Guzmán, F.; Grave, É.; Ott, M.; Zettlemoyer, L.; Stoyanov, V. Unsupervised Cross-lingual representation Learning at Scale. In Proceedings of the 58th Annual Meeting of the Association for Computational Linguistics, Online, 5–10 July 2020; pp. 8440–8451. [Google Scholar]
- Karthikeyan, K.; Wang, Z.; Mayhew, S.; Roth, D. Cross-lingual ability of multilingual bert: An empirical study. In Proceedings of the International Conference on Learning Representations, New Orleans, LA, USA, 6–9 May 2019. [Google Scholar]
- Kudugunta, S.; Bapna, A.; Caswell, I.; Firat, O. Investigating Multilingual NMT Representations at Scale. In Proceedings of the 2019 Conference on Empirical Methods in Natural Language Processing and the 9th International Joint Conference on Natural Language Processing (EMNLP-IJCNLP), Hong Kong, China, 3–7 November 2019; pp. 1565–1575. [Google Scholar]
- Wang, X.; Pham, H.; Arthur, P.; Neubig, G. Multilingual Neural Machine Translation with Soft Decoupled Encoding. In Proceedings of the International Conference on Learning Representations, Vancouver, BC, Canada, 30 April–3 May 2018. [Google Scholar]
- Kim, Y.; Gao, Y.; Ney, H. Effective Cross-Lingual Transfer of Neural Machine Translation Models Without Shared Vocabularies. In Proceedings of the 57th Annual Meeting of the Association for Computational Linguistics, Florence, Italy, 28 July–2 August 2019; pp. 1246–1257. [Google Scholar]
- Zhu, C.; Yu, H.; Cheng, S.; Luo, W. Language-aware interlingua for multilingual neural machine translation. In Proceedings of the 58th Annual Meeting of the Association for Computational Linguistics, Online, 5–10 July 2020; pp. 1650–1655. [Google Scholar]
- Baziotis, C.; Artetxe, M.; Cross, J.; Bhosale, S. Multilingual Machine Translation with Hyper-Adapters. In Proceedings of the 2022 Conference on Empirical Methods in Natural Language Processing, Abu Dhabi, United Arab Emirates, 7–11 December 2022; pp. 1170–1185. [Google Scholar]
- Nguyen, X.P.; Joty, S.; Wu, K.; Aw, A.T. refining low-resource unsupervised translation by language disentanglement of multilingual translation model. Adv. Neural Inf. Process. Syst. 2022, 35, 36230–36242. [Google Scholar]
- Costa-jussà, M.R.; Cross, J.; Çelebi, O.; Elbayad, M.; Heafield, K.; Heffernan, K.; Kalbassi, E.; Lam, J.; Licht, D.; Maillard, J.; et al. No Language Left Behind: Scaling Human-Centered Machine Translation. arXiv 2022, arXiv:2207.04672. [Google Scholar] [CrossRef]
- Zanr, C.; Peng, K.; Dingr, L.; Qiu, B.; Liu, B.; He, S.; Lu, Q.; Zhang, Z.; Liu, C.; Liu, W.; et al. Vega-MT: The JD Explore Academy Translation System for WMT22. arXiv 2022, arXiv:2209.09444. [Google Scholar]
- Liu, Y.; Gu, J.; Goyal, N.; Li, X.; Edunov, S.; Ghazvininejad, M.; Lewis, M.; Zettlemoyer, L. Multilingual denoising pre-training for neural machine translation. Trans. Assoc. Comput. Linguist. 2020, 8, 726–742. [Google Scholar] [CrossRef]
- Edunov, S.; Baevski, A.; Auli, M. Pre-trained language model representations for language generation. In Proceedings of the 2019 Conference of the North American Chapter of the Association for Computational Linguistics: Human Language Technologies, Minneapolis, MN, USA, 2–7 June 2019; Volume 1: Long and Short Papers, pp. 4052–4059. [Google Scholar]
- Peters, M.E.; Neumann, M.; Iyyer, M.; Gardner, M.; Clark, C.; Lee, K.; Zettlemoyer, L. Deep contextualized word representations. In Proceedings of the 2018 Conference of the North American Chapter of the Association for Computational Linguistics: Human Language Technologies, New Orleans, LA, USA, 1–6 June 2018; pp. 2227–2237. [Google Scholar]
- Raffel, C.; Shazeer, N.; Roberts, A.; Lee, K.; Narang, S.; Matena, M.; Zhou, Y.; Li, W.; Liu, P.J. Exploring the limits of transfer learning with a unified text-to-text transformer. J. Mach. Learn. Res. 2020, 21, 5485–5551. [Google Scholar]
- Yang, Z.; Dai, Z.; Yang, Y.; Carbonell, J.; Salakhutdinov, R.R.; Le, Q.V. XLNet: Generalized Autoregressive Pretraining for Language Understanding. arXiv 2019, arXiv:1906.08237. [Google Scholar]
- Radford, A.; Wu, J.; Child, R.; Luan, D.; Amodei, D.; Sutskever, I. Language Models are Unsupervised Multitask Learners; OpenAI: San Francisco, CA, USA, 2019. [Google Scholar]
- Achiam, J.; Adler, S.; Agarwal, S.; Ahmad, L.; Akkaya, I.; Aleman, F.L.; Almeida, D.; Altenschmidt, J.; Altman, S.; Anadkat, S.; et al. GPT-4 technical report. arXiv 2023, arXiv:2303.08774. [Google Scholar] [CrossRef]
- Chowdhery, A.; Narang, S.; Devlin, J.; Bosma, M.; Mishra, G.; Roberts, A.; Barham, P.; Chung, H.W.; Sutton, C.; Gehrmann, S.; et al. PaLM: Scaling language modeling with pathways. J. Mach. Learn. Res. 2023, 24, 1–113. [Google Scholar]
- Le Scao, T.; Fan, A.; Akiki, C.; Pavlick, E.; Ilić, S.; Hesslow, D.; Castagné, R.; Luccioni, A.S.; Yvon, F.; Gallé, M.; et al. BLOOM: A 176B-parameter open-access multilingual language model. arXiv 2023, arXiv:2211.05100. [Google Scholar]
- Touvron, H.; Lavril, T.; Izacard, G.; Martinet, X.; Lachaux, M.A.; Lacroix, T.; Rozière, B.; Goyal, N.; Hambro, E.; Azhar, F.; et al. LLaMA: Open and efficient foundation language models. arXiv 2023, arXiv:2302.13971. [Google Scholar] [CrossRef]
- Bawden, R.; Birch, A.; Dobreva, R.; Oncevay, A.; Barone, A.V.M.; Williams, P. The University of Edinburgh’s English-Tamil and English-Inuktitut submissions to the WMT20 news translation task. In Proceedings of the 5th Conference on Machine Translation, Online, 19–20 November 2020. [Google Scholar]
- Baziotis, C.; Haddow, B.; Birch, A. Language Model Prior for Low-Resource Neural Machine Translation. In Proceedings of the 2020 Conference on Empirical Methods in Natural Language Processing (EMNLP), Online, 16–20 November 2020; pp. 7622–7634. [Google Scholar]
- Baziotis, C.; Titov, I.; Birch, A.; Haddow, B. Exploring Unsupervised Pretraining Objectives for Machine Translation. In Proceedings of the Findings of the Association for Computational Linguistics: ACL-IJCNLP 2021, Online, 1–6 August 2021; pp. 2956–2971. [Google Scholar]
- Brown, T.; Mann, B.; Ryder, N.; Subbiah, M.; Kaplan, J.D.; Dhariwal, P.; Neelakantan, A.; Shyam, P.; Sastry, G.; Askell, A.; et al. Language models are few-shot learners. Adv. Neural Inf. Process. Syst. 2020, 33, 1877–1901. [Google Scholar]
- Zeng, A.; Liu, X.; Du, Z.; Wang, Z.; Lai, H.; Ding, M.; Yang, Z.; Xu, Y.; Zheng, W.; Xia, X.; et al. GLM-130B: An Open Bilingual Pre-Trained Model. In Proceedings of the Eleventh International Conference on Learning Representations, Kigali, Rwanda, 1–5 May 2023. [Google Scholar]
- Aynetdinov, A.; Akbik, A. SemScore: Automated Evaluation of Instruction-Tuned LLMs based on Semantic Textual Similarity. arXiv 2024, arXiv:2401.17072. [Google Scholar]
- Zhang, Y.; Zhang, P.; Yan, Y. Language model score regularization for speech recognition. Chin. J. Electron. 2019, 28, 604–609. [Google Scholar] [CrossRef]
- Hendy, A.; Abdelrehim, M.; Sharaf, A.; Raunak, V.; Gabr, M.; Matsushita, H.; Kim, Y.J.; Afify, M.; Awadalla, H.H. How good are GPT models at machine translation? A comprehensive evaluation. arXiv 2023, arXiv:2302.09210. [Google Scholar] [CrossRef]
- Kocmi, T.; Avramidis, E.; Bawden, R.; Bojar, O.; Dvorkovich, A.; Federmann, C.; Fishel, M.; Freitag, M.; Gowda, T.; Grundkiewicz, R.; et al. Findings of the 2023 conference on machine translation (WMT23): LLMs are here but not quite there yet. In Proceedings of the Eighth Conference on Machine Translation (WMT23), Singapore, 6–7 December 2023; pp. 198–216. [Google Scholar]
- Xu, H.; Kim, Y.J.; Sharaf, A.; Awadalla, H.H. A Paradigm Shift in Machine Translation: Boosting Translation Performance of Large Language Models. In Proceedings of the Twelfth International Conference on Learning Representations, Vienna, Austria, 7–11 May 2024. [Google Scholar]
- Zhu, W.; Liu, H.; Dong, Q.; Xu, J.; Huang, S.; Kong, L.; Chen, J.; Li, L. Multilingual Machine Translation with Large Language Models: Empirical Results and Analysis. In Proceedings of the Findings of the Association for Computational Linguistics: NAACL 2024, Mexico City, Mexico, 16–21 June 2024; pp. 2765–2781. [Google Scholar]
- Jelinek, F. Applying Information Theoretic Methods: Evaluation of Grammar Quality. In Proceedings of the Workshop on Evaluation of NLP Systems, Wayne, PA, USA, 7–9 December 1988. [Google Scholar]
- Thorndike, E.L. The Fundamentals of Learning; Teachers College, Columbia University: New York, NY, USA, 1932. [Google Scholar]
- McCulloch, W.S.; Pitts, W. A Logical Calculus of the Ideas Immanent in Nervous Activity. Bull. Math. Biophys. 1943, 5, 115–133. [Google Scholar] [CrossRef]
- Hebb, D.O. The Organization of Behavior: A Neuropsychological Theory; Wiley: New York, NY, USA, 1949. [Google Scholar]
- Rosenblatt, F. The Perceptron: A Probabilistic Model for Information Storage and Organization in the Brain; Technical Report 85-460-1; Cornell Aeronautical Laboratory: Buffalo, NY, USA, 1958. [Google Scholar]
- Dreyfus, H.; Dreyfus, S.E. Mind over Machine; Simon and Schuster: New York, NY, USA, 1986. [Google Scholar]
- Chomsky, N. Aspects of the Theory of Syntax; Number 11; MIT Press: Cambridge, MA, USA, 2014. [Google Scholar]
- Fodor, J.A.; Pylyshyn, Z.W. Connectionism and cognitive architecture: A critical analysis. Cognition 1988, 28, 3–71. [Google Scholar] [CrossRef]
- Marcus, G.F. Rethinking eliminative connectionism. Cogn. Psychol. 1998, 37, 243–282. [Google Scholar] [CrossRef]
- Marcus, G.F. The Algebraic Mind: Integrating Connectionism and Cognitive Science; MIT Press: Cambridge, MA, USA, 2003. [Google Scholar]
- Fodor, J.A.; Lepore, E. The Compositionality Papers; Oxford University Press: Oxford, UK, 2002. [Google Scholar]
- Calvo, P.; Symons, J. The Architecture of Cognition: Rethinking Fodor and Pylyshyn’s Systematicity Challenge; MIT Press: Cambridge, MA, USA, 2014. [Google Scholar]
- Lake, B.M.; Ullman, T.D.; Tenenbaum, J.B.; Gershman, S.J. Building machines that learn and think like people. Behav. Brain Sci. 2017, 40, e253. [Google Scholar] [CrossRef] [PubMed]
- Lake, B.; Baroni, M. Generalization without systematicity: On the compositional skills of sequence-to-sequence recurrent networks. In Proceedings of the International Conference on Machine Learning, PMLR, Stockholm, Sweden, 10–15 July 2018; pp. 2873–2882. [Google Scholar]
- Battaglia, P.W.; Hamrick, J.B.; Bapst, V.; Sanchez-Gonzalez, A.; Zambaldi, V.; Malinowski, M.; Tacchetti, A.; Raposo, D.; Santoro, A.; Faulkner, R.; et al. Relational inductive biases, deep learning, and graph networks. arXiv 2018, arXiv:1806.01261. [Google Scholar] [CrossRef]
- Snyder, B.; Barzilay, R. Unsupervised Multilingual Learning. Ph.D. Thesis, Massachusetts Institute of Technology, Cambridge, MA, USA, 2010. [Google Scholar]
- Agić, Ž.; Johannsen, A.; Plank, B.; Alonso, H.M.; Schluter, N.; Søgaard, A. Multilingual projection for parsing truly low-resource languages. Trans. Assoc. Comput. Linguist. 2016, 4, 301–312. [Google Scholar] [CrossRef]
- Şahin, G.G.; Steedman, M. Data Augmentation via Dependency Tree Morphing for Low-Resource Languages. In Proceedings of the 2018 Conference on Empirical Methods in Natural Language Processing, Association for Computational Linguistics, Brussels, Belgium, 31 October–4 November 2018. [Google Scholar]
- Xia, M.; Kong, X.; Anastasopoulos, A.; Neubig, G. Generalized Data Augmentation for Low-Resource Translation. In Proceedings of the 57th Annual Meeting of the Association for Computational Linguistics, Florence, Italy, 28 July–2 August 2019; Volume 57. [Google Scholar]
- Zhu, Y.; Heinzerling, B.; Vulić, I.; Strube, M.; Reichart, R.; Korhonen, A. On the Importance of Subword Information for Morphological Tasks in Truly Low-Resource Languages. In Proceedings of the 23rd Conference on Computational Natural Language Learning (CoNLL), Florence, Italy, 28 July–2 August 2019; pp. 216–226. [Google Scholar]
- Ponti, E.M.; O’Horan, H.; Berzak, Y.; Vulić, I.; Reichart, R.; Poibeau, T.; Shutova, E.; Korhonen, A. Modeling language variation and universals: A survey on typological linguistics for natural language processing. Comput. Linguist. 2019, 45, 559–601. [Google Scholar] [CrossRef]
- Liu, Z.; Prud’hommeaux, E. Data-driven Model Generalizability in Crosslinguistic Low-resource Morphological Segmentation. Trans. Assoc. Comput. Linguist. 2022, 10, 393–413. [Google Scholar] [CrossRef]
- Carl, M. Inducing translation templates for example-based machine translation. In Proceedings of the Machine Translation Summit VII, Singapore, 13–17 September 1999; pp. 250–258. [Google Scholar]
- Brown, R.D. Adding linguistic knowledge to a lexical example-based translation system. In Proceedings of the 8th Conference on Theoretical and Methodological Issues in Machine Translation of Natural Languages, Chester, UK, 23–25 August 1999. [Google Scholar]
- Wu, D. Stochastic inversion transduction grammars and bilingual parsing of parallel corpora. Comput. Linguist. 1997, 23, 377–403. [Google Scholar]
- Chiang, D. A hierarchical phrase-based model for statistical machine translation. In Proceedings of the 43rd Annual Meeting of the Association for Computational Linguistics, Ann Arbor, MI, USA, 25–30 June 2005; pp. 263–270. [Google Scholar]
- Gildea, D. Loosely tree-based alignment for machine translation. In Proceedings of the 41st Annual Meeting of the Association for Computational Linguistics, Sapporo, Japan, 7–12 July 2003; pp. 80–87. [Google Scholar]
- Yamada, K.; Knight, K. A syntax-based statistical translation model. In Proceedings of the 39th Annual Meeting of the Association for Computational Linguistics, Toulouse, France, 9–11 July 2001; pp. 523–530. [Google Scholar]
- Galley, M.; Manning, C.D. Quadratic-time dependency parsing for machine translation. In Proceedings of the Joint Conference of the 47th Annual Meeting of the ACL and the 4th International Joint Conference on Natural Language Processing of the AFNLP, Singapore, 2–7 August 2009; pp. 773–781. [Google Scholar]
- Liu, Y.; Liu, Q.; Lin, S. Tree-to-string alignment template for statistical machine translation. In Proceedings of the 21st International Conference on Computational Linguistics and 44th Annual Meeting of the Association for Computational Linguistics, Sydney, Australia, 17–21 July 2006; pp. 609–616. [Google Scholar]
- Huang, L.; Knight, K.; Joshi, A. Statistical syntax-directed translation with extended domain of locality. In Proceedings of the 7th Conference of the Association for Machine Translation in the Americas: Technical Papers, Cambridge, MA, USA, 8–12 August 2006; pp. 66–73. [Google Scholar]
- Mi, H.; Huang, L. Forest-based translation rule extraction. In Proceedings of the 2008 Conference on Empirical Methods in Natural Language Processing, Honolulu, HI, USA, 25–27 October 2008; pp. 206–214. [Google Scholar]
- Eisner, J. Learning non-isomorphic tree mappings for machine translation. In Proceedings of the Companion Volume to the Proceedings of 41st Annual Meeting of the Association for Computational Linguistics, Sapporo, Japan, 7–12 July 2003; pp. 205–208. [Google Scholar]
- Menezes, A.; Quirk, C. Dependency Treelet Translation: The convergence of statistical and example-based machine-translation? In Proceedings of the Workshop on Example-Based Machine Translation, Phuket, Thailand, 13–15 September 2005; pp. 99–108. [Google Scholar]
- Zhang, M.; Jiang, H.; Li, H.; Aw, A.; Li, S. Grammar comparison study for translational equivalence modeling and statistical machine translation. In Proceedings of the 22nd International Conference on Computational Linguistics (COLING 2008), Manchester, UK, 18–22 August 2008; pp. 1097–1104. [Google Scholar]
- Chiang, D. Learning to translate with source and target syntax. In Proceedings of the 48th Annual Meeting of the Association for Computational Linguistics, Uppsala, Sweden, 11–16 July 2010; pp. 1443–1452. [Google Scholar]
- Toutanova, K.; Suzuki, H.; Ruopp, A. Applying morphology generation models to machine translation. In Proceedings of the ACL-08: HLT, Columbus, OH, USA, 15–20 June 2008; pp. 514–522. [Google Scholar]
- Bisazza, A.; Federico, M. Morphological pre-processing for Turkish to English statistical machine translation. In Proceedings of the 6th International Workshop on Spoken Language Translation: Papers, Tokyo, Japan, 1–2 December 2009; pp. 129–135. [Google Scholar]
- El Kholy, A.; Habash, N. Orthographic and morphological processing for English–Arabic statistical machine translation. Mach. Transl. 2012, 26, 25–45. [Google Scholar] [CrossRef]
- Herzig, J.; Shaw, P.; Chang, M.W.; Guu, K.; Pasupat, P.; Zhang, Y. Unlocking compositional generalization in pre-trained models using intermediate representations. arXiv 2021, arXiv:2104.07478. [Google Scholar] [CrossRef]
- Eyigöz, E.; Gildea, D.; Oflazer, K. Simultaneous word-morpheme alignment for statistical machine translation. In Proceedings of the 2013 Conference of the North American Chapter of the Association for Computational Linguistics: Human Language Technologies, Atlanta, GA, USA, 9–14 June 2013; pp. 32–40. [Google Scholar]
- Sánchez-Cartagena, V.M.; Toral, A. Abu-MaTran at WMT 2016 Translation Task: Deep Learning, Morphological Segmentation and Tuning on Character Sequences. In Proceedings of the 1st Conference on Machine Translation ACL, Berlin, Germany, 11–12 August 2016. [Google Scholar]
- Pirinen, T.A. Omorfi: Free and open source morphological lexical database for Finnish. In Proceedings of the 20th Nordic Conference of Computational Linguistics (NODALIDA 2015), Vilnius, Lithuania, 11–13 May 2015; pp. 313–315. [Google Scholar]
- Smit, P.; Virpioja, S.; Grönroos, S.A.; Kurimo, M. Morfessor 2.0: Toolkit for statistical morphological segmentation. In Proceedings of the 14th Conference of the European Chapter of the Association for Computational Linguistics (EACL), Gothenburg, Sweden, 26–30 April 2014; Aalto University: Espoo, Finland, 2014. [Google Scholar]
- Huck, M.; Riess, S.; Fraser, A. Target-Side Word Segmentation Strategies for Neural Machine Translation. In Proceedings of the Second Conference on Machine Translation (WMT), Copenhagen, Denmark, 7–8 September 2017; pp. 56–67. [Google Scholar]
- Tamchyna, A.; Marco, M.W.D.; Fraser, A. Modeling Target-Side Inflection in Neural Machine Translation. In Proceedings of the Second Conference on Machine Translation (WMT), Copenhagen, Denmark, 7–8 September 2017; pp. 32–42. [Google Scholar]
- Ataman, D.; Federico, M. An evaluation of two vocabulary reduction methods for neural machine translation. In Proceedings of the 13th Conference of the Association for Machine Translation in the Americas, Boston, MA, USA, 17–21 March 2018; Volume 1: Research Track, pp. 97–110. [Google Scholar]
- Sennrich, R. How Grammatical is Character-Level Neural Machine Translation? Assessing MT Quality with Contrastive Translation Pairs. In Proceedings of the 15th Conference of the European Chapter of the Association for Computational Linguistics, Valencia, Spain, 3–7 April 2017; Volume 2: Short Papers, pp. 376–382. [Google Scholar]
- Ataman, D.; Firat, O.; Di Gangi, M.A.; Federico, M.; Birch, A. On the Importance of Word Boundaries in Character-Level Neural Machine Translation. In Proceedings of the 3rd Workshop on Neural Generation and Translation, Hong Kong, China, 4 November 2019; pp. 187–193. [Google Scholar]
- Papineni, K.; Roukos, S.; Ward, T.; Zhu, W.J. Bleu: A method for automatic evaluation of machine translation. In Proceedings of the 40th Annual Meeting of the Association for Computational Linguistics, Philadelphia, PA, USA, 6–12 July 2002; pp. 311–318. [Google Scholar]
- Snover, M.; Dorr, B.; Schwartz, R.; Micciulla, L.; Makhoul, J. A study of translation edit rate with targeted human annotation. In Proceedings of the 7th Conference of the Association for Machine Translation in the Americas: Technical Papers, Cambridge, MA, USA, 8–12 August 2006; pp. 223–231. [Google Scholar]
- Doddington, G. Automatic evaluation of machine translation quality using n-gram co-occurrence statistics. In Proceedings of the Second International Conference on Human Language Technology Research, San Diego, CA, USA, 24–27 March 2002; pp. 138–145. [Google Scholar]
- Hutchins, J. Machine translation: A concise history. Comput. Aided Transl. Theory Pract. 2007, 13, 11. [Google Scholar]
- Culy, C.; Riehemann, S.Z. The limits of N-gram translation evaluation metrics. In Proceedings of the Machine Translation Summit IX: Papers, New Orleans, LA, USA, 18–22 September 2003. [Google Scholar]
- Callison-Burch, C.; Osborne, M.; Koehn, P. Re-evaluating the role of BLEU in machine translation research. In Proceedings of the 11th Conference of the European Chapter of the Association for Computational Linguistics, Trento, Italy, 3–7 April 2006; pp. 249–256. [Google Scholar]
- Birch, A.; Osborne, M.; Blunsom, P. Metrics for MT evaluation: Evaluating reordering. Mach. Transl. 2010, 24, 15–26. [Google Scholar] [CrossRef]
- Mathur, N.; Baldwin, T.; Cohn, T. Tangled up in BLEU: Reevaluating the Evaluation of Automatic Machine Translation Evaluation Metrics. In Proceedings of the 58th Annual Meeting of the Association for Computational Linguistics, Online, 5–10 July 2020; pp. 4984–4997. [Google Scholar]
- Popović, M. chrF: Character n-gram F-score for automatic MT evaluation. In Proceedings of the Tenth Workshop on Statistical Machine Translation, Lisbon, Portugal, 17–18 September 2015; pp. 392–395. [Google Scholar]
- Zhang, T.; Kishore, V.; Wu, F.; Weinberger, K.Q.; Artzi, Y. BERTScore: Evaluating Text Generation with BERT. In Proceedings of the International Conference on Learning Representations, Addis Ababa, Ethiopia, 26–30 April 2020. [Google Scholar]
- Jawahar, G.; Sagot, B.; Seddah, D. What does BERT learn about the structure of language? In Proceedings of the 57th Annual Meeting of the Association for Computational Linguistics, Florence, Italy, 28 July–2 August 2019. [Google Scholar]
- Chi, E.A.; Hewitt, J.; Manning, C.D. Finding Universal Grammatical Relations in Multilingual BERT. In Proceedings of the 58th Annual Meeting of the Association for Computational Linguistics, Online, 5–10 July 2020; pp. 5564–5577. [Google Scholar]
- Tian, Y.; Xia, F.; Song, Y. Large Language Models Are No Longer Shallow Parsers. In Proceedings of the 62nd Annual Meeting of the Association for Computational Linguistics, Bangkok, Thailand, 11–16 August 2024; Volume 1: Long Papers, pp. 7131–7142. [Google Scholar]
- Urbizu, G.; Zulaika, M.; Saralegi, X.; Corral, A. How Well Can BERT Learn the Grammar of an Agglutinative and Flexible-Order Language? The Case of Basque. In Proceedings of the 2024 Joint International Conference on Computational Linguistics, Language Resources and Evaluation (LREC-COLING 2024), Torino, Italy, 20–25 May 2024; pp. 8334–8348. [Google Scholar]
- Koehn, P. Europarl: A parallel corpus for statistical machine translation. In Proceedings of the 10th Machine Translation Summit, Phuket, Thailand, 13–15 September 2005. [Google Scholar]
- Ziemski, M.; Junczys-Dowmunt, M.; Pouliquen, B. The United Nations Parallel Corpus v1.0. In Proceedings of the Tenth International Conference on Language Resources and Evaluation (LREC 2016), Portorož, Slovenia, 23–28 May 2016; pp. 3530–3534. [Google Scholar]
- Moore, R.C.; Lewis, W.D. Intelligent selection of language model training data. In Proceedings of the 48th Annual Meeting of the Association for Computational Linguistics, Uppsala, Sweden, 11–16 July 2010; pp. 220–224. [Google Scholar]
- Cuong, H.; Sima’an, K. Latent domain translation models in mix-of-domains haystack. In Proceedings of the COLING 2014, the 25th International Conference on Computational Linguistics: Technical Papers, Dublin, Ireland, 23–29 August 2014; pp. 1928–1939. [Google Scholar]
- Huang, F. Confidence measure for word alignment. In Proceedings of the 47th Annual Meeting of the Association for Computational Linguistics and the 4th International Joint Conference on Natural Language Processing (ACL-IJCNLP), Association for Computational Linguistics, Singapore, 2–7 August 2009; pp. 932–940. [Google Scholar]
- Smith, J.R.; Saint-Amand, H.; Plamada, M.; Koehn, P.; Callison-Burch, C.; Lopez, A. Dirt cheap web-scale parallel text from the common crawl. In Proceedings of the 51st Annual Meeting of the Association for Computational Linguistics, Sofia, Bulgaria, 4–9 August 2013. [Google Scholar]
- Esplà-Gomis, M.; Forcada, M.L.; Sánchez-Martínez, F. ParaCrawl: Web-scale acquisition of parallel corpora. In Proceedings of the 22nd Annual Conference of the European Association for Machine Translation, Lisboa, Portugal, 3–5 November 2020; pp. 35–42. [Google Scholar]
- Schwenk, H.; Wenzek, G.; Edunov, S.; Grave, E.; Joulin, A. CCMatrix: Mining billions of high-quality parallel sentences on the web. arXiv 2019, arXiv:1911.04944. [Google Scholar]
- Artetxe, M.; Schwenk, H. Massively multilingual sentence embeddings for zero-shot cross-lingual transfer and beyond. Trans. Assoc. Comput. Linguist. 2019, 7, 597–610. [Google Scholar] [CrossRef]
- Feng, F.; Yang, Y.; Cer, D.; Arivazhagan, N.; Wang, W. Language-agnostic BERT sentence embedding. arXiv 2020, arXiv:2007.01852. [Google Scholar]
- NLLB Team. Scaling neural machine translation to 200 languages. Nature 2024, 630, 841–846. [Google Scholar] [CrossRef] [PubMed]
- He, P.; Liu, X.; Gao, J.; Chen, W. DeBERTa: Decoding-enhanced BERT with disentangled attention. In Proceedings of the International Conference on Learning Representations (ICLR), Vienna, Austria, 3–7 May 2021. [Google Scholar]
- Zhang, J.; Zong, C. Forward Translation for Improvements in Neural Machine Translation. In Proceedings of the 2016 International Conference on Computational Linguistics and Natural Language Processing, Osaka, Japan, 11–16 December 2016. [Google Scholar]
- Axelrod, A.; He, X.; Gao, J. Domain adaptation via pseudo in-domain data selection. In Proceedings of the 2011 Conference on Empirical Methods in Natural Language Processing, Association for Computational Linguistics, Edinburgh, UK, 27–31 July 2011; pp. 355–362. [Google Scholar]
- Peris, Á.; Chinea-Ríos, M.; Casacuberta, F. Neural networks classifier for data selection in statistical machine translation. Prague Bull. Math. Linguist. 2016, 108, 283–294. [Google Scholar] [CrossRef]
- Arivazhagan, N.; Bapna, A.; Firat, O.; Aharoni, R.; Johnson, M.; Macherey, W. Massively multilingual neural machine translation in the wild: Findings and challenges. arXiv 2019, arXiv:1907.05019. [Google Scholar] [CrossRef]
- Wang, Y.; Neubig, G. Target conditioned sampling: Optimizing data selection for multilingual neural machine translation. In Proceedings of the 57th Annual Meeting of the Association for Computational Linguistics, Florence, Italy, 28 July–2 August 2019; pp. 582–592. [Google Scholar]
- Fan, A.; Bhosale, S.; Schwenk, H.; Ma, Z.; El-Kishky, A.; Goyal, S.; Baines, M.; Celebi, O.; Wenzek, G.; Chaudhary, V.; et al. Beyond English-centric multilingual machine translation. J. Mach. Learn. Res. 2021, 22, 107. [Google Scholar]
- Tan, X.; Ren, Y.; He, D.; Qin, T.; Xu, W.; Liu, T.Y. Multilingual neural machine translation with language clustering. In Proceedings of the 2019 Conference on Empirical Methods in Natural Language Processing and the 9th International Joint Conference on Natural Language Processing, Hong Kong, China, 3–7 November 2019; pp. 963–973. [Google Scholar]
- Xue, L.; Constant, N.; Roberts, A.; Kale, M.; Al-Rfou, R.; Siddhant, A.; Barua, A.; Raffel, C. mT5: A massively multilingual pre-trained text-to-text transformer. In Proceedings of the 2021 Conference of the North American Chapter of the Association for Computational Linguistics: Human Language Technologies, Online, 6–11 June 2021; pp. 483–498. [Google Scholar]
- Kim, Y.; Rush, A.M. Sequence-level knowledge distillation. In Proceedings of the 2016 Conference on Empirical Methods in Natural Language Processing, Association for Computational Linguistics, Austin, TX, USA, 1–5 November 2016; pp. 1317–1327. [Google Scholar]
- Wang, S.; Liu, Y.; Wang, C.; Luan, H.; Sun, M. Improving Back-Translation with Uncertainty-Based Confidence Estimation. In Proceedings of the 2019 Conference on Empirical Methods in Natural Language Processing and the 9th International Joint Conference on Natural Language Processing (EMNLP-IJCNLP), Hong Kong, China, 3–7 November 2019; pp. 791–802. [Google Scholar]
- Goyal, N.; Chaudhary, V.; Gu, J.; Wenzek, G.; El-Kishky, A.; Gao, C.; Chen, P.-J.; Ju, D.; Krishnan, S.; Ranzato, M.; et al. The FLORES-101 evaluation benchmark for low-resource and multilingual translation. Trans. Assoc. Comput. Linguist. 2022, 10, 522–538. [Google Scholar] [CrossRef]
- Zhang, B.; Williams, P.; Titov, I.; Sennrich, R. Improving massively multilingual neural machine translation and zero-shot translation. arXiv 2020, arXiv:2004.11867. [Google Scholar] [CrossRef]
- Ahuja, S.; Aggarwal, D.; Gumma, V.; Watts, I.; Sathe, A.; Ochieng, M.; Hada, R.; Jain, P.; Ahmed, M.; Bali, K.; et al. MEGAVERSE: Benchmarking Large Language Models Across Languages, Modalities, Models and Tasks. In Proceedings of the 2024 Conference of the North American Chapter of the Association for Computational Linguistics: Human Language Technologies, Mexico City, Mexico, 16–21 June 2024; Volume 1: Long Papers, pp. 2598–2637. [Google Scholar]
- Choudhury, M. Generative AI has a language problem. Nat. Hum. Behav. 2023, 7, 1802–1803. [Google Scholar] [CrossRef] [PubMed]
- Gurgurov, D.; Bäumel, T.; Anikina, T. Multilingual Large Language Models and Curse of Multilinguality. arXiv 2024, arXiv:2406.10602. [Google Scholar] [CrossRef]
- Bafna, N.; Murray, K.; Yarowsky, D. Evaluating Large Language Models Along Dimensions of Language Variation: A Systematik Invesdigatiom uv Cross-Lingual Generalization. In Proceedings of the 2024 Conference on Empirical Methods in Natural Language Processing, Miami, FL, USA, 12–16 November 2024; pp. 18742–18762. [Google Scholar]
- Mirzakhalov, J.; Babu, A.; Ataman, D.; Kariev, S.; Tyers, F.; Abduraufov, O.; Hajili, M.; Ivanova, S.; Khaytbaev, A.; Laverghetta, A., Jr.; et al. A Large-Scale Study of Machine Translation in Turkic Languages. In Proceedings of the 2021 Conference on Empirical Methods in Natural Language Processing, Punta Cana, Dominican Republic, 7–11 November 2021; pp. 5876–5890. [Google Scholar]
- Emezue, C.C.; Dossou, B.F. MMTAfrica: Multilingual Machine Translation for African Languages. In Proceedings of the Sixth Conference on Machine Translation, Online, 10–11 November 2021; pp. 398–411. [Google Scholar]
- Gala, J.P.; Chitale, P.A.; Raghavan, A.; Gumma, V.; Doddapaneni, S.; Aswanth, K.M.; Nawale, J.A.; Sujatha, A.; Puduppully, R.; Raghavan, V.; et al. IndicTrans2: Towards High-Quality and Accessible Machine Translation Models for all 22 Scheduled Indian Languages. Trans. Mach. Learn. Res. 2023, 2023, 90. [Google Scholar]
- Weller-Di Marco, M.; Fraser, A. Findings of the WMT 2022 shared tasks in unsupervised MT and very low resource supervised MT. In Proceedings of the Seventh Conference on Machine Translation (WMT), Abu Dhabi, United Arab Emirates, 7–8 December 2022; pp. 801–805. [Google Scholar]
- Ahmad, I.S.; Anastasopoulos, A.; Bojar, O.; Borg, C.; Carpuat, M.; Cattoni, R.; Cettolo, M.; Chen, W.; Dong, Q.; Federico, M.; et al. Findings of the IWSLT 2024 Evaluation Campaign. In Proceedings of the 21st International Conference on Spoken Language Translation (IWSLT 2024), Bangkok, Thailand, 15–16 August 2024; pp. 1–11. [Google Scholar]
- Wenzek, G.; Chaudhary, V.; Fan, A.; Gomez, S.; Goyal, N.; Jain, S.; Kiela, D.; Thrush, T.; Guzmán, F. Findings of the WMT 2021 shared task on large-scale multilingual machine translation. In Proceedings of the Sixth Conference on Machine Translation, Online, 10–11 November 2021; pp. 89–99. [Google Scholar]
- Cettolo, M.; Federico, M.; Bentivogli, L.; Niehues, J.; Stüker, S.; Sudoh, K.; Yoshino, K.; Federmann, C. Overview of the IWSLT 2017 evaluation campaign. In Proceedings of the 14th International Workshop on Spoken Language Translation, Tokyo, Japan, 14–15 December 2017; pp. 2–14. [Google Scholar]
- Niehues, J.; Cattoni, R.; Stüker, S.; Cettolo, M.; Turchi, M.; Federico, M. The IWSLT 2018 Evaluation Campaign. In Proceedings of the 15th International Conference on Spoken Language Translation, Bruges, Belgium, 29–30 October 2018; pp. 2–6. [Google Scholar]
- Fraser, A. Findings of the WMT 2020 shared tasks in unsupervised MT and very low resource supervised MT. In Proceedings of the Fifth Conference on Machine Translation, Online, 19–20 November 2020; pp. 765–771. [Google Scholar]
- Libovický, J.; Fraser, A. Findings of the WMT 2021 shared tasks in unsupervised MT and very low resource supervised MT. In Proceedings of the Sixth Conference on Machine Translation, Online, 10–11 November 2021; pp. 726–732. [Google Scholar]
- Sant, J. Multilingual Low-Resource Translation for Indo-European Languages. Bachelor’s Thesis, University of Malta, Msida, Malta, 2022. [Google Scholar]
- Adelani, D.I.; Alam, M.M.I.; Anastasopoulos, A.; Bhagia, A.; Costa-Jussà, M.R.; Dodge, J.; Faisal, F.; Federmann, C.; Fedorova, N.; Guzmán, F.; et al. Findings of the WMT’22 shared task on large-scale machine translation evaluation for African languages. In Proceedings of the Seventh Conference on Machine Translation (WMT), Abu Dhabi, United Arab Emirates, 7–8 December 2022; pp. 773–800. [Google Scholar]
- Pal, S.; Pakray, P.; Laskar, S.R.; Laitonjam, L.; Khenglawt, V.; Warjri, S.; Dadure, P.K.; Dash, S.K. Findings of the WMT 2023 shared task on low-resource Indic language translation. In Proceedings of the Eighth Conference on Machine Translation, Singapore, 6–7 December 2023; pp. 682–694. [Google Scholar]
- Dabre, R.; Kunchukuttan, A. Findings of WMT 2024’s MultiIndic22MT shared task for machine translation of 22 Indian languages. In Proceedings of the Ninth Conference on Machine Translation, Miami, FL, USA, 15–16 November 2024; pp. 669–676. [Google Scholar]
- Sánchez-Martínez, F.; Pérez-Ortiz, J.A.; Galiano-Jiménez, A.; Oliver, A. Findings of the WMT 2024 Shared Task Translation into Low-Resource Languages of Spain: Blending Rule-Based and Neural Systems. In Proceedings of the Ninth Conference on Machine Translation, Miami, FL, USA, 15–16 November 2024; pp. 684–698. [Google Scholar]
- Pakray, P.; Pal, S.; Vetagiri, A.; Krishna, R.; Maji, A.K.; Dash, S.; Laitonjam, L.; Sarah, L.; Manna, R. Findings of WMT 2024 shared task on low-resource Indic languages translation. In Proceedings of the Ninth Conference on Machine Translation, Miami, FL, USA, 15–16 November 2024; pp. 654–668. [Google Scholar]
- Anastasopoulos, A.; Barrault, L.; Bentivogli, L.; Bojar, O.; Cattoni, R.; Currey, A.; Dinu, G.; Duh, K.; Elbayad, M.; Emmanuel, C.; et al. Findings of the IWSLT 2022 Evaluation Campaign. In Proceedings of the 19th International Conference on Spoken Language Translation (IWSLT 2022), Association for Computational Linguistics, Dublin, Ireland, 26–27 May 2022; pp. 98–157.
- Iyer, V.; Malik, B.; Zhu, W.; Stepachev, P.; Chen, P.; Haddow, B.; Birch-Mayne, A. Exploring very low-resource translation with LLMs: The University of Edinburgh’s submission to AmericasNLP 2024 translation task. In Proceedings of the 4th Workshop on NLP for Indigenous Languages of the Americas, Association for Computational Linguistics (ACL), Mexico City, Mexico, 21 June 2024; pp. 209–220.
- Singh, S.; Vargus, F.; D’souza, D.; Karlsson, B.; Mahendiran, A.; Ko, W.Y.; Shandilya, H.; Patel, J.; Mataciunas, D.; O’Mahony, L.; et al. Aya Dataset: An Open-Access Collection for Multilingual Instruction Tuning. In Proceedings of the 62nd Annual Meeting of the Association for Computational Linguistics, Bangkok, Thailand, 11–16 August 2024; Volume 1: Long Papers, pp. 11521–11567.
- Tanzer, G.; Suzgun, M.; Visser, E.; Jurafsky, D.; Melas-Kyriazi, L. A Benchmark for Learning to Translate a New Language from One Grammar Book. In Proceedings of the Twelfth International Conference on Learning Representations, Vienna, Austria, 7–11 May 2024.
- Tran, K.M.; Bisazza, A.; Monz, C. The Importance of Being Recurrent for Modeling Hierarchical Structure. In Proceedings of the 2018 Conference on Empirical Methods in Natural Language Processing, Brussels, Belgium, 31 October–4 November 2018; pp. 4731–4736.
- Zhu, H.; Liang, Y.; Xu, W.; Xu, H. Evaluating Large Language Models for In-Context Learning of Linguistic Patterns in Unseen Low resource Languages. In Proceedings of the First Workshop on Language Models for Low-Resource Languages, Abu Dhabi, United Arab Emirates, 20 January 2025; pp. 414–426.
- Zhang, K.; Choi, Y.M.; Song, Z.; He, T.; Wang, W.Y.; Li, L. Hire a Linguist!: Learning Endangered Languages with In-Context Linguistic Descriptions. arXiv 2024, arXiv:2402.18025.
- Zerva, C.; Blain, F.; Rei, R.; Lertvittayakumjorn, P.; De Souza, J.G.; Eger, S.; Kanojia, D.; Alves, D.; Orǎsan, C.; Fomicheva, M.; et al. Findings of the WMT 2022 shared task on quality estimation. In Proceedings of the Seventh Conference on Machine Translation (WMT), Abu Dhabi, United Arab Emirates, 7–8 December 2022; pp. 69–99.
- Freitag, M.; Rei, R.; Mathur, N.; Lo, C.K.; Stewart, C.; Avramidis, E.; Kocmi, T.; Foster, G.; Lavie, A.; Martins, A.F. Results of WMT22 metrics shared task: Stop using BLEU–neural metrics are better and more robust. In Proceedings of the Seventh Conference on Machine Translation (WMT), Abu Dhabi, United Arab Emirates, 7–8 December 2022; pp. 46–68.
- Freitag, M.; Mathur, N.; Lo, C.K.; Avramidis, E.; Rei, R.; Thompson, B.; Kocmi, T.; Blain, F.; Deutsch, D.; Stewart, C.; et al. Results of WMT23 metrics shared task: Metrics might be guilty but references are not innocent. In Proceedings of the Eighth Conference on Machine Translation, Singapore, 6–7 December 2023; pp. 578–628.
- Freitag, M.; Mathur, N.; Deutsch, D.; Lo, C.K.; Avramidis, E.; Rei, R.; Thompson, B.; Blain, F.; Kocmi, T.; Wang, J.; et al. Are LLMs breaking MT metrics? Results of the WMT24 metrics shared task. In Proceedings of the Ninth Conference on Machine Translation, Miami, FL, USA, 15–16 November 2024; pp. 47–81.
- Liu, C.; Dahlmeier, D.; Ng, H.T. Better evaluation metrics lead to better machine translation. In Proceedings of the 2011 Conference on Empirical Methods in Natural Language Processing, Edinburgh, UK, 27–29 July 2011; pp. 375–384.
- Sai, A.B.; Mohankumar, A.K.; Khapra, M.M. A survey of evaluation metrics used for NLG systems. ACM Comput. Surv. (CSUR) 2022, 55, 1–39.
- Moghe, N.; Sherborne, T.; Steedman, M.; Birch, A. Extrinsic Evaluation of Machine Translation Metrics. In Proceedings of the 61st Annual Meeting of the Association for Computational Linguistics, Toronto, ON, Canada, 9–14 July 2023; Volume 1: Long Papers, pp. 13060–13078.
- Kreutzer, J.; Caswell, I.; Wang, L.; Wahab, A.; van Esch, D.; Ulzii-Orshikh, N.; Tapo, A.; Subramani, N.; Sokolov, A.; Sikasote, C.; et al. Quality at a glance: An audit of web-crawled multilingual datasets. Trans. Assoc. Comput. Linguist. 2022, 10, 50–72.
- Wang, C.; Sennrich, R. On Exposure Bias, Hallucination and Domain Shift in Neural Machine Translation. In Proceedings of the 58th Annual Meeting of the Association for Computational Linguistics, Seattle, WA, USA, 5–10 July 2020; pp. 3544–3552.
- Raunak, V.; Menezes, A.; Junczys-Dowmunt, M. The Curious Case of Hallucinations in Neural Machine Translation. In Proceedings of the 2021 Conference of the North American Chapter of the Association for Computational Linguistics: Human Language Technologies, Online, 6–11 June 2021; pp. 1172–1183.
- Guerreiro, N.M.; Alves, D.M.; Waldendorf, J.; Haddow, B.; Birch, A.; Colombo, P.; Martins, A.F. Hallucinations in large multilingual translation models. Trans. Assoc. Comput. Linguist. 2023, 11, 1500–1517.
- Craciunescu, O.; Gerding-Salas, C.; Stringer-O’Keeffe, S. Machine translation and computer-assisted translation. Transl. J. 2004, 8.
- Guha, J.; Heger, C. Machine translation for global e-commerce on eBay. In Proceedings of the AMTA, Vancouver, BC, Canada, 22–26 October 2014; Volume 2, pp. 31–37.
- Dinu, G.; Mathur, P.; Federico, M.; Al-Onaizan, Y. Training Neural Machine Translation to Apply Terminology Constraints. In Proceedings of the 57th Annual Meeting of the Association for Computational Linguistics, Florence, Italy, 28 July–2 August 2019; pp. 3063–3068.
- Kocmi, T.; Avramidis, E.; Bawden, R.; Bojar, O.; Dvorkovich, A.; Federmann, C.; Fishel, M.; Freitag, M.; Gowda, T.; Grundkiewicz, R.; et al. Findings of the WMT24 general machine translation shared task: The LLM era is here but MT is not solved yet. In Proceedings of the Ninth Conference on Machine Translation, Miami, FL, USA, 15–16 November 2024; pp. 1–46.
- Reeder, F. In one hundred words or less. In Proceedings of the Workshop on MT Evaluation, Santiago de Compostela, Spain, 18–22 September 2001.
- Ahmadian, A.; Cremer, C.; Gallé, M.; Fadaee, M.; Kreutzer, J.; Pietquin, O.; Üstün, A.; Hooker, S. Back to Basics: Revisiting REINFORCE-Style Optimization for Learning from Human Feedback in LLMs. In Proceedings of the 62nd Annual Meeting of the Association for Computational Linguistics, Bangkok, Thailand, 11–16 August 2024; Volume 1: Long Papers, pp. 12248–12267.
- Müller, M.; Sennrich, R.; Volk, M. Evaluation of coherence in machine translation with discourse-aware metrics. In Proceedings of the Third Conference on Machine Translation: Research Papers, Brussels, Belgium, 31 October–1 November 2018; pp. 958–967.
- Bawden, R.; Sennrich, R.; Birch, A.; Haddow, B. Evaluating discourse phenomena in neural machine translation. In Proceedings of the 2018 Conference of the North American Chapter of the Association for Computational Linguistics: Human Language Technologies, New Orleans, LA, USA, 1–6 June 2018; pp. 1304–1313.
- Wang, X.; Zhang, J.; Koehn, P. Document-level translation with large language models. In Proceedings of the 61st Annual Meeting of the Association for Computational Linguistics, Toronto, ON, Canada, 9–14 July 2023; Volume 1: Long Papers, pp. 7274–7290.
- Freitag, M.; Al-Onaizan, Y.; Bapna, A.; Johnson, M.; Niu, X.; Rios, A.; Tran, C.; Firat, O. High-quality machine translation with expert-based human evaluation. In Proceedings of the 2022 Conference of the North American Chapter of the Association for Computational Linguistics: Human Language Technologies, Seattle, WA, USA, 10–15 July 2022; pp. 1183–1200.
- Iyer, V.; Malik, B.; Stepachev, P.; Chen, P.; Haddow, B.; Birch, A. Quality or Quantity? On Data Scale and Diversity in Adapting Large Language Models for Low-Resource Translation. In Proceedings of the Ninth Conference on Machine Translation, Miami, FL, USA, 15–16 November 2024; pp. 1393–1409.
- Iyer, V.; Chen, P.; Birch, A. Towards Effective Disambiguation for Machine Translation with Large Language Models. In Proceedings of the Eighth Conference on Machine Translation, Singapore, 6–7 December 2023; pp. 482–495.
- Oncevay, A.; Ataman, D.; Van Berkel, N.; Haddow, B.; Birch, A.; Bjerva, J. Quantifying Synthesis and Fusion and Their Impact on Machine Translation. In Proceedings of the 2022 Conference of the North American Chapter of the Association for Computational Linguistics: Human Language Technologies, Seattle, WA, USA, 10–15 July 2022; pp. 1308–1321.
Disclaimer/Publisher’s Note: The statements, opinions and data contained in all publications are solely those of the individual author(s) and contributor(s) and not of MDPI and/or the editor(s). MDPI and/or the editor(s) disclaim responsibility for any injury to people or property resulting from any ideas, methods, instructions or products referred to in the content.
© 2025 by the authors. Licensee MDPI, Basel, Switzerland. This article is an open access article distributed under the terms and conditions of the Creative Commons Attribution (CC BY) license (https://creativecommons.org/licenses/by/4.0/).
Ataman, D.; Birch, A.; Habash, N.; Federico, M.; Koehn, P.; Cho, K. Machine Translation in the Era of Large Language Models: A Survey of Historical and Emerging Problems. Information 2025, 16, 723. https://doi.org/10.3390/info16090723