Benefitting from the rapid development of artificial intelligence (AI) and deep learning, the machine translation task based on neural networks has achieved impressive performance in many high-resource language pairs. However, the neural machine translation (NMT) models still struggle in the translation task on agglutinative languages with complex morphology and limited resources. Inspired by the finding that utilizing the source-side linguistic knowledge can further improve the NMT performance, we propose a multi-source neural model that employs two separate encoders to encode the source word sequence and the linguistic feature sequences. Compared with the standard NMT model, we utilize an additional encoder to incorporate the linguistic features of lemma, part-of-speech (POS) tag, and morphological tag by extending the input embedding layer of the encoder. Moreover, we use a serial combination method to integrate the conditional information from the encoders with the outputs of the decoder, which aims to enhance the neural model to learn a high-quality context representation of the source sentence. Experimental results show that our approach is effective for the agglutinative language translation, which achieves the highest improvements of +2.4 BLEU points on Turkish–English translation task and +0.6 BLEU points on Uyghur–Chinese translation task.
This is an open access article distributed under the Creative Commons Attribution License
which permits unrestricted use, distribution, and reproduction in any medium, provided the original work is properly cited