Article

A Neural-Network-Based Approach to Chinese–Uyghur Organization Name Translation

1 School of Information Science and Engineering, Xinjiang University, Urumqi 830046, Xinjiang, China
2 Key Laboratory of Multilingual Information Technology in Xinjiang Uyghur Autonomous Region, Urumqi 830046, Xinjiang, China
3 School of Software, Xinjiang University, Urumqi 830091, Xinjiang, China
4 Urumqi Campus, Engineering University of PAP, Urumqi 830049, Xinjiang, China
* Author to whom correspondence should be addressed.
These authors contributed equally to this work.
Information 2020, 11(10), 492; https://doi.org/10.3390/info11100492
Received: 21 September 2020 / Revised: 16 October 2020 / Accepted: 19 October 2020 / Published: 21 October 2020
(This article belongs to the Section Artificial Intelligence)
Recognizing and translating organization names (ONs) is challenging because of their complex structure and high variability. ONs contain not only common generic words but also proper names, rare words, abbreviations, and business and industry jargon. ONs are a subclass of named entity (NE) phrases, which convey key information in text, so translating them correctly is critical for machine translation and cross-lingual information retrieval. Existing Chinese–Uyghur neural machine translation systems perform poorly on ON translation tasks. Because no Chinese–Uyghur ON translation corpus is publicly available, one is developed here, containing 191,641 ON translation pairs. A word segmentation approach involving characterization, tagged characterization, byte pair encoding (BPE) and syllabification is proposed for ON translation. A recurrent neural network (RNN) attention framework and the transformer are adapted to ON translation at different sequence granularities. The experimental results indicate that the transformer model not only outperforms the RNN attention model but also benefits from the proposed word segmentation approach. In addition, a Chinese–Uyghur ON translation system is developed to automatically generate new translation pairs. This work significantly improves Chinese–Uyghur ON translation and can be applied to improve Chinese–Uyghur machine translation and cross-lingual information retrieval. It can also be readily extended to other agglutinative languages.
Keywords: named entity translation; organization name translation; word segmentation; tagged characterization; syllabification; transformer
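
To make the idea of different sequence granularities concrete, the following is a minimal sketch of three of the segmentation styles named in the abstract. It is not the authors' implementation. The function names, the `_B`/`_M`/`_E`/`_S` position tags used for tagged characterization, and the toy inputs are all assumptions chosen for illustration, and the abstract does not specify how the paper defines its tags. The BPE learner follows the standard greedy merge procedure on a toy word list. Syllabification is omitted because it depends on language-specific rules that the abstract does not describe.

```python
from collections import Counter


def characterize(name):
    """Split a name into single characters, dropping whitespace."""
    return [ch for ch in name if not ch.isspace()]


def tagged_characterize(name):
    """Split each whitespace-delimited word into characters and attach a
    position tag (B = begin, M = middle, E = end, S = single character).
    The tag scheme is an assumption for illustration only."""
    tokens = []
    for word in name.split():
        if len(word) == 1:
            tokens.append(word + "_S")
            continue
        last = len(word) - 1
        for i, ch in enumerate(word):
            tag = "B" if i == 0 else "E" if i == last else "M"
            tokens.append(f"{ch}_{tag}")
    return tokens


def learn_bpe(words, num_merges):
    """Learn up to num_merges byte pair encoding merge rules from words."""
    # Each word starts as a tuple of characters plus an end-of-word marker.
    vocab = Counter(tuple(w) + ("</w>",) for w in words)
    merges = []
    for _ in range(num_merges):
        # Count adjacent symbol pairs, weighted by word frequency.
        pairs = Counter()
        for symbols, freq in vocab.items():
            for pair in zip(symbols, symbols[1:]):
                pairs[pair] += freq
        if not pairs:
            break
        best = max(pairs, key=pairs.get)
        merges.append(best)
        # Rewrite every word with the most frequent pair merged.
        new_vocab = Counter()
        for symbols, freq in vocab.items():
            merged, i = [], 0
            while i < len(symbols):
                if i + 1 < len(symbols) and (symbols[i], symbols[i + 1]) == best:
                    merged.append(symbols[i] + symbols[i + 1])
                    i += 2
                else:
                    merged.append(symbols[i])
                    i += 1
            new_vocab[tuple(merged)] += freq
        vocab = new_vocab
    return merges


if __name__ == "__main__":
    print(characterize("新疆大学"))
    print(tagged_characterize("新疆大学"))
    print(learn_bpe(["low", "low", "lot"], 2))
```

Character-level tokens keep the vocabulary small, while BPE merges let frequent character sequences become single units, so the three functions produce sequences of decreasing length for the same input.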
MDPI and ACS Style

Wumaier, A.; Xu, C.; Kadeer, Z.; Liu, W.; Wang, Y.; Haierla, X.; Maimaiti, M.; Tian, S.; Saimaiti, A. A Neural-Network-Based Approach to Chinese–Uyghur Organization Name Translation. Information 2020, 11, 492. https://doi.org/10.3390/info11100492

AMA Style

Wumaier A, Xu C, Kadeer Z, Liu W, Wang Y, Haierla X, Maimaiti M, Tian S, Saimaiti A. A Neural-Network-Based Approach to Chinese–Uyghur Organization Name Translation. Information. 2020; 11(10):492. https://doi.org/10.3390/info11100492

Chicago/Turabian Style

Wumaier, Aishan, Cuiyun Xu, Zaokere Kadeer, Wenqi Liu, Yingbo Wang, Xireaili Haierla, Maihemuti Maimaiti, ShengWei Tian, and Alimu Saimaiti. 2020. "A Neural-Network-Based Approach to Chinese–Uyghur Organization Name Translation" Information 11, no. 10: 492. https://doi.org/10.3390/info11100492

