Open Access Article

Research on Uyghur Pattern Matching Based on Syllable Features

1 School of Information Science and Engineering, Xinjiang University, Urumqi 830046, China
2 Key Laboratory of Multilingual Information Technology in Xinjiang Uygur Autonomous Region, Urumqi 830046, China
* Author to whom correspondence should be addressed.
Information 2020, 11(5), 248; https://doi.org/10.3390/info11050248
Received: 27 March 2020 / Revised: 30 April 2020 / Accepted: 1 May 2020 / Published: 2 May 2020
(This article belongs to the Section Information Processes)
Pattern matching is widely used in fields such as information retrieval, natural language processing (NLP), data mining and network security. For Uyghur (a typical agglutinative, low-resource language with complex morphology, spoken by the ethnic Uyghur group in Xinjiang, China), research on pattern matching is ongoing. Because of the characteristics of the language, pattern matching that uses characters or words as its basic units performs poorly. Two problems arise: (1) vowel weakening and (2) morphological changes caused by suffixes. To address these problems, this paper proposes the Boyer–Moore-U (BM-U) algorithm, an improvement of the Boyer–Moore (BM) algorithm based on the syllable features of Uyghur, together with a retrievable syllable coding format. The algorithm performs pattern matching on syllable features, which effectively solves the vowel-weakening problem and better matches words whose stems change shape. In pattern matching experiments on vowel-weakened words, compared with the BM algorithm, BM-U improves precision, recall, F1-measure and accuracy by 4%, 55%, 33% and 25%, respectively, on character-encoded text, and by 10%, 52%, 38% and 38%, respectively, on syllable-encoded text.
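The core idea of matching on syllables rather than characters can be illustrated with a small sketch. It is not the authors' BM-U algorithm or their retrievable syllable coding format, neither of which is described here. Instead, it pairs a hypothetical syllabifier with Boyer–Moore–Horspool, a simplified Boyer–Moore variant that keeps only the bad-character rule, run over lists of syllables. Assumptions: text is lowercase Uyghur Latin script (ULY). Each syllable is taken to contain exactly one vowel. A single consonant between two vowels is assumed to start the next syllable, and in a cluster only the last consonant does. The names `syllabify` and `horspool_search` are illustrative.

```python
from typing import Dict, List, Sequence

# Assumption: input is lowercase Uyghur Latin script (ULY).
VOWELS = set("aeéiouöü")
DIGRAPHS = ("ch", "gh", "ng", "sh", "zh")  # ULY digraphs treated as one consonant


def phonemes(word: str) -> List[str]:
    """Split a word into letters, keeping consonant digraphs together."""
    out: List[str] = []
    i = 0
    while i < len(word):
        if word[i:i + 2] in DIGRAPHS:
            out.append(word[i:i + 2])
            i += 2
        else:
            out.append(word[i])
            i += 1
    return out


def syllabify(word: str) -> List[str]:
    """Hypothetical rule: one vowel per syllable; a lone intervocalic consonant
    starts the next syllable, and in a cluster only the last consonant does."""
    ph = phonemes(word)
    nuclei = [k for k, p in enumerate(ph) if p in VOWELS]
    if not nuclei:
        return [word]
    cuts = []
    for left, right in zip(nuclei, nuclei[1:]):
        gap = right - left - 1  # number of consonants between two vowels
        cuts.append(right if gap == 0 else right - 1)
    bounds = [0] + cuts + [len(ph)]
    return ["".join(ph[a:b]) for a, b in zip(bounds, bounds[1:])]


def horspool_search(text: Sequence[str], pattern: Sequence[str]) -> List[int]:
    """Return every start index of `pattern` in `text` (both syllable lists)."""
    m, n = len(pattern), len(text)
    if m == 0 or m > n:
        return []
    # Bad-character table: shift keyed by the syllable under the window's last slot;
    # later occurrences overwrite earlier ones, giving the smallest safe shift.
    shift: Dict[str, int] = {pattern[k]: m - 1 - k for k in range(m - 1)}
    hits: List[int] = []
    pos = 0
    while pos <= n - m:
        j = m - 1
        while j >= 0 and text[pos + j] == pattern[j]:
            j -= 1  # compare right to left, as in Boyer–Moore
        if j < 0:
            hits.append(pos)
        pos += shift.get(text[pos + m - 1], m)
    return hits


if __name__ == "__main__":
    text = syllabify("mektepke")
    print(text)                                       # ['mek', 'tep', 'ke']
    print(horspool_search(text, syllabify("mektep")))  # [0]
    print(horspool_search(text, syllabify("ek")))      # []
```

Because the units are syllables, a query can only align on syllable boundaries. The character string "ek" occurs inside "mektepke", but it is not one of that word's syllables, so the syllable-level search reports no match. How BM-U additionally handles vowel weakening is specific to the paper and is not reproduced here.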
Keywords: pattern matching; text search; Uyghur; syllable; Boyer–Moore; BM-U

Figure 1

MDPI and ACS Style

Abliz, W.; Maimaiti, M.; Wu, H.; Wushouer, J.; Abiderexiti, K.; Yibulayin, T.; Wumaier, A. Research on Uyghur Pattern Matching Based on Syllable Features. Information 2020, 11, 248.
