Open Access Article

Text Filtering through Multi-Pattern Matching: A Case Study of Wu–Manber–Uy on the Language of Uyghur

1 College of Information Science and Engineering, Xinjiang University, Urumqi 830046, China
2 Information Retrieval & Knowledge Management Research Lab, York University, Toronto, ON M3J 1P3, Canada
* Author to whom correspondence should be addressed.
Information 2019, 10(8), 246; https://doi.org/10.3390/info10080246
Received: 14 May 2019 / Revised: 14 June 2019 / Accepted: 19 July 2019 / Published: 24 July 2019
(This article belongs to the Special Issue Computational Linguistics for Low-Resource Languages)

Abstract

Owing to its generality and its time efficiency on large data sets, text filtering through pattern matching has attracted increasing attention in recent years from the information retrieval and Natural Language Processing (NLP) research communities. It remains to be seen, however, how this technique and its algorithms (e.g., Wu–Manber, which is also considered in this paper) can be applied properly and effectively to Uyghur, a low-resource language spoken mostly by the ethnic Uyghur group, with a population of more than eleven million in Xinjiang, China. We observe that, technically, the challenge is caused mainly by two factors: (1) vowel weakening and (2) semantic mismatch between affixes and stems. Accordingly, in this paper we propose Wu–Manber–Uy, an improved variant of Wu–Manber designed specifically for the Uyghur language. Wu–Manber–Uy implements a pattern expansion strategy based on stem deformation, specifically to reduce the pattern mismatches caused by vowel weakening and spelling errors. It also uses a two-way strategy that monitors and controls changes in the lexical meaning of stems during word-building. Additional considerations involving Word2vec and the dictionary are incorporated into the system for processing Uyghur. The experimental results we obtained consistently demonstrate the high performance of Wu–Manber–Uy.
Keywords: Uyghur; multi-pattern matching; Wu–Manber; Wu–Manber–Uy; text filtering
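
For readers unfamiliar with the baseline algorithm named in the abstract, the following is a minimal sketch of classic Wu–Manber multi-pattern matching in Python. It is not the authors' Wu–Manber–Uy: it contains no stem-deformation expansion, no semantic control, and no Word2vec or dictionary component. The function names (`build_tables`, `wu_manber_search`), the block size of 2, and the English sample text are illustrative assumptions, not taken from the paper.

```python
from collections import defaultdict


def build_tables(patterns, block=2):
    """Build the SHIFT and HASH tables of baseline Wu-Manber (illustrative names)."""
    patterns = list(dict.fromkeys(patterns))  # drop duplicates, keep order
    if not patterns or any(not p for p in patterns):
        raise ValueError("patterns must be a non-empty list of non-empty strings")
    m = min(len(p) for p in patterns)  # only the first m characters are indexed
    b = min(block, m)                  # block size cannot exceed m
    default_shift = m - b + 1          # safe jump for blocks seen in no pattern
    shift = {}
    bucket = defaultdict(list)         # suffix block -> candidate patterns
    for p in patterns:
        for end in range(b, m + 1):
            blk = p[end - b:end]
            dist = m - end             # distance of this block from window end
            if dist < shift.get(blk, default_shift):
                shift[blk] = dist
        bucket[p[m - b:m]].append(p)
    return m, b, default_shift, shift, bucket


def wu_manber_search(text, patterns, block=2):
    """Return (start_index, pattern) for every occurrence of any pattern."""
    m, b, default_shift, shift, bucket = build_tables(patterns, block)
    hits = []
    pos = m - 1                        # index of the last char of the window
    while pos < len(text):
        blk = text[pos - b + 1:pos + 1]
        step = shift.get(blk, default_shift)
        if step == 0:
            # The block matches some pattern's suffix block: verify candidates.
            start = pos - m + 1
            for p in bucket.get(blk, ()):
                if text.startswith(p, start):
                    hits.append((start, p))
            step = 1
        pos += step
    return hits


if __name__ == "__main__":
    print(wu_manber_search("the cat sat on the mat", ["cat", "mat", "sat on"]))
```

Under this sketch, a pattern expansion strategy such as the one the abstract describes would amount to adding the deformed variants of each stem to the `patterns` list before the tables are built; how those variants are generated is specific to the paper and is not reproduced here.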
This is an open access article distributed under the Creative Commons Attribution License which permits unrestricted use, distribution, and reproduction in any medium, provided the original work is properly cited (CC BY 4.0).
MDPI and ACS Style

Tohti, T.; Huang, J.; Hamdulla, A.; Tan, X. Text Filtering through Multi-Pattern Matching: A Case Study of Wu–Manber–Uy on the Language of Uyghur. Information 2019, 10, 246.

Note that from the first issue of 2016, MDPI journals use article numbers instead of page numbers.

Information, EISSN 2078-2489, published by MDPI AG, Basel, Switzerland