Open Access Article

A Syllable-Based Technique for Uyghur Text Compression

1
School of Information Science and Engineering, Xinjiang University, Urumqi 830046, China
2
Xinjiang Laboratory of Multi-Language Information Technology, Xinjiang University, Urumqi 830046, China
*
Author to whom correspondence should be addressed.
Information 2020, 11(3), 172; https://doi.org/10.3390/info11030172
Received: 23 February 2020 / Revised: 18 March 2020 / Accepted: 18 March 2020 / Published: 23 March 2020
(This article belongs to the Section Information Processes)
To improve the utilization of text storage resources and the efficiency of data transmission, we proposed two syllable-based Uyghur text compression coding schemes. First, according to the statistics of syllable coverage of the corpus text, we constructed 12-bit and 16-bit syllable code tables and added commonly used symbols, such as punctuation marks and ASCII characters, to the code tables. To enable the coding schemes to process Uyghur texts mixed with symbols from other languages, we introduced a flag code in the compression process to distinguish the Unicode encodings that were not in the code table. The experiments showed that the 12-bit coding scheme had an average compression ratio of 0.3 on Uyghur text less than 4 KB in size and that the 16-bit coding scheme had an average compression ratio of 0.5 on text less than 2 KB in size. Our compression schemes outperformed GZip, BZip2, and the LZW algorithm on short text and could be effectively applied to the compression of Uyghur short text for storage and applications.
Keywords: text compression; Uyghur; syllable; code table
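
The general idea of a fixed-width code table with a flag code for out-of-table characters can be sketched as follows. This is a minimal illustration, not the authors' implementation: the table contents, the flag value 0xFFF, the 21-bit raw code point that follows the flag, and the greedy longest-match segmentation are all assumptions made for this example, and the names build_table, encode, and decode are hypothetical.

```python
CODE_BITS = 12   # width of one table code (the 12-bit scheme)
FLAG = 0xFFF     # assumed escape code: a raw Unicode code point follows
RAW_BITS = 21    # assumed width of the raw code point after the flag


def build_table(entries):
    """Assign consecutive codes to entries (syllables, punctuation, ASCII)."""
    if len(entries) >= FLAG:
        raise ValueError("too many entries for a 12-bit table")
    return {entry: code for code, entry in enumerate(entries)}


def encode(text, table):
    """Greedy longest-match encoding; unknown characters are flag-escaped."""
    max_len = max(len(entry) for entry in table)
    acc, nbits, i = 0, 0, 0
    while i < len(text):
        for n in range(min(max_len, len(text) - i), 0, -1):
            code = table.get(text[i:i + n])
            if code is not None:
                acc = (acc << CODE_BITS) | code
                nbits += CODE_BITS
                i += n
                break
        else:
            # Character is not in the table: emit the flag, then its code point.
            acc = (((acc << CODE_BITS) | FLAG) << RAW_BITS) | ord(text[i])
            nbits += CODE_BITS + RAW_BITS
            i += 1
    pad = -nbits % 8  # zero-pad the bit stream to a whole number of bytes
    return (acc << pad).to_bytes((nbits + pad) // 8, "big")


def decode(data, table):
    """Invert encode(); trailing padding is shorter than one code and is ignored."""
    reverse = {code: entry for entry, code in table.items()}
    acc = int.from_bytes(data, "big")
    remaining = len(data) * 8

    def take(width):
        nonlocal remaining
        remaining -= width
        return (acc >> remaining) & ((1 << width) - 1)

    out = []
    while remaining >= CODE_BITS:
        code = take(CODE_BITS)
        out.append(chr(take(RAW_BITS)) if code == FLAG else reverse[code])
    return "".join(out)


# Toy table: two Uyghur syllables (mek, tep), a space, and one ASCII letter.
table = build_table(["\u0645\u06d5\u0643", "\u062a\u06d5\u067e", " ", "A"])
text = "\u0645\u06d5\u0643\u062a\u06d5\u067e A\u4e2d"  # last character is not in the table
packed = encode(text, table)
print(len(text.encode("utf-8")), "bytes as UTF-8,", len(packed), "bytes packed")
```

In this sketch each in-table syllable costs 12 bits regardless of how many characters it contains, while an out-of-table character costs 33 bits, so the approach only pays off when the table covers most of the input.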

MDPI and ACS Style

Abliz, W.; Wu, H.; Maimaiti, M.; Wushouer, J.; Abiderexiti, K.; Yibulayin, T.; Wumaier, A. A Syllable-Based Technique for Uyghur Text Compression. Information 2020, 11, 172.

Note that from the first issue of 2016, MDPI journals use article numbers instead of page numbers.
