Article

Better Word Representation Vectors Using Syllabic Alphabet: A Case Study of Swahili

1 School of Information and Software Engineering, University of Electronic Science and Technology of China, Xiyuan Ave, West Hi-Tech Zone, Chengdu 611731, China
2 School of Information and Communication Technology, South Eastern Kenya University, Kitui 170-90200, Kenya
* Author to whom correspondence should be addressed.
Appl. Sci. 2019, 9(18), 3648; https://doi.org/10.3390/app9183648
Received: 22 July 2019 / Revised: 28 August 2019 / Accepted: 29 August 2019 / Published: 4 September 2019
(This article belongs to the Section Computing and Artificial Intelligence)
Deep learning has been used extensively in natural language processing, with sub-word representation vectors playing a critical role. However, the same cannot be said of Swahili, a low-resource language that is widely spoken in East and Central Africa. This study proposes novel word embeddings from syllable embeddings (WEFSE) for Swahili to address word representation for agglutinative and syllabic-based languages. Inspired by how Swahili is taught in beginner classes, we encoded the syllables of words, rather than their characters, character n-grams or morphemes, and generated quality word embeddings using a convolutional neural network. The quality of WEFSE was demonstrated by state-of-the-art results for the syllable-aware language model on both the small dataset (perplexity of 31.229) and the medium dataset (perplexity of 45.859), outperforming character-aware language models. We further evaluated the word embeddings using a word analogy task. To the best of our knowledge, syllabic alphabets have not previously been used to compose word representation vectors. The main contributions of the study are therefore a syllabic alphabet, WEFSE, a syllable-aware language model and a word analogy dataset for Swahili.
Keywords: syllabic alphabet; word representation vectors; deep learning; syllable-aware language model; perplexity; word analogy
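
The idea of composing a word vector from syllable vectors with a convolutional layer can be illustrated with a small sketch. The code below is not the authors' implementation: the names `syllabify` and `SyllableCNN`, the vector sizes, the filter width and the random, untrained weights are all assumptions made for illustration, and the syllabifier uses a simplified rule (every syllable ends at a vowel) that ignores syllabic nasals such as the "m" in "mtu". It only shows the general pattern of syllable lookup, convolution over adjacent syllables and max-over-time pooling.

```python
import math
import random

VOWELS = set("aeiou")


def syllabify(word):
    """Split a word into open syllables: each syllable ends at a vowel.

    Simplifying assumption: syllabic nasals (e.g. the "m" in "mtu")
    are not split into their own syllable.
    """
    syllables, current = [], ""
    for ch in word.lower():
        current += ch
        if ch in VOWELS:
            syllables.append(current)
            current = ""
    if current:  # trailing consonants, e.g. in loanwords
        if syllables:
            syllables[-1] += current
        else:
            syllables.append(current)
    return syllables


class SyllableCNN:
    """Hypothetical, untrained composer: syllable vectors -> word vector."""

    def __init__(self, syllable_dim=8, num_filters=6, width=2, seed=0):
        self.rng = random.Random(seed)
        self.syllable_dim = syllable_dim
        self.width = width
        self.table = {}  # syllable -> embedding, created on first use
        # Each filter spans `width` adjacent syllable vectors.
        self.filters = [
            [self.rng.uniform(-0.5, 0.5) for _ in range(width * syllable_dim)]
            for _ in range(num_filters)
        ]

    def lookup(self, syllable):
        if syllable not in self.table:
            self.table[syllable] = [
                self.rng.uniform(-0.5, 0.5) for _ in range(self.syllable_dim)
            ]
        return self.table[syllable]

    def embed(self, word):
        vectors = [self.lookup(s) for s in syllabify(word)]
        # Zero-pad short words so at least one window exists.
        while len(vectors) < self.width:
            vectors.append([0.0] * self.syllable_dim)
        # Concatenate each run of `width` adjacent syllable vectors.
        windows = [
            sum(vectors[i:i + self.width], [])
            for i in range(len(vectors) - self.width + 1)
        ]
        # Convolution with tanh, then max-over-time pooling per filter.
        return [
            max(math.tanh(sum(w * x for w, x in zip(f, win))) for win in windows)
            for f in self.filters
        ]


model = SyllableCNN()
print(syllabify("kitabu"))       # syllables of "kitabu" (book)
print(len(model.embed("kitabu")))  # one value per filter
```

In a trained system the syllable table and the filters would be learned jointly with the language model rather than drawn at random, and several filter widths would typically be combined.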

MDPI and ACS Style

Shikali, C.S.; Sijie, Z.; Qihe, L.; Mokhosi, R. Better Word Representation Vectors Using Syllabic Alphabet: A Case Study of Swahili. Appl. Sci. 2019, 9, 3648. https://doi.org/10.3390/app9183648

AMA Style

Shikali CS, Sijie Z, Qihe L, Mokhosi R. Better Word Representation Vectors Using Syllabic Alphabet: A Case Study of Swahili. Applied Sciences. 2019; 9(18):3648. https://doi.org/10.3390/app9183648

Chicago/Turabian Style

Shikali, Casper S., Zhou Sijie, Liu Qihe, and Refuoe Mokhosi. 2019. "Better Word Representation Vectors Using Syllabic Alphabet: A Case Study of Swahili" Applied Sciences 9, no. 18: 3648. https://doi.org/10.3390/app9183648

