Text classification is the process of assigning textual content to a set of predefined classes or categories. As enormous numbers of documents are published on the Internet every day, text classification techniques have become essential for purposes such as improving search retrieval and recommendation systems. Much work has been done on different aspects of English text classification. However, little attention has been devoted to Arabic text classification because of the difficulty of processing the Arabic language. Consequently, in this paper, we propose an enhanced Arabic topic-discovery architecture (EATA) that uses ontology to provide an effective Arabic topic classification mechanism. We introduce a semantic enhancement model that improves Arabic text classification and topic discovery by utilizing the rich semantic information in Arabic ontology. In this study, we rely on the vector space model with term frequency-inverse document frequency (TF-IDF) weighting, together with the cosine similarity measure, to classify new Arabic textual documents.
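The TF-IDF and cosine similarity step can be illustrated with a minimal sketch. This is not the EATA implementation: it omits the ontology-based semantic enhancement entirely, uses plain whitespace tokenization instead of Arabic normalization or stemming, and assigns a new document the label of its single most similar training document. The function names, the smoothed IDF formula, and the two-document training set are all assumptions made for illustration.

```python
import math
from collections import Counter


def tokenize(text):
    # Assumption: whitespace splitting only. A real Arabic pipeline would
    # also normalize letters, strip diacritics, and stem or lemmatize.
    return text.split()


def build_idf(docs_tokens):
    # Document frequency: in how many training documents each term occurs.
    n_docs = len(docs_tokens)
    doc_freq = Counter()
    for tokens in docs_tokens:
        doc_freq.update(set(tokens))
    # Smoothed IDF (an illustrative choice) so that a term occurring in
    # every document still keeps a nonzero weight.
    return {t: math.log((1 + n_docs) / (1 + df)) + 1.0
            for t, df in doc_freq.items()}


def tfidf_vector(tokens, idf):
    # Sparse vector as a dict: term -> relative frequency * IDF.
    # Terms unseen in training have no IDF and are ignored.
    counts = Counter(tokens)
    total = len(tokens)
    return {t: (c / total) * idf[t] for t, c in counts.items() if t in idf}


def cosine(u, v):
    # Cosine similarity between two sparse vectors.
    dot = sum(w * v.get(t, 0.0) for t, w in u.items())
    norm_u = math.sqrt(sum(w * w for w in u.values()))
    norm_v = math.sqrt(sum(w * w for w in v.values()))
    if norm_u == 0.0 or norm_v == 0.0:
        return 0.0
    return dot / (norm_u * norm_v)


def classify(text, labeled_docs):
    # Return the label of the training document most similar to `text`.
    docs_tokens = [tokenize(doc) for doc, _ in labeled_docs]
    idf = build_idf(docs_tokens)
    query = tfidf_vector(tokenize(text), idf)
    scored = [
        (cosine(query, tfidf_vector(tokens, idf)), label)
        for tokens, (_, label) in zip(docs_tokens, labeled_docs)
    ]
    return max(scored, key=lambda pair: pair[0])[1]


# Hypothetical toy training set: one document per category.
TRAINING = [
    ("فاز الفريق في مباراة كرة القدم", "sports"),
    ("ارتفعت أسعار النفط في السوق", "economy"),
]

if __name__ == "__main__":
    print(classify("مباراة كرة القدم اليوم", TRAINING))
```

Representing each vector as a dictionary keeps the sketch free of external dependencies. With a realistic corpus, a sparse matrix representation and per-category aggregation of similarity scores would likely be more appropriate.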
This is an open access article distributed under the Creative Commons Attribution License, which permits unrestricted use, distribution, and reproduction in any medium, provided the original work is properly cited.