Open Access Article
Big Data Cogn. Comput. 2018, 2(4), 33; https://doi.org/10.3390/bdcc2040033

Topological Signature of 19th Century Novelists: Persistent Homology in Text Mining

Department of Computer Science, UNC Charlotte, Charlotte, NC 28223, USA
Current address: Department of Computer Science, University of North Carolina at Charlotte, Charlotte, NC 28223, USA.
*
Author to whom correspondence should be addressed.
Received: 23 September 2018 / Revised: 8 October 2018 / Accepted: 15 October 2018 / Published: 18 October 2018
(This article belongs to the Special Issue Big Data and Cognitive Computing: Feature Papers 2018)

Abstract

Topological Data Analysis (TDA) refers to a collection of methods that find the structure of shapes in data. Although TDA methods have recently been used in many areas of data mining, they have not been widely applied to text mining tasks. Most text processing algorithms lose the order in which different entities appear or co-appear. Assuming that this lost ordering is an informative feature of the data, TDA may play a significant role in filling the resulting gap in the state of the art of text processing. Once extracted, the topology of different entities through a textual document may reveal additional information about the document that is not reflected in the features produced by conventional text processing methods. In this paper, we introduce a novel approach that employs TDA in text processing to capture and use the topology of same-type entities in textual documents. First, we show how to extract topological signatures from text using persistent homology, i.e., a TDA tool that captures the topological signature of a data cloud. Then we show how to use these signatures for text classification.
Keywords: topological data analysis; text mining; computational topology; style; persistent homology
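
As a rough illustration of what persistent homology measures, the sketch below computes only the 0-dimensional part (connected components) of a Vietoris-Rips filtration on a small point cloud. It is not the pipeline used in the paper: the function name `h0_persistence` and the toy points are hypothetical, and the abstract does not say how entities are turned into a point cloud. Every component is born at scale 0 and dies at the scale where it merges into another one, so the death times are the edge lengths of a minimum spanning tree.

```python
import math
from itertools import combinations


def h0_persistence(points):
    """Death times of 0-dimensional classes in a Vietoris-Rips filtration.

    Hypothetical helper for illustration only. Each point starts as its own
    component (born at 0); a component dies when an edge first joins it to
    another one. One component never dies and is not reported.
    """
    parent = list(range(len(points)))

    def find(i):
        # Union-find root lookup with path halving.
        while parent[i] != i:
            parent[i] = parent[parent[i]]
            i = parent[i]
        return i

    # All pairwise edges, processed in order of increasing length.
    edges = sorted(
        (math.dist(points[i], points[j]), i, j)
        for i, j in combinations(range(len(points)), 2)
    )

    deaths = []
    for length, i, j in edges:
        root_i, root_j = find(i), find(j)
        if root_i != root_j:
            parent[root_i] = root_j  # two components merge at this scale
            deaths.append(length)
    return deaths


# Toy data (an assumption, not from the paper): three nearby points
# and one distant outlier.
cloud = [(0.0, 0.0), (1.0, 0.0), (0.0, 1.0), (10.0, 10.0)]
deaths = h0_persistence(cloud)
print(deaths)
```

The long-lived class (the large final death time) reflects the outlier being far from the cluster; a list of such lifetimes is one simple kind of topological signature that could be fed to a classifier as a feature vector.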

This is an open access article distributed under the Creative Commons Attribution License which permits unrestricted use, distribution, and reproduction in any medium, provided the original work is properly cited (CC BY 4.0).

MDPI and ACS Style

Gholizadeh, S.; Seyeditabari, A.; Zadrozny, W. Topological Signature of 19th Century Novelists: Persistent Homology in Text Mining. Big Data Cogn. Comput. 2018, 2, 33.

Big Data Cogn. Comput. EISSN 2504-2289, published by MDPI AG, Basel, Switzerland