# Topological Signature of 19th Century Novelists: Persistent Homology in Text Mining

## Abstract

## 1. Introduction

## 2. Background

#### 2.1. Fundamental Definitions

#### 2.2. Related Work

## 3. Methodology

## 4. Results and Discussion

## 5. Conclusions

## Author Contributions

## Funding

## Conflicts of Interest

## Abbreviations

Abbreviation | Meaning
---|---
TDA | Topological Data Analysis
NER | Named Entity Recognizer
SIFTS | Similarity Filtration with Time Skeleton
k-NN | k-Nearest Neighbors

## References

Sample Availability: All books used in this study were retrieved from Project Gutenberg and are in the US public domain. All code for this study is available at https://github.com/shervin821/Novels_TDA.git.

**Figure 1.** A 0-simplex, 1-simplex, 2-simplex and 3-simplex (left). An example of a simplicial complex (right).

**Figure 2.** Betti numbers for a single point, a circle, a sphere and a torus. In a $k$-dimensional space, the $n$th Betti number is always zero for any $n\ge k$.
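The Betti numbers in the caption above can be checked on small triangulations. The following is a minimal sketch, not the authors' code: it assumes rational coefficients, and the helper names `rank`, `boundary_matrix`, `betti_numbers` and `skeleton` are hypothetical. It uses the standard identity $b_k = \dim C_k - \operatorname{rank}\partial_k - \operatorname{rank}\partial_{k+1}$ on a single vertex, a hollow triangle (a circle) and a hollow tetrahedron (a sphere).

```python
from fractions import Fraction
from itertools import combinations


def rank(matrix):
    """Rank of a matrix over the rationals, by Gaussian elimination."""
    rows = [[Fraction(x) for x in row] for row in matrix]
    n_cols = len(rows[0]) if rows else 0
    r = 0  # number of pivots found so far
    for c in range(n_cols):
        pivot = next((i for i in range(r, len(rows)) if rows[i][c] != 0), None)
        if pivot is None:
            continue
        rows[r], rows[pivot] = rows[pivot], rows[r]
        for i in range(r + 1, len(rows)):
            if rows[i][c] != 0:
                factor = rows[i][c] / rows[r][c]
                rows[i] = [a - factor * b for a, b in zip(rows[i], rows[r])]
        r += 1
    return r


def boundary_matrix(faces, simplices):
    """Matrix of the boundary map from k-simplices to their (k-1)-faces."""
    index = {face: i for i, face in enumerate(faces)}
    mat = [[0] * len(simplices) for _ in faces]
    for j, simplex in enumerate(simplices):
        for i in range(len(simplex)):
            face = simplex[:i] + simplex[i + 1:]  # drop the i-th vertex
            mat[index[face]][j] = (-1) ** i
    return mat


def betti_numbers(simplices_by_dim):
    """simplices_by_dim[k] lists the k-simplices as sorted vertex tuples."""
    top = len(simplices_by_dim) - 1
    ranks = [0]  # the boundary map out of dimension 0 is zero
    for k in range(1, top + 1):
        d = boundary_matrix(simplices_by_dim[k - 1], simplices_by_dim[k])
        ranks.append(rank(d))
    ranks.append(0)  # there are no simplices above the top dimension
    return [len(simplices_by_dim[k]) - ranks[k] - ranks[k + 1]
            for k in range(top + 1)]


def skeleton(n_vertices, max_dim):
    """All simplices of dimension <= max_dim on n_vertices vertices."""
    return [list(combinations(range(n_vertices), k + 1))
            for k in range(max_dim + 1)]


point = betti_numbers(skeleton(1, 0))   # a single vertex
circle = betti_numbers(skeleton(3, 1))  # hollow triangle
sphere = betti_numbers(skeleton(4, 2))  # hollow tetrahedron
print(point, circle, sphere)
```

Each list holds $b_0, b_1, \dots$ up to the top dimension of the complex. A torus is omitted because it needs a larger triangulation than fits in a short sketch.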

**Figure 3.** A simple data cloud (**left**) with its persistence diagram at dimension one, which illustrates the birth and death of loops (**middle**), and the equivalent barcode representation (**right**).

**Table 1.** Average accuracy of binary classification on a labeled set of novels by two novelists, using 10-fold cross-validation. The numbers in parentheses are the total number of novels for each novelist. Accuracy values are percentages.

Novelist | Charles Dickens (17) | Émile Zola (18) | Fyodor Dostoyevsky (8) | Jane Austen (6) | Mark Twain (8) | Walter Scott (18)
---|---|---|---|---|---|---
C. Dickens | - | 87.0 | 72.2 | 100.0 | 74.6 | 73.9
É. Zola | 87.0 | - | 65.0 | 64.2 | 68.8 | 83.3
F. Dostoyevsky | 72.2 | 65.0 | - | 90.2 | 73.3 | 55.8
J. Austen | 100.0 | 64.2 | 90.2 | - | 82.9 | 94.7
M. Twain | 74.6 | 68.8 | 73.3 | 82.9 | - | 68.5
W. Scott | 73.9 | 83.3 | 55.8 | 94.7 | 68.5 | -
Average | 81.5 | 73.7 | 71.3 | 86.4 | 73.6 | 75.2
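
The "Average" row can be reproduced from the pairwise accuracies in Table 1. The sketch below is only an arithmetic check, not part of the study's pipeline. It assumes that each average is the unweighted mean of a novelist's five pairwise accuracies. The variable and function names are hypothetical.

```python
# Pairwise accuracies (%) copied from Table 1; None marks the diagonal.
authors = ["C. Dickens", "É. Zola", "F. Dostoyevsky",
           "J. Austen", "M. Twain", "W. Scott"]
accuracy = [
    [None, 87.0, 72.2, 100.0, 74.6, 73.9],
    [87.0, None, 65.0, 64.2, 68.8, 83.3],
    [72.2, 65.0, None, 90.2, 73.3, 55.8],
    [100.0, 64.2, 90.2, None, 82.9, 94.7],
    [74.6, 68.8, 73.3, 82.9, None, 68.5],
    [73.9, 83.3, 55.8, 94.7, 68.5, None],
]


def column_averages(matrix):
    """Unweighted mean of each column, skipping the diagonal (None)."""
    averages = []
    for col in range(len(matrix)):
        values = [row[col] for row in matrix if row[col] is not None]
        averages.append(round(sum(values) / len(values), 1))
    return averages


def is_symmetric(matrix):
    """True if the accuracy for pair (A, B) equals that for (B, A)."""
    n = len(matrix)
    return all(matrix[i][j] == matrix[j][i]
               for i in range(n) for j in range(n))


averages = column_averages(accuracy)
for name, value in zip(authors, averages):
    print(f"{name}: {value}")
```

Under that assumption the computed means agree with the printed "Average" row, and the matrix is symmetric, as expected for pairwise comparisons.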

