Review

Stylometry and Numerals Usage: Benford’s Law and Beyond

1 Department of Modelling of Controllable Systems, Ural Federal University, 620002 Ekaterinburg, Russia
2 Department of Information Technologies and Statistics, Ural State University of Economics, 620144 Ekaterinburg, Russia
Academic Editor: Claudio Lupi
Stats 2021, 4(4), 1051-1068; https://doi.org/10.3390/stats4040060
Received: 29 October 2021 / Revised: 8 December 2021 / Accepted: 9 December 2021 / Published: 14 December 2021
(This article belongs to the Special Issue Benford's Law(s) and Applications)
We suggest two approaches to the statistical analysis of texts, both based on the study of the occurrence of numerals in literary texts. The first approach is related to Benford’s Law and analyzes the frequency distribution of the leading digits of the numerals contained in a text. In coherent literary texts, the share of the leading digit 1 is even larger than Benford’s Law prescribes (about 30.1 percent) and can reach 50 percent. The frequency of the digit 1 and, to a lesser extent, of the digits 2 and 3 is usually a characteristic feature of an author’s style, manifested in all sufficiently long literary texts by that author. This approach is convenient for testing whether a group of texts has a common author: common authorship is doubtful if the frequency distributions differ sufficiently. The second approach extends the first and requires studying the frequency distribution of the numerals themselves (not just their leading digits). It yields non-trivial information about the author and about the stylistic and genre peculiarities of the texts, and is suited to advanced stylometric analysis. The proposed approaches are illustrated by examples of computer analysis of literary texts in English and Russian.
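Both approaches can be illustrated with a minimal Python sketch. It is not the author's software: it assumes the numerals have already been extracted from a text and converted to positive integers (recognizing number words such as "one" or "two hundred" is out of scope here), and the function names and the total-variation distance used to compare two texts are hypothetical, illustrative choices rather than the measure used in the article. Benford's Law gives the expected share of leading digit d as log10(1 + 1/d).

```python
import math
from collections import Counter


def leading_digit(n: int) -> int:
    """Return the first significant digit of a positive integer numeral."""
    if n <= 0:
        raise ValueError("only positive integers have a leading digit here")
    while n >= 10:
        n //= 10
    return n


def benford_expected() -> dict[int, float]:
    """Benford's Law: P(d) = log10(1 + 1/d) for leading digits d = 1..9."""
    return {d: math.log10(1 + 1 / d) for d in range(1, 10)}


def leading_digit_shares(numbers: list[int]) -> dict[int, float]:
    """First approach: share of each leading digit 1..9 among the numerals."""
    if not numbers:
        raise ValueError("need at least one numeral")
    counts = Counter(leading_digit(n) for n in numbers)
    # Counter returns 0 for digits that never occur
    return {d: counts[d] / len(numbers) for d in range(1, 10)}


def numeral_shares(numbers: list[int]) -> dict[int, float]:
    """Second approach: share of each distinct numeral value, not just its leading digit."""
    if not numbers:
        raise ValueError("need at least one numeral")
    counts = Counter(numbers)
    return {n: c / len(numbers) for n, c in counts.items()}


def total_variation(p: dict[int, float], q: dict[int, float]) -> float:
    """Hypothetical comparison measure: half the L1 distance (0 = identical, 1 = disjoint)."""
    keys = set(p) | set(q)
    return 0.5 * sum(abs(p.get(k, 0.0) - q.get(k, 0.0)) for k in keys)


if __name__ == "__main__":
    # Hypothetical numerals, assumed already extracted from two texts
    text_a = [1, 1, 2, 100, 3, 1, 12, 20, 1, 5]
    text_b = [2, 3, 1, 30, 4, 2, 7, 1, 9, 40]

    shares_a = leading_digit_shares(text_a)
    print(f"digit 1 share in text A: {shares_a[1]:.3f}")  # -> 0.600
    print(f"Benford share of digit 1: {benford_expected()[1]:.3f}")  # -> 0.301
    print("leading-digit distance A vs B:",
          round(total_variation(shares_a, leading_digit_shares(text_b)), 3))
    print("numeral distance A vs B:",
          round(total_variation(numeral_shares(text_a), numeral_shares(text_b)), 3))
```

Comparing leading_digit_shares across texts mirrors the first approach; numeral_shares keeps the full numeral values and so mirrors the second. Because short texts contain few numerals, such shares are only informative for sufficiently long texts.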
Keywords: Benford’s Law; first significant digit; leading digit; numerals in texts; quantitative linguistics; stylometry; attribution of texts; text authorship

Figure 1

MDPI and ACS Style

Zenkov, A.V. Stylometry and Numerals Usage: Benford’s Law and Beyond. Stats 2021, 4, 1051-1068. https://doi.org/10.3390/stats4040060

AMA Style

Zenkov AV. Stylometry and Numerals Usage: Benford’s Law and Beyond. Stats. 2021; 4(4):1051-1068. https://doi.org/10.3390/stats4040060

Chicago/Turabian Style

Zenkov, Andrei V. 2021. "Stylometry and Numerals Usage: Benford’s Law and Beyond" Stats 4, no. 4: 1051-1068. https://doi.org/10.3390/stats4040060
