This is an early access version; the complete PDF, HTML, and XML versions will be available soon.
Open Access Article
The Amount of Data Required to Recognize a Writer’s Style Is Consistent Across Different Languages of the World
by Boris Ryabko 1,2,*,†, Nadezhda Savina 2,†, Yeshewas Getachew Lulu 2,† and Yunfei Han 2,†
1 Federal Research Center for Information and Computational Technologies, 630090 Novosibirsk, Russia
2 Department of Information Technologies, Novosibirsk State University, 630090 Novosibirsk, Russia
* Author to whom correspondence should be addressed.
† These authors contributed equally to this work.
Entropy 2025, 27(10), 1039; https://doi.org/10.3390/e27101039 (registering DOI)
Submission received: 25 August 2025 / Revised: 25 September 2025 / Accepted: 1 October 2025 / Published: 4 October 2025
Abstract
In this paper, we apply an information-theoretic method based on data compression, proposed by Ryabko and Savina (hence called the RS-method), to recognize a writer's individual style in four languages from different language groups and families. We used the method to study fiction texts in Russian (East Slavic group of the Indo-European language family), Amharic (South Ethiosemitic group of the Semitic language family), Chinese (Sinitic group of the Sino-Tibetan language family), and English (West Germanic group of the Indo-European language family). We found that the amount of data needed to recognize an author's style is almost the same for all four languages; that is, it is approximately invariant across different language groups. These results are of interest to computer science, literary studies, linguistics, and, in particular, computational linguistics.
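The abstract does not spell out how the RS-method works, so the following is only a minimal sketch of the general idea of compression-based authorship attribution. It is not the authors' implementation. It assumes each candidate author is represented by a sample text, and it attributes an unknown fragment to the author whose sample grows the least in compressed size once the fragment is appended. Python's standard `lzma` module stands in for whatever compressor the RS-method actually uses. The functions `attribute`, `accuracy_at_length`, and `min_sample_length`, along with the `target` threshold, are hypothetical names introduced for illustration.

```python
import lzma
from typing import Dict, Iterable, Optional, Sequence, Tuple


def compressed_size(data: bytes) -> int:
    """Bytes LZMA needs to store `data` (xz container, preset 9)."""
    return len(lzma.compress(data, preset=9))


def attribute(unknown: str, corpora: Dict[str, str]) -> Optional[str]:
    """Hypothetical helper: pick the author whose sample makes `unknown` cheapest
    to compress. The extra compressed bytes needed when `unknown` is appended to
    an author's sample measure how well that sample 'predicts' the fragment."""
    x = unknown.encode("utf-8")
    best_author: Optional[str] = None
    best_cost: Optional[int] = None
    for author, sample in corpora.items():
        base = sample.encode("utf-8")
        cost = compressed_size(base + x) - compressed_size(base)
        if best_cost is None or cost < best_cost:
            best_author, best_cost = author, cost
    return best_author


def accuracy_at_length(tests: Sequence[Tuple[str, str]],
                       corpora: Dict[str, str], length: int) -> float:
    """Fraction of (true_author, text) pairs attributed correctly when only the
    first `length` characters (not bytes) of each text are used."""
    if not tests:
        raise ValueError("need at least one test fragment")
    hits = sum(attribute(text[:length], corpora) == author for author, text in tests)
    return hits / len(tests)


def min_sample_length(tests: Sequence[Tuple[str, str]], corpora: Dict[str, str],
                      lengths: Iterable[int], target: float = 0.9) -> Optional[int]:
    """Smallest length in `lengths` whose accuracy reaches `target`, or None.
    Accuracy need not grow monotonically with length, so this reports the first
    length that meets the target rather than a guaranteed threshold."""
    for n in sorted(lengths):
        if accuracy_at_length(tests, corpora, n) >= target:
            return n
    return None
```

One way to frame the paper's question with this sketch is to run `min_sample_length` separately on Russian, Amharic, Chinese, and English test sets and compare the results. Lengths here are counted in characters before UTF-8 encoding. Chinese and Amharic characters occupy several bytes each, so whether "amount of data" should be measured in characters or bytes is a choice this sketch leaves open.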
Share and Cite
MDPI and ACS Style
Ryabko, B.; Savina, N.; Lulu, Y.G.; Han, Y. The Amount of Data Required to Recognize a Writer’s Style Is Consistent Across Different Languages of the World. Entropy 2025, 27, 1039. https://doi.org/10.3390/e27101039
AMA Style
Ryabko B, Savina N, Lulu YG, Han Y. The Amount of Data Required to Recognize a Writer’s Style Is Consistent Across Different Languages of the World. Entropy. 2025; 27(10):1039. https://doi.org/10.3390/e27101039
Chicago/Turabian Style
Ryabko, Boris, Nadezhda Savina, Yeshewas Getachew Lulu, and Yunfei Han. 2025. "The Amount of Data Required to Recognize a Writer’s Style Is Consistent Across Different Languages of the World." Entropy 27, no. 10: 1039. https://doi.org/10.3390/e27101039
APA Style
Ryabko, B., Savina, N., Lulu, Y. G., & Han, Y. (2025). The Amount of Data Required to Recognize a Writer’s Style Is Consistent Across Different Languages of the World. Entropy, 27(10), 1039. https://doi.org/10.3390/e27101039
Note that from the first issue of 2016, this journal uses article numbers instead of page numbers.