This is an early access version, the complete PDF, HTML, and XML versions will be available soon.
Article

The Amount of Data Required to Recognize a Writer’s Style Is Consistent Across Different Languages of the World

by Boris Ryabko 1,2,*,†, Nadezhda Savina 2,†, Yeshewas Getachew Lulu 2,† and Yunfei Han 2,†

1 Federal Research Center for Information and Computational Technologies, 6300090 Novosibirsk, Russia
2 Department of Information Technologies, Novosibirsk State University, 6300090 Novosibirsk, Russia
* Author to whom correspondence should be addressed.
† These authors contributed equally to this work.
Entropy 2025, 27(10), 1039; https://doi.org/10.3390/e27101039 (registering DOI)
Submission received: 25 August 2025 / Revised: 25 September 2025 / Accepted: 1 October 2025 / Published: 4 October 2025
(This article belongs to the Section Information Theory, Probability and Statistics)

Abstract

In this paper, we apply an information-theoretic method proposed by Ryabko and Savina (hence called the RS-method), which is based on data compression, to recognize a writer’s individual style in four languages from different language groups and families. The method was used to study fiction texts in Russian (East Slavic group of the Indo-European language family), Amharic (South Ethiosemitic group of the Semitic language family), Chinese (Sinitic group of the Sino-Tibetan language family) and English (West Germanic group of the Indo-European language family). We found that the amount of data necessary for recognizing an author’s style is almost the same for all four languages, i.e., it is invariant across the language groups studied. The results are of interest to computer science, literary studies, linguistics and, in particular, computational linguistics.
Keywords: information technology; data compression; language family; language group; individual author’s style of the writer; information-theoretic method (RS-method); hypothesis testing
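
The abstract does not spell out the RS-method, so the following is only a generic illustration of how data compression can be used to compare writing styles, not the authors' procedure. It assumes zlib as a stand-in compressor, and the function names (compressed_size, extra_cost, closest_author) are hypothetical. The idea sketched here is that a query text adds fewer compressed bytes when appended to a reference text that shares its regularities.

```python
import zlib


def compressed_size(text: str) -> int:
    """Size in bytes of the zlib-compressed UTF-8 encoding of text."""
    return len(zlib.compress(text.encode("utf-8"), 9))


def extra_cost(reference: str, query: str) -> int:
    """Additional compressed bytes needed for query once reference is known.

    Assumption: a smaller value means the query shares more regularities
    with the reference text.
    """
    return compressed_size(reference + query) - compressed_size(reference)


def closest_author(query: str, references: dict[str, str]) -> str:
    """Return the label whose reference text makes the query cheapest to encode."""
    return min(references, key=lambda name: extra_cost(references[name], query))


# Toy data only: two artificial "authors" with very different vocabularies.
ref_a = "the quick brown fox jumps over the lazy dog. " * 50
ref_b = "lorem ipsum dolor sit amet consectetur adipiscing elit. " * 50
query = "the quick brown fox jumps over the lazy dog. " * 5

label = closest_author(query, {"A": ref_a, "B": ref_b})
```

With real fiction texts, the reference and query fragments would be much longer, and the choice of compressor and fragment length would affect how reliably styles can be separated.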
