Literal Pattern Analysis of Texts Written with the Multiple Form of Characters: A Comparative Study of the Human and Machine Styles

Hayata, Kazuya

doi:10.3390/e28010036

This is an early access version; the complete PDF, HTML, and XML versions will be available soon.

Open Access Article

by

Kazuya Hayata

Sapporo Gakuin University, Ebetsu 069-8555, Japan

Entropy 2026, 28(1), 36; https://doi.org/10.3390/e28010036 (registering DOI)

Submission received: 30 October 2025 / Revised: 22 December 2025 / Accepted: 23 December 2025 / Published: 27 December 2025

(This article belongs to the Special Issue Entropy-Based Time Series Analysis: Theory and Applications)

Abstract

Aside from languages that have no written form, texts in nearly every language are written in a single script. But every rule has its exceptions. A very rare exception is Japanese, whose texts are written in three kinds of characters. In European languages, by contrast, one does not find a text written in a mixture of the Latin, Cyrillic, and Greek alphabets. For several currently available Japanese texts, we conduct a quantitative analysis of how the three kinds of characters are mixed, using a methodology based on a binary pattern approach to a sequence generated by a procedure. Specifically, we consider two texts, the former and present constitutions, as well as a famous American story that has been translated into Japanese at least 13 times. For the latter, the human translations are compared with four machine translations produced by DeepL and Google Translate. The Hellinger distance, chi-square value, normalized Shannon entropy, and Simpson’s diversity index are employed as metrics of divergence and diversity. Numerical results suggest that, in terms of entropy, the 17 translations form three clusters and that, overall, the machine-translated texts exhibit higher entropy than the human translations. This finding suggests that the present method can provide a useful tool for stylometry and author attribution. Through comparison with the diversity index, the capabilities of the entropic measure are confirmed. Finally, in addition to the abovementioned texts, applicability to the Japanese version of the periodic table of elements is investigated.

Keywords: literal pattern; normalized entropy; Simpson’s diversity index; chi-square test; backtranslation; machine translation; artificial intelligence
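
The four metrics named in the abstract can be illustrated with a minimal sketch. It does not reproduce the article's binary pattern procedure, which the abstract does not specify. Instead it assumes, purely for illustration, that each character is classified as kanji, hiragana, or katakana by Unicode range and that the metrics are applied to the resulting three-class distribution. The function names, the Unicode ranges, and the choice of the Gini-Simpson form (1 minus the sum of squared proportions) for Simpson's index are assumptions of this sketch, not details taken from the article.

```python
import math
from collections import Counter

SCRIPTS = ("kanji", "hiragana", "katakana")

def script_of(ch):
    """Classify a character by Unicode range (an assumption of this sketch)."""
    cp = ord(ch)
    if 0x4E00 <= cp <= 0x9FFF:   # CJK Unified Ideographs (basic block only)
        return "kanji"
    if 0x3041 <= cp <= 0x3096:   # hiragana letters
        return "hiragana"
    if 0x30A1 <= cp <= 0x30FA:   # katakana letters (long-vowel mark excluded)
        return "katakana"
    return None                  # punctuation, digits, Latin letters, etc.

def script_counts(text):
    """Count characters per script, in the fixed order given by SCRIPTS."""
    counts = Counter(script_of(ch) for ch in text)
    return [counts[s] for s in SCRIPTS]

def to_probs(counts):
    """Convert counts to relative frequencies."""
    total = sum(counts)
    if total == 0:
        raise ValueError("text contains no kanji, hiragana, or katakana")
    return [c / total for c in counts]

def normalized_entropy(p):
    """Shannon entropy divided by its maximum, log K, so the value lies in [0, 1]."""
    h = -sum(x * math.log(x) for x in p if x > 0)
    return h / math.log(len(p))

def simpson_diversity(p):
    """Gini-Simpson form of Simpson's index: 1 - sum of squared proportions."""
    return 1.0 - sum(x * x for x in p)

def hellinger(p, q):
    """Hellinger distance between two discrete distributions, in [0, 1]."""
    return math.sqrt(0.5 * sum((math.sqrt(a) - math.sqrt(b)) ** 2
                               for a, b in zip(p, q)))

def chi_square(observed, expected_probs):
    """Pearson chi-square statistic of observed counts against a reference distribution."""
    n = sum(observed)
    return sum((o - n * e) ** 2 / (n * e)
               for o, e in zip(observed, expected_probs) if e > 0)

if __name__ == "__main__":
    # Two toy strings standing in for two translations of the same passage.
    a = to_probs(script_counts("漢字ひらがなカタカナ"))
    b = to_probs(script_counts("日本語のテキストを読む"))
    print("normalized entropy:", normalized_entropy(a), normalized_entropy(b))
    print("Simpson diversity:", simpson_diversity(a), simpson_diversity(b))
    print("Hellinger distance:", hellinger(a, b))
    print("chi-square of first text vs. second:",
          chi_square(script_counts("漢字ひらがなカタカナ"), b))
```

In this sketch, entropy and Simpson's index describe how evenly a single text mixes the three character types, while the Hellinger distance and chi-square value compare one text's mixture against another's. This is one plausible reading of the abstract's distinction between "diversity" and "divergence" metrics.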
