Open Access Article
Entropy 2018, 20(2), 85; https://doi.org/10.3390/e20020085

Is Natural Language a Perigraphic Process? The Theorem about Facts and Words Revisited

Institute of Computer Science, Polish Academy of Sciences, ul. Jana Kazimierza 5, 01-248 Warszawa, Poland
Received: 4 January 2018 / Revised: 23 January 2018 / Accepted: 24 January 2018 / Published: 26 January 2018
(This article belongs to the Special Issue Power Law Behaviour in Complex Systems)
Abstract

As we discuss, a stationary stochastic process is nonergodic when a random persistent topic can be detected in the infinite random text sampled from the process, whereas we call the process strongly nonergodic when an infinite sequence of independent random bits, called probabilistic facts, is needed to describe this topic completely. Replacing probabilistic facts with an algorithmically random sequence of bits, called algorithmic facts, we adapt this property back to ergodic processes. Subsequently, we call a process perigraphic if the number of algorithmic facts which can be inferred from a finite text sampled from the process grows like a power of the text length. We present a simple example of such a process. Moreover, we demonstrate an assertion which we call the theorem about facts and words. This proposition states that the number of probabilistic or algorithmic facts which can be inferred from a text drawn from a process must be roughly smaller than the number of distinct word-like strings detected in this text by means of the Prediction by Partial Matching (PPM) compression algorithm. We also observe that the number of word-like strings for a sample of plays by Shakespeare follows an empirical stepwise power law, in stark contrast to Markov processes. Hence, we suppose that natural language, considered as a process, is not only non-Markov but also perigraphic.
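As a loose illustration of the power-law vocabulary growth mentioned in the abstract, the sketch below counts distinct tokens in growing prefixes of a text and estimates the growth exponent. This is only a toy: it uses ordinary whitespace-style tokens drawn from a synthetic Zipf-like source, not the PPM-detected word-like strings or the Shakespeare corpus the paper actually analyzes, and the source parameters (vocabulary size, Zipf exponent) are arbitrary choices for demonstration.

```python
import math
import random

random.seed(0)

# Synthetic Zipf-like word source (hypothetical parameters, not from the paper):
# word rank r is drawn with probability proportional to 1 / r**1.1.
VOCAB = 50_000
weights = [1.0 / (r ** 1.1) for r in range(1, VOCAB + 1)]

# Sample a "text" of 100,000 tokens from the source.
tokens = random.choices(range(VOCAB), weights=weights, k=100_000)

def distinct_count(tokens, n):
    """Number of distinct tokens among the first n tokens of the text."""
    return len(set(tokens[:n]))

# If the vocabulary grows like n**beta (a stepwise power law in the paper's
# terminology), then log V(n) / log n should hover around beta < 1.
for n in (1_000, 10_000, 100_000):
    v = distinct_count(tokens, n)
    print(f"n={n:>7}  distinct={v:>6}  log V / log n = {math.log(v) / math.log(n):.3f}")
```

A Markov process with a finite alphabet would instead show the distinct-string count saturating quickly, which is the contrast the abstract draws against natural language.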
Keywords: stationary processes; PPM code; mutual information; power laws; algorithmic information theory; natural language
This is an open access article distributed under the Creative Commons Attribution License (CC BY 4.0), which permits unrestricted use, distribution, and reproduction in any medium, provided the original work is properly cited.
Share & Cite This Article

MDPI and ACS Style

Dębowski, Ł. Is Natural Language a Perigraphic Process? The Theorem about Facts and Words Revisited. Entropy 2018, 20, 85.



Entropy EISSN 1099-4300, published by MDPI AG, Basel, Switzerland.