Open Access Article

Asymptotic Analysis of the kth Subword Complexity

by Lida Ahmadi 1,*,† and Mark Daniel Ward 2
1 Department of Mathematics, Purdue University, West Lafayette 47907, IN, USA
2 Department of Statistics, Purdue University, West Lafayette 47907, IN, USA
* Author to whom correspondence should be addressed.
† Current address: 5500 University Parkway, San Bernardino 92407, CA, USA
Entropy 2020, 22(2), 207; https://doi.org/10.3390/e22020207
Received: 25 December 2019 / Revised: 28 January 2020 / Accepted: 4 February 2020 / Published: 12 February 2020
(This article belongs to the Special Issue Information Theory and Language)
Patterns within strings enable us to extract vital information regarding a string’s randomness. Whether a string is random (showing little to no repetition in patterns) or periodic (showing repetitions in patterns) is described by a value called the kth Subword Complexity of the character string. By definition, the kth Subword Complexity is the number of distinct substrings of length k that appear in a given string. In this paper, we evaluate the expected value and the second factorial moment (followed by a corollary on the second moment) of the kth Subword Complexity for binary strings over memoryless sources. We first take a combinatorial approach to derive a probability generating function for the number of occurrences of patterns in strings of finite length. This gives an exact expression for the two moments in terms of the patterns’ autocorrelation and correlation polynomials. We then investigate the asymptotic behavior for k = Θ(log n). In the proof, we compare the distribution of the kth Subword Complexity of binary strings to the distribution of distinct prefixes of independent strings stored in a trie. The methodology involves complex analysis, analytical poissonization and depoissonization, the Mellin transform, and saddle point analysis.
Keywords: subword complexity; asymptotics; generating functions; saddle point method; probability; the Mellin transform; moments
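
To make the definition in the abstract concrete, the following minimal Python sketch counts the distinct length-k substrings of a string, and computes the expected value of that count by brute-force enumeration over all binary strings of length n from a memoryless source. The function names and the parameter p (the probability of emitting a 1) are illustrative assumptions, not notation from the paper. The enumeration is exponential in n and only feasible for very small inputs; it is not the generating-function or asymptotic method the paper develops.

```python
from itertools import product


def subword_complexity(s: str, k: int) -> int:
    """Return the number of distinct substrings of length k in s."""
    if k <= 0 or k > len(s):
        return 0
    # Slide a window of width k over s and collect the distinct windows.
    return len({s[i:i + k] for i in range(len(s) - k + 1)})


def expected_subword_complexity(n: int, k: int, p: float = 0.5) -> float:
    """Exact expected kth subword complexity of a length-n binary string.

    Assumes a memoryless source that emits '1' with probability p
    (hypothetical parameter name). Enumerates all 2**n strings, so this
    is only practical for small n.
    """
    total = 0.0
    for bits in product("01", repeat=n):
        s = "".join(bits)
        ones = s.count("1")
        prob = (p ** ones) * ((1 - p) ** (n - ones))
        total += prob * subword_complexity(s, k)
    return total


# A periodic string has few distinct substrings ...
periodic = "01" * 8            # only "010" and "101" occur for k = 3
# ... while this string contains every binary word of length 3.
varied = "0001011100"

print(subword_complexity(periodic, 3))
print(subword_complexity(varied, 3))
print(expected_subword_complexity(2, 1))
```

The periodic string yields a complexity of 2 for k = 3, while the second string attains the maximum of 8, which illustrates how the quantity separates repetitive strings from varied ones.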
MDPI and ACS Style

Ahmadi, L.; Ward, M.D. Asymptotic Analysis of the kth Subword Complexity. Entropy 2020, 22, 207.
