Fitting Ranked Linguistic Data with Two-Parameter Functions
Feinstein Institute for Medical Research, North Shore LIJ Health Systems, 350 Community Drive, Manhasset, NY 11030, USA
Departamento de Matemáticas, Facultad de Ciencias, Universidad Nacional Autónoma de México, Circuito Exterior, Ciudad Universitaria, México 04510 DF, Mexico
Departamento de Sistemas Complejos, Instituto de Física, Universidad Nacional Autónoma de México, Apartado Postal 20-364, México 01000 DF, Mexico
Centro de Ciencias de la Complejidad, Universidad Nacional Autónoma de México, Circuito Escolar, Ciudad Universitaria, México 04510 DF, Mexico
* Author to whom correspondence should be addressed.
Received: 4 April 2010; in revised form: 18 May 2010 / Accepted: 1 July 2010 / Published: 7 July 2010
Abstract: It is well known that many ranked linguistic data sets can be fitted well by one-parameter models, such as Zipf’s law for ranked word frequencies. However, where the data deviate from a one-parameter model (such deviations tend to occur at the two extremes of the rank range), it is natural to add a second parameter to the fitting model. In this paper, we compare several two-parameter models, including the Beta, Yule, and Weibull functions, all of which can be framed as a multiple regression on the logarithmic scale, by how well they fit several kinds of ranked linguistic data: letter frequencies, word spacings, and word frequencies. We observe that the Beta function fits ranked letter frequencies best, the Yule function fits the ranked word-spacing distribution best, and the Altmann, Beta, and Yule functions all slightly outperform Zipf’s power-law function on the ranked word-frequency distribution.
Keywords: Zipf’s law; regression; model selection; Beta function; letter frequency distribution; word-spacing distribution; word frequency distribution; weighting
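The abstract states that these two-parameter models can all be framed as a multiple regression on the logarithmic scale. The sketch below illustrates that idea for two of them. It assumes the Beta rank function has the form f(r) = C (N+1-r)^b / r^a and the Yule rank function has the form f(r) = C r^(-a) B^r, where r = 1..N is the rank. Taking logarithms makes each model linear in its coefficients, so ordinary (unweighted) least squares applies. The helper names (`fit_beta`, `fit_yule`, `least_squares`) are hypothetical and are not taken from the paper, which also discusses weighting schemes that this sketch omits.

```python
import math
from typing import List, Sequence, Tuple


def solve_linear(a: List[List[float]], b: List[float]) -> List[float]:
    """Solve a small square system a x = b by Gaussian elimination with partial pivoting."""
    n = len(b)
    m = [row[:] + [b[i]] for i, row in enumerate(a)]  # augmented matrix
    for col in range(n):
        pivot = max(range(col, n), key=lambda r: abs(m[r][col]))
        m[col], m[pivot] = m[pivot], m[col]
        for r in range(col + 1, n):
            factor = m[r][col] / m[col][col]
            for c in range(col, n + 1):
                m[r][c] -= factor * m[col][c]
    x = [0.0] * n
    for r in range(n - 1, -1, -1):  # back substitution
        s = m[r][n] - sum(m[r][c] * x[c] for c in range(r + 1, n))
        x[r] = s / m[r][r]
    return x


def least_squares(rows: List[List[float]], y: List[float]) -> List[float]:
    """Unweighted OLS via the normal equations (X^T X) beta = X^T y."""
    k = len(rows[0])
    xtx = [[sum(row[i] * row[j] for row in rows) for j in range(k)] for i in range(k)]
    xty = [sum(row[i] * yi for row, yi in zip(rows, y)) for i in range(k)]
    return solve_linear(xtx, xty)


def fit_beta(freqs: Sequence[float]) -> Tuple[float, float, float]:
    """Assumed Beta form: log f = log C - a log r + b log(N+1-r). Returns (C, a, b)."""
    n = len(freqs)
    rows = [[1.0, math.log(r), math.log(n + 1 - r)] for r in range(1, n + 1)]
    c0, c1, c2 = least_squares(rows, [math.log(f) for f in freqs])
    return math.exp(c0), -c1, c2


def fit_yule(freqs: Sequence[float]) -> Tuple[float, float, float]:
    """Assumed Yule form: log f = log C - a log r + r log B. Returns (C, a, B)."""
    n = len(freqs)
    rows = [[1.0, math.log(r), float(r)] for r in range(1, n + 1)]
    c0, c1, c2 = least_squares(rows, [math.log(f) for f in freqs])
    return math.exp(c0), -c1, math.exp(c2)


if __name__ == "__main__":
    # Synthetic, noise-free data (not from the paper): 26 "letters" drawn from a Beta curve.
    data = [100.0 * (27 - r) ** 0.5 / r ** 0.8 for r in range(1, 27)]
    print(fit_beta(data))  # should recover roughly C=100, a=0.8, b=0.5
```

Because the frequencies are ranked in decreasing order and must be positive, the input should be sorted and contain no zeros before taking logarithms. For real data, comparing models would also require a goodness-of-fit criterion, which this sketch leaves out.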
Li, W.; Miramontes, P.; Cocho, G. Fitting Ranked Linguistic Data with Two-Parameter Functions. Entropy 2010, 12, 1743-1764.