Some Properties of Zipf’s Law and Applications
Abstract
1. Introduction
2. Related Work
3. Zipf’s Law over Finite Multisets—Properties
- (i) the objects in the population belong to a discrete, finite multiset of prototypes (individual words or lemmas), and the number of replications of the same type of object (its multiplicity) is not limited in the population; the set of prototypes is assumed to be the vocabulary of a language, or at least a large part of it;
- (ii) the ordinal (number of elements) of the set of prototypes is much smaller than the number of elements in the population (text).
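Conditions (i) and (ii) can be checked on a concrete text. The following is a minimal sketch, not the authors' procedure: it assumes a simple regular-expression tokenizer, a synthetic sample text, and an ordinary least-squares fit in log-log coordinates. The function names `rank_frequency` and `fit_zipf_exponent` are hypothetical.

```python
import math
import re
from collections import Counter


def rank_frequency(text):
    """Return (counts sorted from most to least frequent, number of tokens)."""
    tokens = re.findall(r"\w+", text.lower())  # assumed tokenizer
    counts = Counter(tokens)
    return sorted(counts.values(), reverse=True), len(tokens)


def fit_zipf_exponent(frequencies):
    """Least-squares slope of log(frequency) against log(rank), sign-flipped."""
    xs = [math.log(rank) for rank in range(1, len(frequencies) + 1)]
    ys = [math.log(f) for f in frequencies]
    mean_x = sum(xs) / len(xs)
    mean_y = sum(ys) / len(ys)
    sxx = sum((x - mean_x) ** 2 for x in xs)
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    return -sxy / sxx


# Synthetic text (an assumption for illustration): word k occurs 120 // k times.
sample_text = " ".join(f"w{k} " * (120 // k) for k in range(1, 31))

freqs, n_tokens = rank_frequency(sample_text)
vocabulary_size = len(freqs)          # ordinal of the set of prototypes
exponent = fit_zipf_exponent(freqs)   # close to 1 for a Zipf-like sample

print("vocabulary size:", vocabulary_size)
print("tokens in text:", n_tokens)
print("fitted exponent:", round(exponent, 2))
```

On a real corpus, condition (ii) corresponds to `vocabulary_size` being much smaller than `n_tokens`. A plain least-squares fit over all ranks is only one of several possible estimators and is sensitive to the low-frequency tail.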
4. Power Laws with Variable or Noisy Exponents
5. Mixtures of Populations with Power Laws and Etymological Populations
6. Applications, Discussion, and Conclusions
Author Contributions
Funding
Data Availability Statement
Acknowledgments
Conflicts of Interest
Appendix A
- (1) Downloading the data locally
- (2) Processing the downloaded data
cap, | Nc, | 178, | 1,0,0,0,1,1,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0 |
veni, | Vm, | 163, | 1,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0 |
general, | A, | 27, | 1,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0 |
mirat, | A, | 4, | 0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0 |
general, | Nc, | 4, | 0,0,0,0,1,0,0,0,0,0,0,0,1,0,0,0,0,0,0,0,0,0,0,0,0,0,0 |
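Rows in the layout shown above can be read with a few lines of code. This is a minimal sketch under stated assumptions: the field names `word`, `tag`, `count`, and `flags` are hypothetical labels chosen for illustration, since the column headers are not given here, and the meaning of the 0/1 values is not assumed.

```python
from collections import Counter
from dataclasses import dataclass
from typing import List


@dataclass
class Entry:
    word: str         # first field, e.g. "cap" (assumed: word or lemma)
    tag: str          # second field, e.g. "Nc" (assumed: part-of-speech tag)
    count: int        # third field (assumed: number of occurrences)
    flags: List[int]  # remaining 0/1 values (meaning not stated here)


def parse_row(line: str) -> Entry:
    """Parse one row such as 'cap, | Nc, | 178, | 1,0,0 |'."""
    # Drop the column separators, split on commas, and skip empty pieces.
    pieces = [p.strip() for p in line.replace("|", "").split(",")]
    pieces = [p for p in pieces if p]
    word, tag, count, *flags = pieces
    return Entry(word, tag, int(count), [int(f) for f in flags])


rows = [
    "cap, | Nc, | 178, | 1,0,0,0,1,1,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0 |",
    "general, | A, | 27, | 1,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0 |",
    "general, | Nc, | 4, | 0,0,0,0,1,0,0,0,0,0,0,0,1,0,0,0,0,0,0,0,0,0,0,0,0,0,0 |",
]
entries = [parse_row(r) for r in rows]

# Sum the counts of the same word across its different tags.
totals = Counter()
for e in entries:
    totals[e.word] += e.count

print(totals.most_common())
```

Keeping the word and its tag as separate fields lets the same word form be counted per tag or in total, as with the two `general` rows.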
© 2024 by the authors. Licensee MDPI, Basel, Switzerland. This article is an open access article distributed under the terms and conditions of the Creative Commons Attribution (CC BY) license (https://creativecommons.org/licenses/by/4.0/).
Bolea, S.C.; Pirnau, M.; Bejinariu, S.-I.; Apopei, V.; Gifu, D.; Teodorescu, H.-N. Some Properties of Zipf’s Law and Applications. Axioms 2024, 13, 146. https://doi.org/10.3390/axioms13030146