Assessing the Role of Socio-Demographic Triggers on Kolmogorov-Based Complexity in Spoken English Varieties
Abstract
1. Introduction
2. Materials and Methods
2.1. Corpora and Socio-Demographic Data
2.2. Kolmogorov Complexity as a Measure of Language Complexity
2.3. Statistical Methods
3. Results
3.1. Kolmogorov-Based Complexity of English Varieties
3.2. Kolmogorov-Based Complexity and Socio-Demographic Triggers
4. Discussion
Funding
Institutional Review Board Statement
Informed Consent Statement
Data Availability Statement
Acknowledgments
Conflicts of Interest
Appendix A
Variety | Abbreviation |
---|---|
Australian English | AusE |
British English | BrE |
Canadian English | CanE |
Colloquial American English | CollAmE |
Colloquial Singapore English | CollSgE |
English dialects in the Midlands | Mid |
English dialects in the North of England | North |
English dialects in the Southeast of England | SE |
English dialects in the Southwest of England | SW |
Ghanaian English | GhE |
Hebridean English | HebE |
Indian English | IndE |
Irish English | IrE |
Jamaican English | JamE |
Kenyan English | KenE |
Manx English | ManxE |
New Zealand English | NZE |
Nigerian English | NigE |
Philippine English | PhilE |
Scottish English | ScE |
Sri Lankan English | SLkE |
Trinidadian English | TTE |
Ugandan English | UgE |
Welsh English | WelE |
Disclaimer/Publisher’s Note: The statements, opinions and data contained in all publications are solely those of the individual author(s) and contributor(s) and not of MDPI and/or the editor(s). MDPI and/or the editor(s) disclaim responsibility for any injury to people or property resulting from any ideas, methods, instructions or products referred to in the content.
© 2025 by the author. Licensee MDPI, Basel, Switzerland. This article is an open access article distributed under the terms and conditions of the Creative Commons Attribution (CC BY) license (https://creativecommons.org/licenses/by/4.0/).
Ehret, K. Assessing the Role of Socio-Demographic Triggers on Kolmogorov-Based Complexity in Spoken English Varieties. Entropy 2025, 27, 1009. https://doi.org/10.3390/e27101009