Comparative Analysis of Preference in Contemporary and Earlier Texts Using Entropy Measures
Abstract
1. Introduction
2. Data and Methods
2.1. The Jena Corpus of Expository and Fictional Prose
2.2. The Jena Corpus of Contemporary Expository and Fictional Prose
2.3. Properties Underlying Textual Structure
2.4. Approximate Entropy and Shannon Entropy
3. Results
3.1. Statistical Analysis of Features
3.2. Classification
4. Discussion and Conclusions
Supplementary Materials
Author Contributions
Funding
Institutional Review Board Statement
Data Availability Statement
Conflicts of Interest
References
Earlier texts
Category | Number of Texts | Length (10³ words) |
---|---|---|
Canonical (preferred) | 76 | 199 ± 96 |
Non-Canonical (non-preferred) | 130 | 111 ± 56 |
Non-Fictional | 185 | 171 ± 178 |
Contemporary texts
Category | Number of Texts | Length (10³ words) |
---|---|---|
Bestseller | 93 | 153 ± 90 |
Non-Bestseller | 110 | 105 ± 39 |
Non-Fictional | 122 | 142 ± 84 |
Text Property | Contemporary: Bestseller | Contemporary: Non-Bestseller | Earlier: Canonical | Earlier: Non-Canonical |
---|---|---|---|---|
Sentence Length | 1.99 (1.95, 2.02) | 2.01 (1.99, 2.04) ns | 1.86 (1.83, 1.89) | 1.87 (1.86, 1.90) ns |
Noun | 1.93 (1.921, 1.934) | 1.85 (1.84, 1.86) *** | 1.89 (1.88, 1.91) | 1.83 (1.81, 1.84) *** |
Verb | 1.74 (1.730, 1.742) | 1.70 (1.68, 1.71) *** | 1.75 (1.73, 1.76) | 1.70 (1.69, 1.71) *** |
Adjective | 1.40 (1.38, 1.41) | 1.36 (1.34, 1.38) ** | 1.50 (1.49, 1.52) | 1.45 (1.43, 1.48) *** |
Adverb | 1.50 (1.47, 1.53) | 1.51 (1.50, 1.52) ns | 1.51 (1.49, 1.53) | 1.48 (1.46, 1.49) ** |
Pronoun | 1.71 (1.69, 1.73) | 1.73 (1.71, 1.74) * | 1.74 (1.71, 1.76) | 1.681 (1.675, 1.691) *** |
Preposition | 1.63 (1.62, 1.64) | 1.61 (1.60, 1.62) *** | 1.71 (1.70, 1.72) | 1.67 (1.66, 1.68) *** |
Text Property | Contemporary: Bestseller | Contemporary: Non-Bestseller | Earlier: Canonical | Earlier: Non-Canonical |
---|---|---|---|---|
Sentence Length | 3.42 (3.39, 3.46) | 3.36 (3.31, 3.39) ns | 3.96 (3.88, 4.05) | 3.96 (3.87, 4.08) ns |
Noun | 2.09 (2.08, 2.11) | 1.99 (1.77, 2.02) *** | 2.00 (1.99, 2.02) | 1.97 (1.95, 1.98) *** |
Verb | 1.80 (1.78, 1.81) | 1.77 (1.767, 1.789) *** | 1.80 (1.79, 1.81) | 1.777 (1.772, 1.783) *** |
Adjective | 1.43 (1.41, 1.45) | 1.39 (1.37, 1.42) ** | 1.54 (1.53, 1.55) | 1.49 (1.47, 1.53) *** |
Adverb | 1.53 (1.49, 1.56) | 1.54 (1.53, 1.57) ns | 1.54 (1.51, 1.55) | 1.51 (1.49, 1.53) * |
Pronoun | 1.80 (1.79, 1.81) | 1.82 (1.81, 1.84) *** | 1.83 (1.80, 1.84) | 1.78 (1.77, 1.79) *** |
Preposition | 1.67 (1.66, 1.68) | 1.66 (1.64, 1.67) * | 1.75 (1.74, 1.77) | 1.73 (1.72, 1.74) *** |
Text Property | Bestselling vs. Non-Bestselling: ApEn | Bestselling vs. Non-Bestselling: ShEn | Canonical vs. Non-Canonical: ApEn | Canonical vs. Non-Canonical: ShEn |
---|---|---|---|---|
Sentence Length | 53.6 ± 3.1 | 53.8 ± 3.0 | 54.0 ± 1.6 | 50.0 ± 1.0 † |
Noun | 80.4 ± 3.4 | 72.9 ± 2.7 | 73.6 ± 2.9 | 60.0 ± 4.5 |
Verb | 67.7 ± 3.7 | 62.7 ± 2.5 | 71.3 ± 3.4 | 56.2 ± 3.8 |
Adjective | 56.2 ± 3.2 | 57.4 ± 3.3 | 55.2 ± 2.5 | 51.5 ± 2.7 † |
Adverb | 53.6 ± 2.2 | 51.3 ± 2.6 † | 51.6 ± 1.4 † | 51.0 ± 1.5 † |
Pronoun | 57.6 ± 1.8 | 58.1 ± 1.9 | 68.0 ± 1.7 | 63.8 ± 1.8 |
Preposition | 57.8 ± 2.6 | 53.5 ± 2.2 | 69.1 ± 2.4 | 59.7 ± 1.7 |
All | 79.4 ± 4.2 | 77.6 ± 2.4 | 77.3 ± 2.6 | 68.5 ± 2.3 |
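The classification table compares two entropy measures, approximate entropy (ApEn) and Shannon entropy (ShEn). The Python sketch below shows one standard way to compute each on a sequence of per-sentence values; it is an illustration under stated assumptions, not the authors' pipeline. Assumed here, and not taken from the study: the input is one number per sentence (e.g., sentence length or the count of one part of speech), the template length is m = 2, the tolerance is r = 0.2 × standard deviation, and entropies use the natural logarithm. The helper names `approximate_entropy` and `shannon_entropy` are hypothetical. ApEn follows Pincus's (1991) definition, ApEn = Φ^m(r) − Φ^(m+1)(r).

```python
import math
import random
from collections import Counter
from typing import Optional, Sequence


def approximate_entropy(series: Sequence[float], m: int = 2,
                        r: Optional[float] = None) -> float:
    """ApEn following Pincus (1991): Phi^m(r) - Phi^(m+1)(r).

    m is the template (embedding) length, r the tolerance.
    Assumed default (not from the study): r = 0.2 * SD of the series.
    """
    x = [float(v) for v in series]
    n = len(x)
    if n <= m:
        raise ValueError("series is too short for the chosen m")
    if r is None:
        mean = sum(x) / n
        r = 0.2 * math.sqrt(sum((v - mean) ** 2 for v in x) / n)

    def phi(k: int) -> float:
        # All overlapping templates (runs of k consecutive values).
        templates = [x[i:i + k] for i in range(n - k + 1)]
        count = len(templates)
        total = 0.0
        for a in templates:
            # Count templates within Chebyshev distance r of a; the
            # self-match guarantees at least one, so log() is defined.
            matches = sum(
                1 for b in templates
                if max(abs(p - q) for p, q in zip(a, b)) <= r
            )
            total += math.log(matches / count)
        return total / count

    return phi(m) - phi(m + 1)


def shannon_entropy(values: Sequence[float], base: float = math.e) -> float:
    """Shannon entropy of the empirical distribution of discrete values."""
    counts = Counter(values)
    n = sum(counts.values())
    return -sum((c / n) * math.log(c / n, base) for c in counts.values())


if __name__ == "__main__":
    # Hypothetical per-sentence values standing in for one text property.
    rng = random.Random(0)
    irregular = [rng.randint(5, 40) for _ in range(300)]
    periodic = [10, 25, 15] * 100  # same three values in a fixed order

    for name, seq in [("irregular", irregular), ("periodic", periodic)]:
        print(name,
              round(approximate_entropy(seq), 3),
              round(shannon_entropy(seq), 3))
```

Running the script prints both measures for an irregular and a periodic sequence. Because ShEn depends only on how often each value occurs, shuffling a sequence leaves it unchanged, whereas ApEn compares runs of consecutive values and therefore rises when a regular order is destroyed.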
Disclaimer/Publisher’s Note: The statements, opinions and data contained in all publications are solely those of the individual author(s) and contributor(s) and not of MDPI and/or the editor(s). MDPI and/or the editor(s) disclaim responsibility for any injury to people or property resulting from any ideas, methods, instructions or products referred to in the content.
© 2023 by the authors. Licensee MDPI, Basel, Switzerland. This article is an open access article distributed under the terms and conditions of the Creative Commons Attribution (CC BY) license (https://creativecommons.org/licenses/by/4.0/).
Mohseni, M.; Redies, C.; Gast, V. Comparative Analysis of Preference in Contemporary and Earlier Texts Using Entropy Measures. Entropy 2023, 25, 486. https://doi.org/10.3390/e25030486