Identifying Literary Microgenres and Writing Style Differences in Romanian Novels with ReaderBench and Large Language Models
Abstract
1. Introduction
1.1. Definition of Literary Genres and Microgenres
1.2. Analysis of Literary (Micro)Genres
1.3. Current Study Objective
1.3.1. Macro-Level Genre Characterization
1.3.2. Microgenre Distribution Analysis
2. Current Study Contributions
- We release a publicly available corpus of 175 Romanian novels written by 106 authors. The dataset can be accessed at https://huggingface.co/datasets/upb-nlp/lumro_175_novels/ (accessed on 9 July 2025). The LUMRO corpus also includes and revises the ELTeC-rom corpus (https://github.com/COST-ELTeC/ELTeC-rom, accessed on 9 July 2025), to which it adds 75 more novels.
- We present a detailed analysis of the distinctions in literary genres based on the top discriminative linguistic features, as well as examine the mixtures of literary microgenres within novels. We present our method for microgenre classification, and we release our code as open-source on GitHub. The code can be accessed at https://github.com/upb-nlp/LUMRO (version v1.0.0, accessed on 9 July 2025).
3. Method
3.1. Corpus
3.2. Genre Classification
3.2.1. Linguistic Features
3.2.2. LLMs
- g: a specific genre
- : number of words in paragraph i
- : total number of words in the chapter
- : probability assigned to genre g in paragraph i
- n: number of paragraphs in the chapter
4. Results
4.1. ReaderBench Linguistic Features
4.1.1. Document Level Analysis
4.1.2. Chapter-Level Analysis
4.2. Microgenre Prediction with LLMs
5. Discussion
Romanian text: “El remase apoi schilod în toată viața lui. Dar înfine el era acuma scăpat și’și redobândise libertatea. Ca o pasăre scăpată din cușcă alergă acasă cu speranța de a’și găsi soția și copiii. Ușa era închisă, obloanele lăsate în jos, ca în zâua când plecase; niciun vecin nu văzuse pe nevasta lui, pe copiii sei și nici nu știa ce s’au făcut. Șloime Haies, călit la durere, scoase un oftat și tărăndu’și piciorul, chemă un lăcătuș, deschise ușa, își scoase banii, și cu durere în piept, cu o nedumerire îngrozitoare în suflet, plecă spre Tărgu-Ocna în căutarea familiei sale.“—Gângavul, Elias Schwarzfeld, 1895
English translation: “He remained lame for the rest of his life. But finally, he was now free and had regained his liberty. Like a bird escaped from its cage, he ran home hoping to find his wife and children. The door was closed, the shutters lowered, as on the day he had left; no neighbor had seen his wife or his children, nor did anyone know what had become of them. Șloime Haies, hardened by suffering, let out a sigh and, dragging his foot, called a locksmith, opened the door, took out his money, and with pain in his chest and terrible bewilderment in his soul, set off for Tărgu-Ocna in search of his family.“
Every text participates in one or several genres, there is no genreless text; there is always a genre and genres, yet such participation never amounts to belonging.
6. Conclusions and Future Work
Author Contributions
Funding
Informed Consent Statement
Data Availability Statement
Acknowledgments
Conflicts of Interest
Correction Statement
Abbreviations
| FDR | False Discovery Rate |
| JSON | JavaScript Object Notation |
| Llama | Large Language Model Meta AI |
| LLM | Large Language Model |
| NLP | Natural Language Processing |
| POS | Part of Speech |
Appendix A. Plots of Significant Linguistic Feature Values Across Genres


Appendix B. Mann–Whitney U Test Results


Appendix C. Structured Output Schema
| Listing A1. Output schema for genre probabilities. |
![]() |
Appendix D. Full System Prompt Used in Genre Prediction by LLM
| System Prompt |
|---|
You are a literature researcher and you need to specify the literary genre of a fragment from a Romanian novel. These are the possibilities:
|
| Examples of correct classification: |
Example 1:
|
| Correctly generated genres and probabilities: {“genres_with_probabilities”: [ “genre”: “outlaw”, “probability”: “0.6”, “genre”: “adventure”, “probability”: “0.3”, “genre”: “historical”, “probability”: “0.1”]} |
Example 2:
|
| Correctly generated genres and probabilities: {“genres_with_probabilities”: [ “genre”: “science fiction”, “probability”: 0.7, “genre”: “adventure”, “probability”: 0.2, “genre”: “historical”, “probability”: 0.1]} |
Example 3:
|
| Correctly generated genres and probabilities: {“genres_with_probabilities”: [ “genre”: “rural”, “probability”: 0.8, “genre”: “social”, “probability”: 0.15, “genre”: “poetic”, “probability”: 0.05]} |
Appendix E. Examples of Genre Distribution

half of the novel slowly and patiently chronicles the awakening, development, and eruption of love in a girl conscious of her beauty and charms, first provocative and undecided, then mastered by passion and capable of any sacrifice.



References
- Krieger, M. Theory of Criticism; Johns Hopkins University Press: Baltimore, MD, USA, 1976. [Google Scholar]
- Aristotle, B. Poetics; Janko, R., Translator; Hackett Publishing Company: Indianapolis, IN, USA; Cambridge, MA, USA, 2019. [Google Scholar]
- Underwood, T. Genre Theory and Historicism. J. Cult. Anal. 2016, 2, 1–6. [Google Scholar] [CrossRef]
- Todorov, T. Genres in Discourse; Porter, C., Translator; Cambridge University Press: Cambridge, UK, 1990. [Google Scholar]
- Genette, G. The Architext: An Introduction; University of California Press: Berkeley, CA, USA; Los Angeles, CA, USA; Oxford, CA, USA, 1992. [Google Scholar]
- Derrida, J. The law of genre. Crit. Inq. 1980, 7, 55–81. [Google Scholar] [CrossRef]
- Ivanov, V. Eteroglossia/Heteroglossia. In Culture e Discorso. Un Lessico per le Scienze Umane, a Cura di Alessandro Duranti; Meltemi Editore: Rome, Italy, 2002; pp. 107–110. [Google Scholar]
- Ivanov, V.V. Bakhtin’s theory of language from the standpoint of modern science. Russ. J. Commun. 2008, 1, 245–265. [Google Scholar] [CrossRef]
- Jauss, H.R. Literary History as a Challenge to Literary Theory. New Lit. Hist. 1970, 2, 7–37. [Google Scholar] [CrossRef]
- Fish, S. Is There a Text in This Class? The Authority of Interpretive Communities; Harvard University Press: Cambridge, MA, USA, 1980. [Google Scholar]
- Fowler, A. Kinds of Literature: An Introduction to the Theory of Genres and Modes; Harvard University Press: Cambridge, MA, USA, 1982. [Google Scholar]
- Moretti, F. The Novel, 1. History, Geography, Culture; Princeton University Press: Princeton, NJ, USA, 2005. [Google Scholar]
- Prince, G. The diary novel: Notes for the definition of a sub-genre. Neophilologus 1975, 59, 477–481. [Google Scholar] [CrossRef]
- Duff, D. Modern Genre Theory; Routledge: London, UK, 2014. [Google Scholar]
- Yang, D. A Review of The Microgenre: A Quick Look at Small Culture. Interdiscip. Lit. Stud. 2023, 25, 265–269. [Google Scholar] [CrossRef]
- Walz, K. The Graduate Student Novel: A New Subgenre in University Fiction. Ph.D. Thesis, University of Missouri, Columbia, MI, USA, 2022. [Google Scholar]
- Stanford Literary Lab. Microgenres. Available online: https://litlab.stanford.edu/projects/microgenres/ (accessed on 24 August 2025).
- Borza, C.; Goldiș, A.; Tudurachi, A. Subgenurile Romanului Românesc. Laboratorul unei tipologii. Dacorom. Litt. 2020, 7, 205–220. [Google Scholar] [CrossRef]
- Terian, A. Principles for an Evolutionary Taxonomy of the Romanian Novel. Transylv. Rev. 2022, 31, 11–24. [Google Scholar]
- Baghiu, Ș. Apartenența multiplă de subgen: O propunere pentru istoria formelor românești. Rev. Transilv. 2022, 11–12, 45–49. [Google Scholar] [CrossRef]
- Schöch, C.; Erjavec, T.; Patras, R.; Santos, D. Creating the european literary text collection (eltec): Challenges and perspectives. Mod. Lang. Open 2021, 25, 1–19. [Google Scholar] [CrossRef]
- Patras, R. Romanian Novel Corpus (ELTeC-rom): Release with 80 novels encoded at level 1. In European Literary Text Collection; Zenodo: Genève, Switzerland, 2020. [Google Scholar] [CrossRef]
- Patras, R.; Odebrecht, C.; Galleron, I.; Arias, R.; Herrmann, B.J.; Krstev, C.; Poniž, K.M.; Yesypenko, D. Thresholds to the “Great Unread”: Titling Practices in Eleven ELTeC Collections. Interférences Littéraires/Literaire Interf. 2021, 25, 163–187. [Google Scholar]
- Tudurachi, A. Dicționarul Cronologic al Romanului Românesc de la Origini până în 2000 [Chronological Dictionary of the Romanian Novel from Its Origins to 2000]; Presa Universitară Clujeană: Cluj-Napoca, Romania, 2023; Volume I–II. [Google Scholar]
- Todorov, T. Literary genres. Curr. Trends Linguist. 1974, 12, 957–962. [Google Scholar]
- Hamburger, K. The Logic of Literature; Indiana University Press: Bloomington, IN, USA, 1973. [Google Scholar]
- Beebee, T.O. The Ideology of Genre: A Comparative Study of Generic Instability; Penn State Press: University Park, PA, USA, 1994. [Google Scholar]
- Bawarshi, A. The Genre Function. Coll. Engl. 2000, 62, 335. [Google Scholar] [CrossRef]
- Frow, J. Genre; Routledge: London, UK, 2014. [Google Scholar]
- Cohen, R. Genre Theory and Historical Change: Theoretical Essays of Ralph Cohen; University of Virginia Press: Richmond, VA, USA, 2017. [Google Scholar]
- Peng, R.D.; Hengartner, N.W. Quantitative analysis of literary styles. Am. Stat. 2002, 56, 175–185. [Google Scholar] [CrossRef]
- Nichols, R.; Lynn, J.; Purzycki, B.G. Toward a science of science fiction: Applying quantitative methods to genre individuation. Sci. Study Lit. 2014, 4, 25–45. [Google Scholar] [CrossRef]
- Hettinger, L.; Reger, I.; Jannidis, F.; Hotho, A. Classification of Literary Subgenres. In Proceedings of the DHd, Krakow, Poland, 11–16 July 2016. [Google Scholar]
- Herawati, Y.W.; Masitoh, S. Quantifying literary works: Is it possible? J. Ilm. Bhs. Dan Sastra 2024, 11, 40–52. [Google Scholar] [CrossRef]
- Monte-Serrat, D.M.; Machado, M.T.; Ruiz, E.E.S. A machine learning approach to literary genre classification on Portuguese texts: Circumventing NLP’s standard varieties. In Proceedings of the Simpósio Brasileiro de Tecnologia da Informação e da Linguagem Humana (STIL), Online, 2021; Sociedade Brasileira de Computação (SBC): Porto Alegre, RS, Brazil, 2021; pp. 255–264. [Google Scholar]
- Preeti; Sharma, N.; Verma, J.; Latha, R.; Dharanish, J.; Bheemra. Quantitative Analysis of Literary Texts: Computational Approaches in Digital Humanities Research. Educ. Adm. Theory Pract. 2024, 30, 5234–5240. [Google Scholar]
- Moretti, F. Distant Reading; Verso Books: London, UK, 2013. [Google Scholar]
- Moretti, F. Canon/Archive: Studies in Quantitative Formalism from the Stanford Literary Lab; n + 1 Foundation: Brooklyn, NY, USA, 2017. [Google Scholar]
- Ramirez-Arellano, A. Classification of Literary Works: Fractality and Complexity of the Narrative, Essay, and Research Article. Entropy 2020, 22, 904. [Google Scholar] [CrossRef]
- Kok, C.L.; Ho, C.K.; Aung, T.H.; Koh, Y.Y.; Teo, T.H. Transfer learning and deep neural networks for robust intersubject hand movement detection from EEG signals. Appl. Sci. 2024, 14, 8091. [Google Scholar] [CrossRef]
- Kok, C.L.; Ho, C.K.; Chen, L.; Koh, Y.Y.; Tian, B. A novel predictive modeling for student attrition utilizing machine learning and sustainable big data analytics. Appl. Sci. 2024, 14, 9633. [Google Scholar] [CrossRef]
- Unal, F.Z.; Guzel, M.S.; Bostanci, E.; Acici, K.; Asuroglu, T. Multilabel Genre Prediction Using Deep-Learning Frameworks. Appl. Sci. 2023, 13, 8665. [Google Scholar] [CrossRef]
- Devatine, N.; Muller, P.; Braud, C. MELODI at SemEval-2023 Task 3: In-domain Pre-training for Low-resource Classification of News Articles. In Proceedings of the 17th International Workshop on Semantic Evaluation, Toronto, ON, Canada, 9–14 July 2023. [Google Scholar]
- Lepekhin, M.; Sharoff, S. FTD at SemEval-2023 Task 3: News Genre and Propaganda Detection by Comparing Mono- and Multilingual Models with Fine-tuning on Additional Data. In Proceedings of the 17th International Workshop on Semantic Evaluation, Toronto, ON, Canada, 9–14 July 2023. [Google Scholar]
- Jiang, Y. Team QUST at SemEval-2023 Task 3: A Comprehensive Study of Monolingual and Multilingual Approaches for Detecting Online News Genre, Framing and Persuasion Techniques. arXiv 2023, arXiv:2304.04190. [Google Scholar] [CrossRef]
- Münker, S.; Kugler, K.; Rettinger, A. Zero-shot prompt-based classification: Topic labeling in times of foundation models in German Tweets. arXiv 2024, arXiv:2406.18239v1. [Google Scholar]
- Philippy, F.; Haddadan, S.; Guo, S. Forget NLI, Use a Dictionary: Zero-Shot Topic Classification for Low-Resource Languages with Application to Luxembourgish. arXiv 2024, arXiv:2404.03912. [Google Scholar]
- Baghiu, S. The Rise of Translations: Foreign Novels in Romania in 1877, 1945, and 1989. Transylv. Rev. 2022, 31, 250–260. [Google Scholar]
- Terian, A. Big numbers: A quantitative analysis of the development of the novel in Romania. Transylv. Rev. 2019, 28, 55–74. [Google Scholar]
- Varga, D. ND Popescu și romanele istorice de consum. Rev. Transilv. 2023, 11–12, 39–42. [Google Scholar] [CrossRef]
- Gârdan, D. Evoluţia romanului erotic românesc din prima jumătate a secolului al XX-lea. Intre exerciţiu si canonizare. Rev. Transilv. 2018, 7, 5–10. [Google Scholar]
- Borza, C.; Gârdan, D.; Modoc, E. The peasant and the nation plot: A distant reading of the Romanian rural novel from the first half of the twentieth century. Rural. Hist. 2023, 34, 75–91. [Google Scholar] [CrossRef]
- Kessler, B.; Nunberg, G.; Schütze, H. Automatic Detection of Text Genre. In 35th Annual Meeting of the Association for Computational Linguistics and 8th Conference of the European Chapter of the Association for Computational Linguistics; Association for Computational Linguistics: Madrid, Spain, 1997; pp. 32–38. [Google Scholar]
- Santini, M. Automatic genre identification: Towards a flexible classification scheme. In Proceedings of the BCS IRSG Symposium: Future Directions in Information Access 2007. BCS Learning & Development, Glasgow, UK, 28–29 August 2007. [Google Scholar]
- Petrenz, P.; Webber, B. Robust cross-lingual genre classification through comparable corpora. In Proceedings of the The 5th Workshop on Building and Using Comparable Corpora, Istanbul, Turkey, 26 May 2012; p. 1. [Google Scholar]
- Maharjan, S.; Montes, M.; González, F.A.; Solorio, T. A genre-aware attention model to improve the likability prediction of books. In Proceedings of the 2018 Conference on Empirical Methods in Natural Language Processing, Brussels, Belgium, 31 October–4 November 2018; pp. 3381–3391. [Google Scholar]
- Goyal, A.; Prem Prakash, V. Statistical and deep learning approaches for literary genre classification. In Advances in Data and Information Sciences: Proceedings of ICDIS 2021; Springer: Berlin/Heidelberg, Germany, 2022; pp. 297–305. [Google Scholar]
- Terian, A.; Gârdan, D.; Modoc, E.; Borza, C.; Varga, D.; Olaru, O.; Morariu, D. Genurile romanului românesc (1901–1932). O analiză cantitativă. Transilvania 2020, 10, 53–64. [Google Scholar] [CrossRef]
- Patraș, R. Hajduk novels in the nineteenth-century Romanian fiction: Notes on a sub-genre. Swed. J. Rom. Stud. 2019, 2, 24–33. [Google Scholar]
- Ursu, M.G. Romanul misterelor în literatura română a secolului al XIX-lea-o pagină de istorie literară uitată. Swed. J. Rom. Stud. 2022, 5, 69–84. [Google Scholar] [CrossRef]
- Stevens, A.H.; O’Donnell, M.C. The Microgenre: A Quick Look at Small Culture; Bloomsbury Publishing USA: New York, NY, USA, 2020. [Google Scholar]
- Wang, W.; Tu, Z.; Chen, C.; Yuan, Y.; Huang, J.t.; Jiao, W.; Lyu, M.R. All languages matter: On the multilingual safety of large language models. arXiv 2023, arXiv:2310.00905. [Google Scholar] [CrossRef]
- Mihalcea, R.; Ignat, O.; Bai, L.; Borah, A.; Chiruzzo, L.; Jin, Z.; Kwizera, C.; Nwatu, J.; Poria, S.; Solorio, T. Why AI Is WEIRD and Shouldn’t Be This Way: Towards AI for Everyone, with Everyone, by Everyone. In Proceedings of the AAAI Conference on Artificial Intelligence, Philadelphia, PA, USA, 25 February–4 March 2025; Volume 39, pp. 28657–28670. [Google Scholar]
- Nasution, A.H.; Onan, A. Chatgpt label: Comparing the quality of human-generated and llm-generated annotations in low-resource language nlp tasks. IEEE Access 2024, 12, 71876–71900. [Google Scholar] [CrossRef]
- Zhong, T.; Yang, Z.; Liu, Z.; Zhang, R.; Liu, Y.; Sun, H.; Pan, Y.; Li, Y.; Zhou, Y.; Jiang, H.; et al. Opportunities and challenges of large language models for low-resource languages in humanities research. arXiv 2024, arXiv:2412.04497. [Google Scholar]
- Repede, S.E.; Brad, R. LLaMA 3 vs. State-of-the-Art Large Language Models: Performance in Detecting Nuanced Fake News. Computers 2024, 13, 292. [Google Scholar] [CrossRef]
- Ştefănescu, E.; Jerpelea, A.I. Reddit is all you need: Authorship profiling for Romanian. arXiv 2024, arXiv:2410.09907. [Google Scholar] [CrossRef]
- Dascalu, M.; Dessus, P.; Trausan-Matu, Ş.; Bianco, M.; Nardy, A. ReaderBench, an environment for analyzing text complexity and reading strategies. In Proceedings of the Artificial Intelligence in Education: 16th International Conference, AIED 2013, Memphis, TN, USA, 9–13 July 2013; Proceedings 16. Springer: Berlin/Heidelberg, Germany, 2013; pp. 379–388. [Google Scholar]
- Dascalu, M.; Gîfu, D.; Trausan-Matu, S. What Makes Your Writing Style Unique? Significant Differences Between Two Famous Romanian Orators. In Proceedings of the International Conference on Computational Collective Intelligence, Halkidiki, Greece, 28–30 September 2016. [Google Scholar]
- Allen, L.; Dascalu, M.; McNamara, D.S.; Crossly, S.; Trausan-Matu, S. Modeling individual differences among writers using ReaderBench. In Proceedings of the EDULearn16: 8th International Conference on Education and New Learning Technologies, Barcelona, Spain, 4–6 July 2016; IATED Academy: Valencia, Spain, 2016; pp. 5269–5279. [Google Scholar]
- Chitez, M.; Dascalu, M.; Udrea, A.C.; Strilețchi, C.; Csürös, K.; Rogobete, R.; Oravițan, A. Towards Building the LEMI Readability Platform for Children’s Literature in the Romanian Language. In Proceedings of the 2024 Joint International Conference on Computational Linguistics, Language Resources and Evaluation (LREC-COLING 2024), Torino, Italy, 20–25 May 2024; Calzolari, N., Kan, M.Y., Hoste, V., Lenci, A., Sakti, S., Xue, N., Eds.; pp. 16450–16456. [Google Scholar]
- Gifu, D.; Dascalu, M.; Trausan-Matu, S.; Allen, L.K. Time evolution of writing styles in Romanian language. In Proceedings of the 2016 IEEE 28th International Conference on Tools with Artificial Intelligence (ICTAI), San Jose, CA, USA, 6–8 November 2016; pp. 1048–1054. [Google Scholar]
- Yadav, K.V.; Kumar, R. Data Preprocessing Techniques. Phoenix: Int. Multidiscip. Res. J. 2025, 1, 1–6. [Google Scholar]
- Meetei, L.S.; Singh, T.D.; Borgohain, S.K.; Bandyopadhyay, S. Low resource language specific pre-processing and features for sentiment analysis task. Lang. Resour. Eval. 2021, 55, 947–969. [Google Scholar] [CrossRef]
- Khan, T.; Mallick, D.D.; Khan, M.S.I.; Hasan, M.M.; Ashraf, F.B. An efficient text preprocessing and classification technique for multilingual and transliterated data. In Proceedings of the 2022 25th International Conference on Computer and Information Technology (ICCIT), Cox’s Bazar, Bangladesh, 17–19 December 2022; pp. 366–371. [Google Scholar]
- Vargha, A.; Delaney, H.D. The Kruskal-Wallis Test and Stochastic Homogeneity. J. Educ. Stat. 1998, 23, 170–192. [Google Scholar]
- Benjamini, Y.; Hochberg, Y. Controlling the false discovery rate: A practical and powerful approach to multiple testing. J. R. Stat. Soc. Ser. B Methodol. 1995, 57, 289–300. [Google Scholar] [CrossRef]
- Storey, J.D. False Discovery Rate. Int. Encycl. Stat. Sci. 2011, 1, 504–508. [Google Scholar]
- Pilnenskiy, N.; Smetannikov, I. Feature selection algorithms as one of the python data analytical tools. Future Internet 2020, 12, 54. [Google Scholar] [CrossRef]
- Tariq, M.A. A Study on Comparative Analysis of Feature Selection Algorithms for Students Grades Prediction. J. Inf. Organ. Sci. 2024, 48, 1–15. [Google Scholar] [CrossRef]
- McKnight, P.E.; Najab, J. Mann-Whitney U Test. In The Corsini Encyclopedia of Psychology; Wiley: Hoboken, NJ, USA, 2010; p. 1. [Google Scholar]
- Grattafiori, A.; Dubey, A.; Jauhri, A.; Pandey, A.; Kadian, A.; Al-Dahle, A.; Letman, A.; Mathur, A.; Schelten, A.; Vaughan, A.; et al. The Llama 3 Herd of Models. arXiv 2024, arXiv:2407.21783. [Google Scholar] [CrossRef]
- Guo, D.; Yang, D.; Zhang, H.; Song, J.; Zhang, R.; Xu, R.; Zhu, Q.; Ma, S.; Wang, P.; Bi, X.; et al. Deepseek-r1: Incentivizing reasoning capability in llms via reinforcement learning. arXiv 2025, arXiv:2501.12948. [Google Scholar]
- Manning, C.D. An Introduction to Information Retrieval; Cambridge University Press: Cambridge, UK, 2009. [Google Scholar]
- Jockers, M.L. Macroanalysis: Digital Methods and Literary History; University of Illinois Press: Champaign, IL, USA, 2013. [Google Scholar]
- Olteanu, A. The Romantic Historicism and the risse of the Historical Novel in the 19 (tm) century Romanian Literature. Philobiblon 2024, 29, 343–361. [Google Scholar] [CrossRef]








| Genre | Number of Novels | Percentage (%) | Number of Chapters | Percentage (%) |
|---|---|---|---|---|
| Social | 54 | 30.86 | 1034 | 35.05 |
| Sentimental | 33 | 18.86 | 510 | 17.26 |
| Historical | 23 | 13.14 | 527 | 17.86 |
| Murder | 13 | 7.43 | 335 | 11.35 |
| Mystery | 13 | 7.43 | 404 | 13.69 |
| Outlaw | 12 | 6.86 | 186 | 6.31 |
| War | 11 | 6.29 | 158 | 5.35 |
| Genres below this line were excluded from the document-level analysis | ||||
| Rural | 4 | 2.29 | 137 | 4.64 |
| Psychology | 3 | 1.71 | 69 | 2.34 |
| Science Fiction | 3 | 1.71 | 44 | 1.49 |
| Sensation | 2 | 1.14 | 96 | 3.25 |
| Poetic | 2 | 1.14 | 17 | 0.58 |
| Philosophical | 1 | 0.57 | 14 | 0.47 |
| Religious | 1 | 0.57 | 26 | 0.88 |
| Biographical | 1 | 0.57 | 3 | 0.10 |
| Adventure | 1 | 0.57 | 13 | 0.44 |
| Exile | 1 | 0.57 | 10 | 0.34 |
| Total | 175 | 100.00 | 2959 | 100.00 |
| Parameter | Value |
|---|---|
| Model | Llama3.3:70B |
| Stream | False |
| Temperature | 0 |
| Top-p | 1 |
| Frequency penalty | 0.0 |
| Presence penalty | 0.0 |
| Messages | System: system_message, User: user_input |
| Response format | response_schema |
| Feature | Description | H-Value | p-Value |
|---|---|---|---|
| Features related to Syntax | |||
| M (Dep_compound/Sent) | Mean of compound dependencies per sentence | 44.436 | <0.001 |
| Max (Dep_nummod/Par) | Maximum of numerical modifiers per paragraph | 36.393 | 0.003 |
| Max (Dep_nummod/Sent) | Maximum of numerical modifiers per sentence | 32.464 | 0.009 |
| Max (Dep_expl/Sent) | Maximum of expletive dependencies per sentence | 29.817 | 0.019 |
| M (Dep_goeswith/Sent) | Mean of goes-with dependencies per sentence | 29.446 | 0.021 |
| Max (Dep_ccomp/Sent) | Maximum of clausal complements per sentence | 29.231 | 0.022 |
| Max (Dep_goeswith/Sent) | Maximum of goes-with dependencies per sentence | 29.156 | 0.022 |
| Max (Dep_vocative/Sent) | Maximum of vocative dependencies per sentence | 28.241 | 0.029 |
| Max (Dep_dep/Sent) | Maximum of dependency relations per sentence | 28.215 | 0.029 |
| M (Dep_vocative/Sent) | Mean of vocative dependencies per sentence | 27.497 | 0.036 |
| Max (Dep_orphan/Sent) | Maximum of orphan dependencies per sentence | 27.317 | 0.038 |
| Morphology/POS tags | |||
| M (Pron_fst/Sent) | Mean number of first-person pronouns per sentence | 31.079 | 0.013 |
| Max (Pron_snd/Sent) | Maximum number of second-person pronouns per sentence | 29.634 | 0.020 |
| Features related to Word Difficulty | |||
| M (LemmaDiff/Word) | Mean of the difference between the word form and its lemma per word | 40.649 | <0.001 |
| Max (Syllab/Word) | Maximum number of syllables per word | 28.647 | 0.026 |
| Sentence Length | |||
| M (Wd/Sent) | Mean number of words per sentence | 27.774 | 0.033 |
| Key Takeaway: The most significant features are compound dependencies (syntax) and lemma–wordform differences (morphology), both with highly significant p-values (< 0.001). First-person pronouns also show notable significance (p = 0.013). | |||
| Feature | Description | F-Value |
|---|---|---|
| Syntactical Features | ||
| Max (Dep_fixed/Sent) | Maximum of fixed expressions per sentence | 20.36 |
| Max (Dep_ccomp/Sent) | Maximum of clausal complements per sentence | 19.99 |
| Max (Dep_dep/Sent) | Maximum of dependency relations per sentence | 19.97 |
| Max (Dep_cop/Sent) | Maximum of copula dependencies per sentence | 18.36 |
| Max (Dep_nummod/Sent) | Maximum of numerical modifiers per sentence | 16.46 |
| Max (Dep_expl/Sent) | Maximum of expletive dependencies per sentence | 15.50 |
| Max (Dep_xcomp/Sent) | Maximum of open clausal complements per sentence | 12.98 |
| Max (Dep_iobj/Sent) | Maximum of indirect object dependencies per sentence | 12.77 |
| Max (Dep_csubj/Sent) | Maximum of clausal subject dependencies per sentence | 11.14 |
| Max (Dep_compound/Sent) | Maximum of compound dependencies per sentence | 9.77 |
| M (Dep_obl/Par) | Mean of oblique dependencies per paragraph | 3.09 |
| M (Dep_iobj/Par) | Mean of indirect object dependencies per paragraph | 2.75 |
| Morphological Features | ||
| Max (Pron_fst/Sent) | Maximum of first-person pronouns per sentence | 15.26 |
| Max (Pron_thrd/Sent) | Maximum of third-person pronouns per sentence | 15.16 |
| Max (Pron_int/Sent) | Maximum of interrogative pronouns per sentence | 13.68 |
| Max (Pron_snd/Sent) | Maximum of second-person pronouns per sentence | 12.76 |
| Max (POS_pron/Sent) | Maximum of pronouns per sentence | 12.41 |
| Max (Pron_fst/Par) | Maximum of first-person pronouns per paragraph | 8.72 |
| M (Pron_fst/Par) | Mean of first-person pronouns per paragraph | 2.73 |
| Word Complexity | ||
| M (Wd/Sent) | Mean number of words per sentence | 40.58 |
| M (Sent/Par) | Mean number of sentences per paragraph | 26.27 |
| Max (Syllab/Word) | Maximum number of syllables per word | 24.31 |
| Max (Wd/Sent) | Maximum number of words per sentence | 23.93 |
| M (NgramEntr_2/Word) | Mean bigram entropy per word | 21.93 |
| Max (Connector_reasonandpurpose/Sent) | Maximum connectors of reason/purpose per sentence | 16.91 |
| Max (Wd/Par) | Maximum number of words per paragraph | 15.77 |
| Max (NgramEntr_2/Word) | Maximum bigram entropy per word | 14.60 |
| Max (LemmaDiff/Word) | Maximum lemma difference per word | 3.11 |
| Connectors | ||
| Max (Connector_concc/Sent) | Maximum connectors of concession per sentence | 15.10 |
| Max (Connector_contrast/Sent) | Maximum connectors of contrast per sentence | 15.02 |
| Max (Connector_coord/Sent) | Maximum coordinating connectors per sentence | 13.59 |
| M (Connector_link/Sent) | Mean linking connectors per sentence | 3.59 |
| Punctuation and commas | ||
| M (Punct/Par) | Mean punctuation per paragraph | 2.99 |
| M (Commas/Par) | Mean commas per paragraph | 2.71 |
| Key Takeaway: Sentence length features (F = 40.58) and syntactic complexity (max F = 20.36 for fixed expressions) show the strongest effects. First-person pronouns (F = 15.26) and discourse connectors (max F = 15.10) are particularly significant in chapter-level analysis, suggesting these features play important roles in distinguishing between genres. | ||
| DeepSeek Genre | Llama Genre | Frequency |
|---|---|---|
| Sentimental | Poetic | 150 |
| Sentimental | Psychological | 100 |
| Psychological | Poetic | 71 |
| Sentimental | Social | 61 |
| Psychological | Sentimental | 43 |
| War | Historical | 30 |
| Historical | Social | 24 |
| Psychological | Social | 22 |
| Social | Sentimental | 21 |
| Historical | Outlaw | 21 |
| Model | 1st Genre | 2nd Genre | 3rd Genre |
|---|---|---|---|
| DeepSeek | social (0.5) | psychological (0.3) | biographical (0.2) |
| Llama | sentimental (0.5) | historical (0.3) | exile (0.2) |
Disclaimer/Publisher’s Note: The statements, opinions and data contained in all publications are solely those of the individual author(s) and contributor(s) and not of MDPI and/or the editor(s). MDPI and/or the editor(s) disclaim responsibility for any injury to people or property resulting from any ideas, methods, instructions or products referred to in the content. |
© 2025 by the authors. Licensee MDPI, Basel, Switzerland. This article is an open access article distributed under the terms and conditions of the Creative Commons Attribution (CC BY) license (https://creativecommons.org/licenses/by/4.0/).
Share and Cite
Udrea, A.C.; Ruseti, S.; Pojoga, V.; Baghiu, S.; Terian, A.; Dascalu, M. Identifying Literary Microgenres and Writing Style Differences in Romanian Novels with ReaderBench and Large Language Models. Future Internet 2025, 17, 397. https://doi.org/10.3390/fi17090397
Udrea AC, Ruseti S, Pojoga V, Baghiu S, Terian A, Dascalu M. Identifying Literary Microgenres and Writing Style Differences in Romanian Novels with ReaderBench and Large Language Models. Future Internet. 2025; 17(9):397. https://doi.org/10.3390/fi17090397
Chicago/Turabian StyleUdrea, Aura Cristina, Stefan Ruseti, Vlad Pojoga, Stefan Baghiu, Andrei Terian, and Mihai Dascalu. 2025. "Identifying Literary Microgenres and Writing Style Differences in Romanian Novels with ReaderBench and Large Language Models" Future Internet 17, no. 9: 397. https://doi.org/10.3390/fi17090397
APA StyleUdrea, A. C., Ruseti, S., Pojoga, V., Baghiu, S., Terian, A., & Dascalu, M. (2025). Identifying Literary Microgenres and Writing Style Differences in Romanian Novels with ReaderBench and Large Language Models. Future Internet, 17(9), 397. https://doi.org/10.3390/fi17090397


