Kafka’s Literary Style: A Mixed-Method Approach

Strathausen, Carsten; Shang, Wenyi; Kazakov, Andrei

doi:10.3390/h14030061

by Carsten Strathausen ¹,²,*, Wenyi Shang ³ and Andrei Kazakov

¹ Department of English, College of Arts and Science, University of Missouri, Columbia, MO 65211, USA

² School of Languages, Literatures, and Cultures, College of Arts and Science, University of Missouri, Columbia, MO 65211, USA

³ School of Information Science & Learning Technologies, College of Education & Human Development, University of Missouri, Columbia, MO 65211, USA

* Author to whom correspondence should be addressed.

Humanities 2025, 14(3), 61; https://doi.org/10.3390/h14030061

Submission received: 9 January 2025 / Revised: 27 February 2025 / Accepted: 3 March 2025 / Published: 12 March 2025

(This article belongs to the Special Issue Franz Kafka in the Age of Artificial Intelligence)

Abstract

In this essay, we examine how the polyvalence of meaning in Kafka’s texts is engineered both semantically (on the narrative level) and syntactically (on the linguistic level), and we ask whether a computational approach can shed new light on the long-standing debate about the major characteristics of Kafka’s literary style. A mixed-method approach means that we seek out points of connection that interlink traditional humanist (i.e., interpretative) and computational (i.e., quantitative) methods of investigation. Following the introduction, the second section of our article provides a critical overview of the existing scholarship from both a humanist and a computational perspective. We argue that the main methodological difference between traditional humanist and AI-enhanced computational studies of Kafka’s literary style lies not in the use of statistics but in the new interpretative possibilities enabled by AI methods to explore stylistic features beyond the scope of human comprehension. In the third and fourth sections of our article, we will introduce our own stylometric approach to Kafka, detail our methods, and interpret our findings. Rather than focusing on training an AI model capable of accurately attributing authorship to Kafka, we examine whether AI could help us detect significant stylistic differences between the writing Kafka himself published during his lifetime (Kafka Core) and his posthumous writings edited and published by Max Brod.

Keywords: Kafka; Brod; literary style; modernism; digital humanities; computation

1. Introduction

Both the 20th and the 21st centuries have been heralded as “Kafka’s Century”1. Franz Kafka, indeed, is one of the most popular authors of all time. His work has been translated into more than 45 different languages, and his writings have been adapted into different media and formats such as film, theater, graphics, art, radio, audiobooks, anime, comics, opera, dance, videogames, and merchandise of all kinds. The OCLC record alone lists ca. 30,000 primary works of adaptation related to Kafka’s work. The amount of commentary and secondary literature can no longer be quantified.

There are many ways to explain Kafka’s popularity. From a structuralist–institutional perspective (Danto 1996; Moretti 2005), his success is largely the product of a virulent “Kafka industry” that was inaugurated by Max Brod and intensified over decades by a vast network of self-interested publishers, scholars, translators, and institutions who promoted Kafka’s work methodically across the globe (Strathausen 2024a). From an aesthetic perspective, on the other hand, Kafka’s success simply reflects the unique quality of his fantastic stories and singular literary style. “In Kafka studies, the most basic tenet on Kafka’s style is that it is not only unique, but even ‘solitary’: Scholars converge in finding that among his generation, nobody else writes like him” (Herrmann 2017, para. 27). These two explanations are not mutually exclusive, and critics frequently endorse both (Kundera 1991; Harman 1996)2. But the point is that even critics who are highly suspicious of the Kafka industry readily agree that Kafka’s literary style is somehow distinct and unique.

Yet the labels they use to identify his style remain as contradictory today as ever. Jean-Paul Sartre regarded Kafka as an Idealist (Sartre 1999), Thomas Mann saw him as a Realist (Mann 1949), Albert Camus as a Surrealist (Camus 1942), and Claude Gandelman as an Expressionist (Gandelman 1974). Such labels aside, we find a similar disagreement about which rhetorical figure best characterizes Kafka’s writing: Is his literary style essentially symbolic (Beißner 1952), metaphorical (Emrich 1965), allegorical (Brod 1926; Thompson 2016), or parabolic and paradoxical (Politzer 1962; Buchholz 2018)?3 There is no agreement about the major cultural or linguistic influences on Kafka’s writing either. Contemporaries like Tucholsky and Urzidil insisted that Kafka was a quintessential “German author” who wrote “the best and most pure German” there is (qtd. in Thieberger 1979a, 180), whereas Thieberger noted the strong presence of Austrian colloquialisms and regional idioms in Kafka’s texts (Thieberger 1979a, 183f.), and Max Brod insisted that Kafka was a genuine Jewish author influenced by the kabbala and Jewish mysticism (Brod 1926, 500f.; Fromm 2010, 429f.). Since then, others have emphasized the importance of administrative and juridical parlance (Hermsdorf 1984) and the seeping influence of the Czech language into his prose (Woods 2013). For Deleuze and Guattari, finally, Kafka writes “a minor literature” that escapes any such labels and classifications altogether (Deleuze and Guattari 1986).

These long-standing debates are symptomatic of the hermeneutic density of Kafka’s writing, which provides ample evidence in support of vastly different, and often contradictory, interpretations of the text. The inevitability of such contradictions, in fact, has become a recurrent theme in Kafka scholarship precisely because it mirrors and highlights the inherent ambiguity of Kafka’s texts (Koelb 2006; Jahraus 2008). For more than 50 years, reader-oriented and narratological studies in particular have emphasized the ambiguity and indeterminacy of textual meaning in Kafka (Troscianko 2018; Kobs 1970; Walser 1968; Wagner 2011)4. “Ambiguity,” Ritchie Robertson states categorically, “is a major characteristic of Kafka’s texts” (Robertson 2018, p. 65). We know that a major reason for Kafka’s many self-corrections during the writing process was precisely “to avoid simple solutions or unambiguity” in his texts (Engel 2018, p. 58). The result is a profound hermeneutic dilemma for Kafka’s readers, who must reconcile in their minds the opposing principles of “Deutungsprovokation und Deutungsverweigerung” that characterize Kafka’s texts (Engel 2010, p. 411): “Every sentence invites interpretation,” Adorno quipped, “yet none will yield to it” (Adorno 1997, p. 267)5.

In this essay, we examine how this polyvalence of meaning in Kafka’s texts is engineered both semantically (on the narrative level) and syntactically (on the linguistic level), and we ask whether a computational approach can shed new light on this long-standing debate about the major characteristics of Kafka’s literary style. A mixed-method approach means that we seek out points of connection that interlink traditional humanist (i.e., interpretative) and computational (i.e., quantitative) methods of investigation. In the second section of our article, we will provide a critical overview of the existing scholarship from both a humanist and a computational perspective. We argue that the main methodological difference between traditional humanist and AI-enhanced computational studies of Kafka’s literary style lies not in the use of statistics but in the new interpretative possibilities enabled by AI methods to explore stylistic features beyond the scope of human comprehension. For many computational stylometry studies, the goal is to design a program that can identify, verify, or profile the author of any given text with high accuracy. Whether the data behind this prediction are interpretable by humans or not is of secondary importance as long as the prediction is correct. That is not the objective in literary analysis, however, because here, the focus rests entirely on the interpretability of the data to help interpret the text.

In the third and fourth sections of our article, we will introduce our own stylometric approach to Kafka, detail our methods, and interpret our findings. Rather than focusing on training an AI model capable of accurately attributing authorship to Kafka, we wanted to know whether AI could help us detect significant stylistic differences between the writing Kafka himself published during his lifetime (Kafka Core) and his posthumous writings edited and published by Max Brod. Through this, we aim to provide fresh insights into the long-standing scholarly debate on Kafka’s literary style.

2. Kafka, Hermeneutics, and Quantification

Quantitative approaches to Kafka’s stylistic patterns date back to the middle of the last century. In his dissertation entitled Die Sprache Kafkas. Eine semiotische Untersuchung, M. Gerhardt used part-of-speech analysis in Kafka’s texts to identify two linguistic traits that separate his early from his later writing: adjectives outnumber adverbs, and nouns outnumber verbs in texts written before 1910, but not thereafter. In texts written after 1910, both adjectives and adverbs, and nouns and verbs, share the same frequency, respectively (Gerhardt 1969, p. 27). Since then, there have been many related attempts to use the frequency patterns of (groups of) words in Kafka’s texts to characterize his style. Binder (1966) notes the frequency of “Als-ob” phrases throughout Kafka’s writings (201), while Thieberger emphasizes the many “Wenn–Dann” constructions as well as frequent repetitions of words and phrases, the use of series (Reihung) by piling up attributes, and the frequent use of parenthesis (Thieberger 1979a, 188f.).

The Kafka Konkordanz (1993/2003) continued this tradition on a much larger scale. The two volumes of the Konkordanz provide detailed statistical information about total word count, word frequency, and frequency of major parts of speech (e.g., nouns, adjectives, conjunctions, etc.) within and across Kafka’s three novels and his Nachgelassene Schriften (according to the Kritische Kafka Ausgabe [KKA]). To refine their data, the editors reduced each word (token) in Kafka’s text to its base form or type (e.g., sehe to sehen or Herrn to Herr) and then calculated the type/token ratio for each individual corpus. This type/token relation provides critical information about the lexical variety of a text. For example, the Konkordanz documents that the most common tokens across all three Kafka novels are verbs (ca. 21% of all words in the novels), followed by nouns (ca. 17%), adverbs (ca. 8.5%), and adjectives (ca. 7%). Once tokens are reduced to types, however, the most common ones are nouns (ca. 44% of all distinct word stems in the novels are nouns), followed by verbs (ca. 33.5%), adjectives (ca. 15%), and adverbs (ca. 5%). This type/token ratio tells us that although Kafka uses more verbs than nouns throughout his novels, the lexical variety of the nouns he uses is significantly higher than that of the verbs. In other words, Kafka uses the same verbs far more frequently than he uses the same nouns. With regard to specific word frequencies, the Konkordanz shows that the most common verbs (sein, haben, sagen, werden, können, wollen) and the most common adjectives (viel, groß, gut, gleich, wenig, klein) are fairly stable across all four corpora, whereas the most common nouns are more varied, meaning they differ from one another more than do adjectives and verbs.
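The type/token relation described above can be illustrated with a minimal Python sketch. It is not the procedure used by the editors of the Konkordanz; the function name `type_token_ratio` and the two toy word lists are invented for illustration, and the input is assumed to be already lemmatized.

```python
def type_token_ratio(lemmas):
    """Return (types, tokens, ratio) for a list of already lemmatized words."""
    tokens = len(lemmas)        # every occurrence counts as a token
    types = len(set(lemmas))    # every distinct base form counts once
    ratio = types / tokens if tokens else 0.0
    return types, tokens, ratio


# Toy data (invented): a few verbs repeated often, nouns that rarely repeat.
verbs = ["sein", "haben", "sein", "sagen", "sein", "haben"]
nouns = ["Herr", "Schloss", "Dorf", "Hand"]

print(type_token_ratio(verbs))  # (3, 6, 0.5)
print(type_token_ratio(nouns))  # (4, 4, 1.0)
```

In this toy case the verbs supply more tokens but fewer types than the nouns, which mirrors, in miniature, the pattern reported for the novels: frequent verbs drawn from a small stock, less frequent nouns drawn from a larger one.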

A major limitation of the statistical data in the Konkordanz is the lack of an external comparison (that is, another author or set of authors) to render the data interpretable and meaningful for the debate about Kafka’s literary style. To be sure, the Konkordanz does provide comparative statistical data of different works within Kafka’s oeuvre, a line of inquiry that is continued in recent computational studies (Salgaro 2023). But the Konkordanz does not combine its four corpora into a single “Kafka corpus” with its own metadata and then compare them to the metadata of other German modernist writers around 1900⁶. Recent computational analyses of Kafka’s style have done precisely that. Berenike Herrmann, for example, compared the stylistic features of Kafka’s texts to those of 20 other German authors around 1900⁷. Her parts-of-speech analysis showed “a high frequency of lemmas that may perform ‘modal’ functions in the discourse” (Herrmann 2017, para. 42). This makes sense, since the function of modal words (e.g., ja, doch, nur, bestimmt, etc.) in discourse is to convey the speaker’s mood or attitude vis-à-vis the propositional content of the speech, the ambivalence of which is a major characteristic of Kafka’s narrative voice. Two other computational studies by Matt Erlin confirm the relative frequency of modal words in Kafka’s text (Erlin 2017; Erlin et al. 2023). We return to this point in the next section when we discuss our own computational analyses of Kafka’s text.

For now, we want to emphasize that Herrmann’s analysis also showed that “Kafka’s texts indeed form a cluster, which means that they are more similar to each other than to any other text in the study,” and that Kafka appears somewhat “removed from the rest of the corpus” (Herrmann 2017, para. 34). These results strengthen the singularity thesis about Kafka’s unique literary style. But they come at a price. Whereas the Konkordanz still enumerated the exact word frequencies and type/token ratios it found across Kafka’s novels, Herrmann’s study employs computational tools that combine hundreds of such features into a multi-dimensional vector space that ultimately gets visualized as a graph or map in two-dimensional space to aid human interpretation of the data. This approach enables a more comprehensive comparison of literary styles between different authors.

Hence the sheer computational prowess of today’s natural language processing tools, made possible by the recent advancements in AI technology, ushers in a new and distinctive era of statistical research on Kafka’s style. In methodological terms, there is no difference whether we count Kafka’s words by hand, as scholars did in the 1960s, or by machine, as we do today. But there is a huge difference between statistical results based on simple (and easily interpretable) parameters like word frequency in comparison to complex statistical results based on hundreds of stylistic parameters spread across multi-dimensional vector spaces that machines can read but humans cannot. Herrmann is cognizant of the problem and shifts her analysis back to a qualitative close reading of how modal particles function in some of Kafka’s texts. This is a mixed-method approach that combines hermeneutic criticism with statistical analyses, and our own approach follows the same model.

The key point we wanted to make in this section is that this mixed-method approach has a long history in Kafka studies and that the main distinction between humanist and computational analyses of literary style is not the use of statistics or scientific methods per se because everybody uses them to some degree in different ways. The main difference concerns the use of AI models that transform literary texts into features through sophisticated processes, capturing stylistic characteristics that reflect quantitative patterns of traits that we, human readers, cannot identify or interpret anymore.

There is a related distinction we want to introduce before we move on to our own computational studies of Kafka’s literary style. On the one hand, Kafka’s vocabulary can be described in positive scientific terms. This includes statistics about word frequency, parts-of-speech distribution, punctuation frequency, type/token analysis, etc. It also includes data about the relative frequency of modal particles and subjunctive I in Kafka’s texts as compared to those by other authors or data about the increasing dominance of direct speech in Kafka’s later works that culminate in The Castle (ca. 70% of the novel in the KKA consists of direct speech). Data of this kind is both quantifiable and indisputable (as opposed to the interpretation of this data, which certainly is disputable). It is scientifically robust but hermeneutically weak. To claim that the frequent use of modal words in Kafka’s texts contributes to (or correlates with, to use a more modest claim) the semantic ambiguity of Kafka’s texts is like claiming that alliteration, rhyme, and meter serve to increase the musical quality of pre-modernist poetry. It is a good insight but not an interpretation. This explains why computational studies like Erlin’s and Herrmann’s merely use these kinds of data as a starting point for more targeted investigations about the complexity of Kafka’s style.

We distinguish this group of linguistic traits from a second group of stylistic traits that are less static and more dynamic, which makes them harder to quantify. For example, critics frequently emphasize the self-reflexivity of Kafka’s texts, their dynamic tension between form and content, and the predominance of free–indirect discourse as characteristic of Kafka’s style8. We cannot discuss these traits in detail, so we comment only on the dynamic tension between the linguistic form and the propositional content of Kafka’s writing. When critics describe the structural principle of Kafka’s prose as a “paradox circle” (Kobs 1970, p. 14) or a “sliding paradox” (Neumann 1968), they highlight a peculiar tension in his writing, namely that it constructs meaning at the linguistic level while, at the same time, it undermines that meaning at the rhetorical or narrative level. Kafka’s texts, in other words, juxtapose the obtuse meaning of what is being said with the linguistic clarity of how it is said.9 What remains is “a narrative that doubles itself and allows what is told to be subverted by the very act of telling” (Neumann 2011, p. 89; our translation). It is important to note that this dynamic tension primarily unfolds at the level of narration, not the plot level. It is not the argumentative back and forth between quarreling characters that undermines the credibility of linguistic signification in Kafka’s novels. Rather, narrative language succumbs to the fecundity and incongruence of meaning produced at the linguistic, rhetorical, and propositional levels of the text that cannot be synthesized into a coherent whole.

The distinction between these two groups—between positive linguistic traits and dynamic stylistic traits—carries significant implications for a computational approach to Kafka’s style. The dynamic nature of stylistic traits makes it difficult to quantify them into discrete, analytical values, which are necessary for computational approaches to operate. Counting metaphors (or paradoxes or self-reflexivity) is not as easy as counting nouns. Metaphors are more versatile, and they cannot easily be reduced to a stable linguistic core the way nouns can. A stem word like “Hand” appears hundreds of times in Kafka’s novels, but assumes only three codified forms (Hand, Hände, Händen). An expression like “die Hand heben/geben/reichen/fassen/streicheln” also appears hundreds of times yet exhibits a wide variety of different forms that may, or may not, function as metaphors depending on context.

All stylistic traits in the second group (e.g., narrative voice, self-reflexivity, form–content dynamics, etc.) are difficult to classify linguistically because they are structurally complex and activate different levels of meaning within the text. Yet their complexity makes these traits highly productive in hermeneutic terms, meaning they increase the semantic ambiguity that characterizes Kafka’s texts overall. We agree with Drucker (2012) and others that the future role of digital humanities in literary studies might depend in part on its ability to address this complexity problem and to offer adequate computational tools that can reasonably model the dynamic interplay of meaning production in literary text in ways that enable humanist interpretation of the data.

In our own computational studies described below, we have sought to identify and examine traits of both kinds; that is, we studied some dynamic stylistic traits and some positive linguistic traits that characterize Kafka’s texts. Through this dual approach, our article seeks to leverage the computational power of these novel AI models to gain an understanding of Kafka’s style from an unprecedented perspective while minimizing the limitations of the AI “black box” by closely examining the results through the lens of hermeneutic criticism.

3. Stylistic Comparison of Kafka vs. Brod

In our study, we created and used three corpora of texts: (a) all literary writings that Kafka himself published during his lifetime (Kafka Core); (b) Max Brod’s second edition of Kafka’s Gesammelte Schriften (1946–47) (Kafka 1946); and (c) a selection of novels and fiction written by Brod between 1909 and 1935. What distinguishes our study from previous quantitative studies is that we split Kafka’s oeuvre into two groups (a and b), which allows us to compare texts edited and published by Kafka himself with texts edited and published posthumously by Brod.

We consider this distinction relevant and meaningful. Let us recall that many changes Brod made in his edition of Kafka’s Gesammelte Schriften had less to do with re-writing Kafka’s handwritten manuscripts than with re-arranging them. The Franz Kafka Ausgabe (FKA), a facsimile edition of Kafka’s manuscripts (1992–2018), revealed that Brod created coherent narratives from literary fragments and that many of Brod’s choices reflect his own personal preferences rather than common editorial practice10. Hence, Roland Reuß, the editor of the FKA, claimed that the novels commonly known as Amerika, The Trial, and The Castle were authored by Max Brod, not Franz Kafka. “Kafka did not leave behind any novels” (Reuß 1995, p. 21). Reuß’ wording is precise. Kafka wrote the material, sure, but he did not author any novels. The authorship belongs to Brod and his edition of Kafka’s texts.

To test this claim, we first investigated stylistic traits across the three corpora by tasking AI with developing its own set of stylistic criteria to distinguish them. We trained two distinct AI models to classify Kafka Core and Brod’s own literary corpus. Next, we tasked each model with assigning authorship of specific sections of text from Kafka’s Gesammelte Schriften to either Kafka or Brod based on the stylistic patterns identified earlier. Our aim was to determine whether AI models would classify sections of Brod’s edition as stylistically closer to Brod’s own writing than to Kafka’s. Hence, our experiment differs from the typical case in stylometry research. The standard goal of using machine learning in stylometry is authorship attribution, which can be outlined as “Given a set of texts with known authorship, can we determine the author of a new unseen document?” (Savoy 2020, p. 9). That is not our question. We already know exactly how Kafka’s original manuscripts were altered by Brod in his Gesammelte Schriften. Both the KKA and the FKA provide detailed evidence of these changes, so there is no mystery here to solve. Instead, our focus is on examining the stylistic characteristics of Brod’s edition by analyzing its stylistic alignment with Kafka Core and Brod’s own works through AI-based analysis.

We used deep learning models in our classification experiments because they mitigate the limitations of hand-crafted features (e.g., word frequency, syntactic features, sentence length), which are often ineffective at capturing subtle stylistic traits. Deep learning models, particularly attention-based architectures like BERT (Bidirectional Encoder Representations from Transformers), excel at capturing contextual information, making them well-suited for identifying stylistic nuances based on word usage, syntax, and grammar. This capability is crucial for detecting subtle variations in style, such as those introduced when a text is edited by another author. Deep learning models also handle fragmented text effectively, analyzing individual passages while maintaining a broader understanding of the entire document. Additionally, in real-world scenarios, authors’ styles are rarely perfectly distinct, as texts often exhibit a mixture of styles and inherent noise. Deep learning models, when trained on sufficient data, are more robust against such noise and can detect subtle stylistic shifts.

After preprocessing our three corpora (Kafka Core, Brod’s corpus, and Brod’s edition of Kafka), we divided each corpus into chunked passages of approximately 400 words using a BERT tokenizer and generated embeddings (numerical representations of text capturing semantic meaning) for each chunked passage with a pretrained German BERT model, available at https://huggingface.co/dbmdz/bert-base-german-cased (accessed on 27 February 2024). To improve the robustness of our experiments, we employed two deep learning architectures that achieved state-of-the-art (SOTA) results: BERT fine-tuning and Bi-LSTM (Bidirectional Long Short-Term Memory) Q-Learning. Both architectures were trained on the embeddings of the chunked passages of Kafka Core and Brod’s corpus to learn their stylistic differences. The models then made independent predictions on each chunked passage of Brod’s edition of Kafka, assigning either Kafka or Brod as the author. Since all the texts were originally written by Kafka (though edited by Brod), the correct attribution in each case was Kafka. Figure 1 shows the accuracy results for both models.
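The chunking step can be sketched as follows. This is a simplified illustration under stated assumptions, not the pipeline documented in Appendix A: the function name `chunk_passages` is hypothetical, and plain whitespace splitting stands in for the BERT tokenizer so that the example runs without any external model.

```python
from typing import Callable, List


def chunk_passages(
    text: str,
    chunk_size: int = 400,
    tokenize: Callable[[str], List[str]] = str.split,
) -> List[str]:
    """Split a text into consecutive passages of at most `chunk_size` tokens.

    `tokenize` defaults to whitespace splitting (an assumption made for this
    sketch); a subword tokenizer could be passed in instead.
    """
    tokens = tokenize(text)
    return [
        " ".join(tokens[start:start + chunk_size])
        for start in range(0, len(tokens), chunk_size)
    ]


# Usage with a synthetic 1000-word text: two full chunks and one remainder.
sample_text = " ".join(f"w{i}" for i in range(1000))
chunks = chunk_passages(sample_text)
print([len(chunk.split()) for chunk in chunks])  # [400, 400, 200]
```

Each resulting passage would then be passed to the embedding model. Note that the last chunk of a text is shorter than the rest, which is one reason the passage length is described as approximate.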

For the technical details of the classification experiment, please refer to Appendix A.

Of the 1545 chunked passages, Bi-LSTM Q-Learning misattributed 169 to Brod and the BERT classifier 98. From a computational perspective, it is noteworthy that BERT consistently outperforms Bi-LSTM Q-Learning, which is understandable given that BERT employs a more advanced architecture compared to LSTM. In hermeneutic terms, it is remarkable how well, overall, both methods performed. The overall accuracy for Bi-LSTM Q-Learning was 89%, and that for BERT was 94%. We do not know and cannot interpret the thousands of criteria AI uses to calculate the statistical probability that a specific segment of text was written by Kafka, not Brod. We only know the result, namely that BERT correctly identified 94% of the 1545 text segments in Brod’s edition of Kafka as written by Kafka, not Brod.
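The accuracy figures follow directly from the passage counts reported above. As a quick check, the arithmetic can be reproduced in a few lines of Python; the variable and function names are chosen here for illustration only.

```python
TOTAL_PASSAGES = 1545  # chunked passages in Brod's edition of Kafka

# Passages attributed to Brod (i.e., misattributed) by each model
misattributed = {"Bi-LSTM Q-Learning": 169, "BERT": 98}


def accuracy(total: int, errors: int) -> float:
    """Share of passages correctly attributed to Kafka."""
    return (total - errors) / total


for model, errors in misattributed.items():
    print(f"{model}: {accuracy(TOTAL_PASSAGES, errors):.1%}")
# Bi-LSTM Q-Learning: 89.1%
# BERT: 93.7%
```

Rounded to whole percentages, these values give the 89% and 94% cited in the text.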

We were particularly interested in the remaining 6% of cases. Our hypothesis was that AI would primarily misidentify those text segments in Brod’s edition of Kafka where Brod’s editorial interventions are more evident than elsewhere. We used the KKA and the FKA editions to identify those passages that AI falsely associated with Brod, and we checked if and how these passages in Kafka’s manuscripts had been altered by Brod in his edition of Kafka’s Gesammelte Schriften. But we could not confirm our hypothesis. Both AI programs frequently flagged text segments in Brod’s edition of Kafka that showed little or no apparent editorial interventions by Brod. We also identified a paragraph in Brod’s second edition of Kafka’s Gesammelte Schriften that literary scholars have cited as evidence for Brod’s editorial meddling in the original manuscripts. In the opening paragraph of The Castle, Brod changed the tense of one verb from present to past, which has significant implications for the narrative voice and readers’ understanding of the text (Strathausen 2024b). We checked whether our AI models had flagged this passage or not. Since only the BERT model provides a certainty level when assigning a passage to an author, we focused on its result, which showed that the BERT model identified Kafka’s authorship of this passage with 99.998% certainty. This result reminds us that a simple change from present to past tense in a literary text is not significant in stylometric terms yet can be very significant in hermeneutic terms with profound narratological implications for how to read and understand the text.

Finally, we wondered why both BERT and Bi-LSTM Q-Learning were more successful in identifying text segments from Kafka’s Amerika compared to his other two novels and the remaining prose. Again, a cursory comparison of Brod’s edition of Kafka with the KKA does not reveal any apparent differences in Brod’s treatment of Amerika as opposed to other texts. As he had done previously for Der Prozess [The Trial] (Kafka 1925) and Das Schloss [The Castle] (Kafka 1926), Brod’s 1927 edition of Der Verschollene [Amerika] (Kafka 1927) too, arbitrarily classified long sections of Kafka’s handwritten text as fragments and relegated them to the appendix. If Brod’s interference, or lack thereof, does not explain why AI identified Kafka’s style more easily in Amerika than in his other texts, we must conclude that Kafka’s style in Amerika is somehow more pronounced—which is to say, more accurately quantifiable—than elsewhere. A possible reason might be that Kafka wrote most of Amerika in a relatively short period (from September 1912 until January 1913), right after he had written “The Judgment,” a key text that signaled his literary breakthrough and that Kafka himself considered one of his best, and most authentic, works11. Kafka was equally proud of the first chapter of Amerika, which he published separately with Kurt Wolff under the title “Der Heizer. Ein Fragment” in Spring 1913. It is rare for Kafka to embrace his works so unconditionally. Does Kafka’s satisfaction with these texts point to a particular literary style that emerged during the short period from September 1912 to January 1913 and that AI is able to identify but Kafka was not able to replicate in his later writings? Such speculations are intriguing, but trying to substantiate them would require a lot more computational research with more granular corpora (e.g., single chapters) from all three novels and other writings by Kafka.

4. Linguistic Comparison of Kafka vs. Brod

To complement the AI-based approach for the identification of stylistic traits, we also analyzed three specific linguistic traits in each corpus, namely sentence length, the frequency of punctuation signs, and the frequency of subjunctive I. For the first two traits, our hypothesis was this: to the degree that Brod’s edition stitched together disparate, decontextualized fragments into coherent literary texts, the formal appearance and inner structure of these texts were significantly shaped by Brod, not just Kafka. We wanted to see if sentence and punctuation statistics bore this out. As for subjunctive I, a mode of speech that introduces ambiguity about whether the reported statements are true and thus contributes directly to several stylistic traits of Kafka’s prose (e.g., narrative voice, form–content dynamics, contradiction, etc.), our hypothesis was that Kafka Core would have a higher relative frequency of subjunctive I than both Brod’s own literary corpus and Brod’s edition of Kafka’s works.

We used the natural language processing tool spaCy (https://spacy.io, accessed on 27 February 2024) and the pretrained “de_core_news_lg” model to calculate total words, sentences, punctuation, and subjunctive I occurrences. This process involved dependency parsing, lemmatization, and part-of-speech tagging. The main challenge was accurately identifying subjunctive I forms in German; our identification approach is detailed in Appendix B.
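The two surface measures (words per sentence and punctuation per word) can be illustrated with a deliberately naive sketch. It does not reproduce the study's spaCy pipeline: regular expressions stand in for proper sentence segmentation and tokenization, the function name `surface_stats` is hypothetical, and abbreviations such as “K.” would be wrongly treated as sentence boundaries by this simplification.

```python
import re


def surface_stats(text: str) -> dict:
    """Compute words per sentence and punctuation marks per word.

    Sentence boundaries are approximated by '.', '!' or '?' followed by
    whitespace; words are runs of letters/digits; every other non-space
    character counts as punctuation. All three rules are simplifications.
    """
    sentences = [s for s in re.split(r"(?<=[.!?])\s+", text.strip()) if s]
    words = re.findall(r"\w+", text)
    punctuation = re.findall(r"[^\w\s]", text)
    if not sentences or not words:
        return {"words_per_sentence": 0.0, "punctuation_per_word": 0.0}
    return {
        "words_per_sentence": len(words) / len(sentences),
        "punctuation_per_word": len(punctuation) / len(words),
    }


# Two sentences from the opening of Das Schloss as sample input
sample = (
    "Das Dorf lag in tiefem Schnee. "
    "Vom Schloßberg war nichts zu sehen, Nebel und Finsternis umgaben ihn."
)
stats = surface_stats(sample)
print(stats["words_per_sentence"])  # 8.5 (17 words in 2 sentences)
```

Applied to each corpus as a whole, ratios of this kind are what make sentence length and punctuation density comparable across corpora of different sizes.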

Table 1 shows that Kafka Core has a larger value in words per sentence and a smaller value in punctuation per word compared to Brod’s corpus. The values of Brod’s edition of Kafka lie in between but are much closer to Kafka Core than to Brod’s corpus. Brod’s literary works, in other words, feature shorter sentences with more punctuation marks than Kafka’s across both corpora (Kafka Core and Brod’s edition of Kafka). One interpretation of these statistics is that Brod’s emendations of Kafka’s unpublished manuscripts resulted in shorter sentences and increased use of punctuation marks compared to Kafka Core. Brod’s edition, so to speak, “pulled” Kafka’s texts towards Brod’s own style of writing (shorter sentences, more punctuation). That part of the data is consistent with our hypothesis.
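The phrase “in between but much closer to Kafka Core” can be made concrete with a small calculation on the Table 1 values. The sketch below expresses where Brod’s edition of Kafka sits on the line from Kafka Core (0.0) to Brod’s corpus (1.0). The helper name `relative_position` and this particular normalization are illustrative assumptions, not a measure used in the article.

```python
def relative_position(kafka_core: float, brod: float, edition: float) -> float:
    """Position of `edition` between Kafka Core (0.0) and Brod's corpus (1.0)."""
    return (edition - kafka_core) / (brod - kafka_core)


# Values from Table 1: (Kafka Core, Brod's corpus, Brod's edition of Kafka).
table_1 = {
    "words_per_sentence": (18.4, 14.67, 18.17),
    "punctuation_per_word": (0.183, 0.215, 0.191),
}

positions = {trait: relative_position(*values) for trait, values in table_1.items()}
for trait, pos in positions.items():
    print(f"{trait}: {pos:.3f}")
```

On this scale, sentence length in Brod’s edition has moved roughly 6% of the way from Kafka Core towards Brod’s corpus, and punctuation density about 25% of the way.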

The results for subjunctive I, however, were quite unexpected. Kafka Core has the smallest value (lowest frequency of subjunctive I), quite close to Brod’s corpus, while Brod’s edition of Kafka has a much higher value. It is difficult to interpret this result. We know that Brod occasionally changed tense and mood in Kafka’s sentences, but these relatively few instances cannot account for the significant value difference in subjunctive I between Kafka Core and Brod’s edition of Kafka. Moreover, both corpora include texts that were composed during different writing periods in Kafka’s life, that is, the early period (1904–1912), middle period (1913–1919), and late period (1920–1924), so we cannot attribute the difference in subjunctive I frequency to the different writing periods in Kafka’s life either. One possible explanation might be genre. The literary texts in Kafka Core are all (relatively short) stories, whereas the literary texts in both Brod’s own corpus and Brod’s edition of Kafka are dominated by (relatively long) novels.

Therefore, the value difference between the two Kafka corpora might reflect a genre-specific preference of his for the use of subjunctive I in novels as opposed to shorter prose pieces. The use of subjunctive I in German is directly tied to indirect speech, that is, to speech that reports what others have said in a neutral manner without assessing the veracity of the reported statement. Kafka’s longest novel, The Castle, consists mostly of characters’ direct speech, but this speech often contains long reports about alleged statements made by other characters expressed in subjunctive I. Hence, the frequency of direct speech in Kafka’s texts might be tied to—or correlate with—the frequency of indirect speech in the same text. Our metrics support this hypothesis. We found that The Castle, which has the most direct speech by far among all of Kafka’s works, has a much higher average use of subjunctive I per word (0.0058) compared to Amerika (0.0033) and The Trial (0.0048), and all three novels have a much higher average of subjunctive I per word than Kafka’s shorter stories (0.0024). One explanation is that Kafka’s longer texts, i.e., the novels, include more direct speech and dialogue, and these dialogues include portions of indirect speech that require the use of subjunctive I in German. Though true for Kafka, this finding can hardly be generalized. The fact that Kafka’s novels have a higher percentage of direct speech than his other prose (which correlates with a higher percentage of indirect speech and higher frequency of subjunctive I) does not mean that novels by other authors do, too. Trying to determine whether or not genre can be linked directly to different frequency rates of direct speech or subjunctive I in literary texts would require historical and genre-specific parameters of what we mean by “novel” or “literary form” in the first place, and it would require extensive computational research and large amounts of data from numerous corpora.

With this caveat in mind, we also found the more detailed statistics, as visualized in the boxplots of Figure 2 below, helpful to characterize the literary style in Kafka Core.

The boxplots visualize the values of each individual work in the three corpora. If we just focus on the green boxplot (representing Kafka Core) in the left chart (words per sentence), it tells us that a literary text published during Kafka’s own lifetime ranges from 9 to 34 words per sentence at the extremes and from 15 to 23 words per sentence in the green box, which represents the interquartile range: the middle 50% of the data, spanning from the 25th to the 75th percentile. The middle chart shows a similarly wide variation in the punctuation per word in Kafka Core, which ranges from 0.12 to 0.25 at the extremes and between 0.17 and 0.20 in the green box. In other words, Kafka’s texts published during his lifetime vary significantly with regard to sentence length and punctuation use.
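For readers less familiar with boxplots, the following sketch shows how the extremes and the interquartile range are derived from per-work values. The numbers are invented for illustration and are not the article’s data. The function name `box_summary` and the “inclusive” quartile method are assumptions, and plotting libraries may compute quartiles and whiskers slightly differently.

```python
import statistics


def box_summary(values):
    """Return the five-number summary plus the interquartile range (IQR)."""
    q1, median, q3 = statistics.quantiles(values, n=4, method="inclusive")
    return {
        "min": min(values),
        "q1": q1,
        "median": median,
        "q3": q3,
        "max": max(values),
        "iqr": q3 - q1,
    }


# Invented words-per-sentence values, one per work (illustrative only).
toy_values = [9, 13, 15, 17, 19, 21, 23, 27, 34]
summary = box_summary(toy_values)
print(summary)
```

Here the box would span 15 to 23 (an interquartile range of 8), while the extremes reach from 9 to 34.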

As we said earlier, this variation cannot be correlated with different writing periods in Kafka’s life because it occurs across periods and within each period. There is even significant variation of these metrics among stories published in the same collection (like Betrachtung from 1912 (Kafka 1912) or Ein Landarzt from 1919; see Note 13). The most intuitive explanation for this variation is, once again, literary genres. Kafka experimented with different forms of literary writing throughout his life, and stylistic variations in sentence length and punctuation are surely part of this experimentation. The fact that we find far less variation in Brod’s edition of Kafka might be due to the predominance of long novels in that corpus and in Brod’s corpus. To test this further, we would have to compare these metrics in Kafka Core to those of other German literary writers around 1900 (not just Brod). As of now, none of our results contradict, and some of them support, our initial hypothesis that Brod’s emendations of Kafka’s unpublished manuscripts created a literary style closer to his own than to Kafka’s.

5. Conclusions

There is a long-standing tradition in literary criticism to read Kafka’s stories both as a testimony to the ills of modernity (e.g., excessive bureaucracy, economic struggle, social alienation) and as a prophecy of worse times yet to come (e.g., fascism, Holocaust, WWII) (Arendt [1944] 2019, 179f.). To honor this tradition would be to regard AI as yet another existential threat to human dignity and self-determination that Kafka’s work anticipated more than a century ago. “It is, after all, extremely tormenting to be governed by laws one does not know,” the narrator in “The Problem of our Laws” tells us (Kafka, “Zur Frage der Gesetze” 147; our translation). Is this not true of our current situation in the age of AI, which is able to predict with astounding accuracy our personal likes and dislikes, our chances of health or disease, and the particularities of Kafka’s literary style based on statistical laws that we do not, and cannot, know in detail? According to Kissinger et al. (2022), the age of AI is revolutionary not because it is a versatile tool for getting information but because it uncovers an entirely new ontology, a new way of being in the world that is based on statistical probabilities in mathematical space and no longer on those fixed entities in physical space that continue to define our lives and our phenomenological experience of the world.

We remain agnostic on the prophetic qualities of Kafka’s texts. But we agree his work describes a chasm between our familiar human experience of the world and the unknown laws and rules that govern this world. The goal of the humanities, we believe, should be to bridge this chasm by confronting our embodied reality of lived experience with the abstract reality we know solely through mathematics. Our mixed-method approach to Kafka’s literary style followed this mandate. With the help of previous scholarship, we defined semantic ambiguity as a major characteristic of Kafka’s writing, and we used a series of computational studies to explore stylistic and linguistic traits that activate different levels of meaning and signification in his texts. We hope more literary scholars will pursue this kind of mixed-method approach in the future. The overall sophistication and computational capabilities of AI will keep growing exponentially in the next few years, and the new ontology of mathematical probabilities it defines will put increasing pressure on humanists to supplement historical hermeneutics and methods of close reading with distant reading and computational methods, as Drucker (2012) and others have argued. Doing so is a matter of choice today, one that requires justification as to why mixed methods are beneficial in literary studies. In 10–15 years, the tables will have turned, and literary scholars will have to provide justification for not using AI and for not building bridges across the two cultures of art and science.

Author Contributions

C.S. supervised the project and wrote Sections I, II, and V. He also wrote parts of Sections III and IV and interpreted the data. W.S. contributed to the conceptualization of the project and the design of the research methods, validated the experimental results, performed follow-up formal analysis, and visualized the data. He also contributed to the writing and reviewing of the manuscript, where he was primarily responsible for Sections III, IV, and the Appendices. A.K. contributed to designing and developing the machine learning-based (ML) methods to test the article’s proposed approach. This includes two separate architectures featuring the BERT classifier and Bi-LSTM encoder Q-learning for authorship classification. The scope of ML-based methods included design, development, integration, and experimental execution. Additionally, A.K. designed and implemented the subjunctive I parsing algorithm used in this paper. All authors have read and agreed to the published version of the manuscript.

Funding

This research received no external funding.

Data Availability Statement

No new data were created or analyzed in this study. Data sharing is not applicable to this article.

Conflicts of Interest

The authors declare no conflict of interest.

Appendix A. Technical Details on the Classification Experiment

We used the natural language tool SpaCy to preprocess the three corpora (Kafka Core, Brod’s corpus, and Brod’s edition of Kafka), removing unnecessary punctuation and numerical data (e.g., chapter numbers, page numbers, and non-textual symbols such as “()”, “*”, and “[]”). Additionally, we manually trimmed irrelevant subtexts, such as bibliographic data, from the corpus. Each corpus was then divided into chunked passages of 510 tokens using a pretrained BERT tokenizer provided by the MDZ Digital Library team (dbmdz) at the Bavarian State Library (https://huggingface.co/dbmdz/bert-base-german-cased, accessed on 27 February 2024). The BERT tokenizer utilizes the WordPiece algorithm, which keeps a word whole if it exists in the BERT vocabulary and otherwise breaks it into subwords for better handling by neural networks. As a result, tokens in this case may represent either entire words or subwords.
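The word-or-subword behavior described above can be illustrated with a toy version of WordPiece’s greedy longest-match-first lookup. This is a sketch only: the three-entry vocabulary is hypothetical and far smaller than the dbmdz vocabulary, and the real tokenizer also handles casing, punctuation splitting, and a maximum word length.

```python
def wordpiece(word, vocab, unk="[UNK]"):
    """Split one word into WordPiece tokens by greedy longest-match-first.

    Non-initial pieces carry the "##" continuation prefix. If some part of the
    word cannot be matched at all, the whole word maps to the unknown token.
    """
    pieces = []
    start = 0
    while start < len(word):
        end = len(word)
        match = None
        # Try the longest remaining substring first, then shrink from the right.
        while start < end:
            candidate = word[start:end]
            if start > 0:
                candidate = "##" + candidate
            if candidate in vocab:
                match = candidate
                break
            end -= 1
        if match is None:
            return [unk]
        pieces.append(match)
        start = end
    return pieces


# Hypothetical toy vocabulary (not the real model vocabulary).
toy_vocab = {"Schloss", "Land", "##vermesser"}

print(wordpiece("Schloss", toy_vocab))        # a whole word found in the vocabulary
print(wordpiece("Landvermesser", toy_vocab))  # split into a word piece and a continuation
print(wordpiece("Kafka", toy_vocab))          # no match in this toy vocabulary
```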

Figure A1. Workflow for the classification experiment (see Note 14).

We opted for the maximum length supported by BERT embeddings (510 tokens plus 2 special tokens: [CLS] and [SEP]) to capture more nuanced stylistic features, such as extra-sentence dependencies, lexical variety, syntactic structure, rhythm, and prosody. The embeddings were then generated by a pretrained German BERT model. For the training set, we used 204 chunked passages from each of Kafka Core and Brod’s corpus (408 in total) and an additional 52 samples (22 from Kafka Core, 30 from Brod’s corpus) for validation. The size of the training and validation sets was constrained by the limited number of literary works published by Kafka during his lifetime and the scarcity of publicly available fiction by Max Brod. To partially address these limitations, we excluded validation samples from the training set, ensuring proper generalization and evaluating model performance on semantically unseen data.
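The chunking arithmetic (510 content tokens plus [CLS] and [SEP], giving 512 positions) can be sketched as follows. The function name `chunk_tokens` is hypothetical, and keeping a shorter final chunk is an assumption, since the text does not say whether trailing remainders were kept, padded, or dropped.

```python
def chunk_tokens(tokens, body_len=510, cls="[CLS]", sep="[SEP]"):
    """Split a token sequence into passages of at most `body_len` tokens,
    each wrapped in the two special tokens BERT expects."""
    chunks = []
    for start in range(0, len(tokens), body_len):
        body = tokens[start:start + body_len]
        chunks.append([cls] + body + [sep])
    return chunks


# 1200 placeholder tokens standing in for a tokenized text.
tokens = [f"tok{i}" for i in range(1200)]
chunks = chunk_tokens(tokens)
print([len(chunk) for chunk in chunks])
```

For 1200 tokens this yields two full passages of 512 positions and one remainder of 182 positions (180 tokens plus the two special tokens).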

We then applied two deep learning architectures—BERT classifier fine-tuning and Bi-LSTM Q-Learning—both trained on the same pretrained BERT embeddings of the chunked passages from Kafka Core and Brod’s corpus. Our first approach involved fine-tuning a pretrained BERT model for authorship classification. While this approach is similar to BertAA (BERT for Authorship Attribution) (Fabien et al. 2020), we did not incorporate hybrid or stylistic feature vectors during training, as ablation tests showed these features did not significantly improve performance. The model was fine-tuned on the specified training set using the Cross-Entropy loss function, with a configuration of 10 epochs, an input token length of 512, and a learning rate of 0.0001. For classification, we applied a softmax function on the aggregated sequence information from the [CLS] token to convert raw logits into probabilities. The fine-tuned BERT classifier demonstrated excellent performance, achieving 100% accuracy on the validation set.
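The last two steps of the classifier, turning raw logits into probabilities with softmax and scoring them with cross-entropy, can be written out in a few lines. This is a plain-Python illustration of the two formulas, not the training code. The logit values and the class order (Kafka first, Brod second) are hypothetical.

```python
import math


def softmax(logits):
    """Convert raw logits into probabilities that sum to 1."""
    peak = max(logits)  # subtracting the maximum avoids overflow in exp()
    exps = [math.exp(x - peak) for x in logits]
    total = sum(exps)
    return [e / total for e in exps]


def cross_entropy(logits, target):
    """Negative log-probability assigned to the correct class index."""
    return -math.log(softmax(logits)[target])


# Hypothetical logits for one passage, ordered [Kafka, Brod].
logits = [2.0, 0.0]
probs = softmax(logits)
print(probs)
print(cross_entropy(logits, target=0))
```

A confident and correct prediction yields a small loss, whereas two equal logits yield a loss of ln 2, the value for a coin flip between two classes.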

Our second approach builds on the single Bi-LSTM-based method proposed by Qian et al. (2017), incorporating a reinforcement learning (RL) agent to aggregate the data. In the first stage, we generated contextualized embeddings for each token in the chunked passages from Kafka Core and Brod’s corpus, capturing context based on surrounding tokens. These embeddings were then used to train a Bi-LSTM model to identify passage-level stylistic dependencies and generate corresponding vector representations. The resulting stylistic dependency vectors were used for offline learning by the RL agent. In the second stage, text entries from a document were evaluated to infer stylistic relationships between entities using a cosine similarity function determined by the RL agent. Throughout this process, the RL agent adapted to environmental changes and learned from previous experiences to improve accuracy. Predictions were made based on the similarity of stylistic vectors in the feature space.
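The final prediction step, comparing stylistic vectors by cosine similarity, can be illustrated in isolation. The sketch below is a deliberately simplified stand-in: it replaces the Bi-LSTM and the RL agent with fixed reference vectors and a nearest-reference rule, and all vectors, labels, and function names are hypothetical.

```python
import math


def cosine(u, v):
    """Cosine similarity between two equal-length vectors."""
    dot = sum(a * b for a, b in zip(u, v))
    norm_u = math.sqrt(sum(a * a for a in u))
    norm_v = math.sqrt(sum(b * b for b in v))
    return dot / (norm_u * norm_v)


def nearest_label(vector, references):
    """Return the label whose reference vector is most similar to `vector`."""
    return max(references, key=lambda label: cosine(vector, references[label]))


# Hypothetical three-dimensional "style vectors" (real ones would be much larger).
references = {
    "Kafka": [1.0, 0.2, 0.0],
    "Brod": [0.1, 1.0, 0.3],
}
passage = [0.9, 0.3, 0.1]

print(nearest_label(passage, references))
```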

Appendix B. Technical Details on the Identification of Subjunctive I

In German, subjunctive I can be identified in two primary ways: (1) Unique forms of the verb “sein” (to be): all forms of this verb (sei, seiest, sei, seien, seiet, seien) are always subjunctive I. (2) Third-person singular subjects paired with the first-person indicative verb form: subjunctive I arises when a third-person singular grammatical subject (e.g., er, sie, es) is combined with a verb form resembling the first-person indicative, typically the verb stem plus the ending “e.” For example, “Ich habe Hunger” (I am hungry) is a first-person singular indicative, while “… er habe Hunger” (He claims to be hungry) is a third-person singular subjunctive I.

We handled the first case by straightforward token matching. The second case, however, required a more complex natural language processing approach. To address this, we predefined sets of third-person singular pronouns and 516 of the most common subjunctive I verb forms, which were based on the list of German verbs at https://verben.org/en/top-500-german-verbs. From that list, we manually removed mistakes, duplicates, and overly technical terms. Next, we used dictionaries and a scholarly database from the Leipzig Corpora Collection (specifically a mixed-source one using data from 2011) to add more verbs to the list, ending up with a total of 516 verbs.

We then defined helper functions to verify grammatical subjects (e.g., compliance with third-person singular) and predicates (e.g., membership in the subjunctive I verb set or “sein” forms). We used SpaCy’s “nlp” object to process each sentence, generating a “Doc” object that includes tokenization, dependency parsing, and other linguistic features. For each individual text, we iterated over tokens to locate subjects and their predicates, identifying them using labels such as “sb” (subject) or “ep” (expletive subject). Once the subject and predicate were located, the helper functions checked for third-person singular compliance and verb membership in the predefined sets. Occasionally, the parser assigned multiple subjects to the same predicate. To avoid duplication, we implemented an auxiliary loop to merge duplicate entries before finalizing the results.
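The two detection rules can be sketched on pre-parsed input. To stay self-contained, the example uses plain tuples of (text, dependency label, head index) in place of a SpaCy “Doc”. The verb set is a tiny hypothetical stand-in for the 516-form list, and the function name is an assumption. Collecting predicate positions in a set plays the role of the duplicate-merging step.

```python
# Unique subjunctive I forms of "sein", as listed in the text.
SEIN_FORMS = {"sei", "seiest", "seien", "seiet"}
# Third-person singular pronouns.
THIRD_SINGULAR = {"er", "sie", "es"}
# Hypothetical mini-list standing in for the 516 subjunctive I verb forms.
SUBJUNCTIVE_FORMS = {"habe", "komme", "gehe", "wisse", "könne", "müsse", "werde"}
# Subject labels named in the text: "sb" (subject), "ep" (expletive subject).
SUBJECT_LABELS = {"sb", "ep"}


def count_subjunctive_i(tokens):
    """Count subjunctive I predicates in one parsed sentence.

    `tokens` is a list of (text, dep, head_index) tuples.
    """
    hits = set()  # predicate positions; a set merges duplicate detections
    # Rule 1: direct token matching for forms of "sein".
    for index, (text, _dep, _head) in enumerate(tokens):
        if text.lower() in SEIN_FORMS:
            hits.add(index)
    # Rule 2: third-person singular subject governed by a listed verb form.
    for text, dep, head in tokens:
        if dep in SUBJECT_LABELS and text.lower() in THIRD_SINGULAR:
            if tokens[head][0].lower() in SUBJUNCTIVE_FORMS:
                hits.add(head)
    return len(hits)


# "Er sagte, er habe Hunger." with hand-written, illustrative parse labels.
reported = [
    ("Er", "sb", 1), ("sagte", "ROOT", 1), (",", "punct", 1),
    ("er", "sb", 4), ("habe", "oc", 1), ("Hunger", "oa", 4), (".", "punct", 1),
]
# "Ich habe Hunger." is first-person indicative and should not be counted.
indicative = [("Ich", "sb", 1), ("habe", "ROOT", 1), ("Hunger", "oa", 1), (".", "punct", 1)]
# "Sie sagt, es sei spät." is caught by the "sein" rule.
with_sein = [
    ("Sie", "sb", 1), ("sagt", "ROOT", 1), (",", "punct", 1),
    ("es", "sb", 4), ("sei", "oc", 1), ("spät", "pd", 4), (".", "punct", 1),
]

print(count_subjunctive_i(reported), count_subjunctive_i(indicative), count_subjunctive_i(with_sein))
```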

Notes

1	In 1984, the centennial of Kafka’s birth, the Paris Centre Georges Pompidou curated a big exhibition (Le Siècle de Kafka 1984), and 40 years later, in commemoration of the centennial of Kafka’s death in 2024, the front page of the Times Literary Supplement once again proclaimed “Kafka’s Century” (Kafka’s Century 2024).
2	See Strathausen (2024a) for a more detailed critique of this tendency.
3	In a sweeping deconstructive gesture, Stanley Corngold has united these divergent views, arguing that Kafka employs metaphors and other rhetorical figures precisely for the purpose of taking them apart, literally, on the page in both semantic and syntactic terms. What thus comes to the fore in Kafka’s writing, Corngold argues, is the materiality and affective power of language that is otherwise covered up by its meaning-making potential (Corngold 2004). This tension creates the semantic ambiguity for which his texts are known.
4	Already in the 1970s, Karlheinz Fingerhut pointed out scholars’ discombobulation about Kafka’s style evident in oxymoronic formulations like “non-metaphorical metaphors” or “symbolic reality” (Fingerhut 1979, p. 139).
5	Whether this dilemma is intentional on Kafka’s part—“Kafka has taken all conceivable precautions to undermine an exegesis of his texts” (Benjamin 1972, p. 435)—or an unintended consequence of his probing literary style—“Kafka’s dense linguistic structures cannot but produce an uncontrollable amount of meaning” (Kobs 1970, p.14)—seems less important than its overall effect that the secondary literature on Kafka threatens to overwrite the literary texts it tries to elucidate.
6	To be sure, it is neither the function nor the purview of a Kafka Konkordanz to provide comparative analyses of this kind, and it would have been impossible to do given the limited computational resources two or three decades ago.
7	She created an overall corpus “of altogether 74 text samples (about 5.9 million words)” that included “27 texts of different length written by Kafka,” three by Robert Walser, and 64 texts written by other authors (Herrmann 2017, paragraph 42).
8	We recognize that the self-reflexivity of Kafka’s writings remains controversial (Jahraus 2008; Engel 2010, 415f.) and that Engel mentions six main characteristics of Kafka’s style (Engel 2010, 411ff.). We could easily add more.
9	Thieberger, for example, defines “the peculiar character of Kafka’s narrative style” as “the enormous tension between the (ungraspable) meaning of what is being said and the (easily graspable) form of linguistic expression” (Thieberger 1979a, p. 183; our translation). Similarly, Oschmann wrote the following in 2010: “Even more than the early texts, Kafka’s later texts are characterized by the elemental tension that the narrating language almost always refers to sensual concrete phenomena (sinnlich Konkretes), whereas the process of narration radically undermines this immediately sensually given by means of constant self-correction” (Oschmann 2010, p. 446; our translation). Kafka tricks us “into expecting something straightforwardly realist, before then giving us something altogether more complicated” (Troscianko 2018, p. 284).
10	An example is Brod’s inserted fiat “Hier beginnt das Schloß” in the first Schlossheft, an arbitrary decision that ignored several pages of preceding text and condemned them to obscurity in the Apparatus section for more than half a century. Another example is Brod’s equally questionable decision to distinguish “complete” from “incomplete” chapters in Kafka’s manuscript of The Trial or his pasting together of literary texts culled from different contexts or parts of the notebooks. Like many other Kafka scholars, Richard Thieberger speaks of Brod’s “arbitrary editing” that yields “confusing” consequences for Kafka’s readers (Thieberger 1979b, p. 54). In Brod’s first edition of the Gesammelte Schriften from 1936, for example, some texts that are seemingly complete and had been given a title by Kafka himself (“Eine teilweise Erzählung”) were not included at all, while others (“Eisenbahnreise,” “Bei den Toten zu Gast,” and “Neue Lampen”) were included in the Tagebücher und Briefe volume. In the second edition of the Gesammelte Schriften from 1946, these same texts, however, were relegated to the volume Hochzeitsvorbereitungen rather than Betrachtung, where they would actually belong. The third edition also printed some of these texts twice as part of different volumes.
11	After completing “The Judgment” in a single night of writing, Kafka noted in his diary on 23 September 1912: “…nur so kann geschrieben werden, nur in einem solchen Zusammenhang, mit solcher vollständigen Öffnung des Leibes und der Seele” [“…only in this way can writing be done, only with such coherence, with such a complete opening out of the body and the soul”] (Kafka 1999, p. 101).
12	Table 1 presents linguistic trait values computed on the entire corpora, providing a single numeric value per trait for each corpus. In contrast, the boxplots in Figure 2 below are based on values computed for individual works within the corpora, providing multiple numeric values per trait, each corresponding to a specific work. However, independent t-tests (Student 1908) reveal no statistically significant differences (p > 0.05) between any pair of corpora for all three traits. This outcome is likely due to the limited number of data points—only 17 works in Kafka Core, five in Brod’s corpus, and four in Brod’s edition of Kafka. Although the statistical tests did not indicate significant differences, the observed variations among the corpora remain notable, suggesting that further research with datasets with more data points may uncover statistically significant distinctions.
13	Kafka’s later publications, like the four stories published together under the title of Ein Hungerkünstler in 1924, feature longer sentences averaging 23–27 words, as literary critics have suspected since the 1960s. But there are also very early texts, like “Gespräch mit dem Betrunkenen” from 1909, that feature a relatively high average of 20 words per sentence, and other early texts, like those in Betrachtung from 1912, that only have 13 words per sentence. In the middle period, Kafka published texts like Der Heizer (1913) and Die Verwandlung (1915) with an average sentence length of 21 words, but he also published “Der Kübelreiter” and “Ein Bericht an eine Akademie” (from the collection Ein Landarzt (Kafka 1919)) with only a nine-word average per sentence.
14	In the figure, the large blue blocks represent the major stages of the pipeline, and the vertical blue lines mark key steps. Yellow blocks depict the primary processes within the pipeline. White blocks indicate the outputs, while orange blocks show the components of the model architecture.

References

Adorno, Theodor W. 1997. Aufzeichnungen zu Kafka. In Gesammelte Schriften. Edited by Rolf Tiedemann. Frankfurt: Suhrkamp, vol. 10/1, pp. 254–87.
Arendt, Hannah. 2019. Franz Kafka. A Revaluation. In Hannah Arendt. Sechs Essays. Edited by Barbara Hahn. Göttingen: Wallstein, pp. 174–83. First published 1944.
Beißner, Friedrich. 1952. Der Erzähler Franz Kafka. Stuttgart: Kohlhammer.
Benjamin, Walter. 1972. Franz Kafka. In Gesammelte Schriften. Edited by Rolf Tiedemann and Herrmann Schweppenhäuser. Frankfurt: Suhrkamp, vol. II/2, pp. 409–38.
Binder, Hartmut. 1966. Motiv und Gestaltung bei Franz Kafka. Bonn: Bouvier.
Brod, Max. 1926. Nachwort. In Franz Kafka. Das Schloß. Leipzig: Kurt Wolff, pp. 500–4.
Buchholz, Paul. 2018. Private Anarchy. Impossible Community and the Outsider’s Monologue in. Chicago: Northwestern University Press.
Camus, Albert. 1942. Le Mythe de Sisyphe. Paris: Gallimard.
Corngold, Stanley. 2004. Lambent Traces: Franz Kafka. Princeton: Princeton University Press.
Danto, Arthur. 1996. Art After the End of Art. Contemporary Art and the Pale of History. Princeton: Princeton University Press.
Deleuze, Gilles, and Félix Guattari. 1986. Kafka: Towards a Minor Literature. Translated by Dana Polan. Minneapolis: University of Minnesota Press.
Drucker, Johanna. 2012. Humanistic Theory and Digital Scholarship. In Debates in the Digital Humanities. Edited by Matthew K. Gold. Minneapolis: University of Minnesota Press, pp. 85–95.
Engel, Manfred. 2010. Kafka lesen. In Kafka Handbuch: Leben, Werk, Wirkung. Edited by Manfred Engel and Bernd Auerochs. Stuttgart: Metzler, pp. 411–27.
Engel, Manfred. 2018. Writing. In Franz Kafka in Context. Edited by Carolin Duttlinger. Cambridge: Cambridge University Press, pp. 54–61.
Emrich, Wilhelm. 1965. Franz Kafka. Bonn: Athenäum.
Erlin, Matt. 2017. Topic Modeling, Epistemology, and the English and German Novel. Cultural Analytics 2.
Erlin, Matt, Douglas Knox, and Stephen Pentecost. 2023. Multi-retranslation and cultural variation. The case of Franz Kafka. Target 35: 215–41.
Fabien, Maël, Esau Villatoro-Tello, Petr Motlicek, and Shantipriya Parida. 2020. BertAA: BERT Fine-Tuning for Authorship Attribution. Paper presented at 17th International Conference on Natural Language Processing (ICON), Patna, India, 18–21 December 2020; Edited by Pushpak Bhattacharyya, Dipti Misra Sharma and Rajeev Sangal. 2020, Patna: Indian Institute of Technology Patna, Patna: NLP Association of India (NLPAI), pp. 127–37. Available online: https://aclanthology.org/2020.icon-main.16 (accessed on 18 April 2024).
Fingerhut, Karlheinz. 1979. Bildlichkeit. In Kafka Handbuch. Edited by Hartmut Binder. Berlin: Kröner, vol. 1, pp. 138–177.
Fromm, Waldemar. 2010. Schaffensprozess. In Kafka Handbuch: Leben, Werk, Wirkung. Edited by Manfred Engel and Bernd Auerochs. Stuttgart: Metzler, pp. 427–37.
Gandelman, Claude. 1974. Kafka as an Expressionist Draftsman. Neohelicon 2: 237–77.
Gerhardt, Marlis. 1969. Die Sprache Kafka’s: Eine Semiotische Untersuchung. Dissertation, University Stuttgart, Stuttgart, Germany.
Harman, Mark. 1996. The Latest from the Kafka Factory. The German Quarterly 69: 63–67.
Hermsdorf, Klaus, ed. 1984. Franz Kafka. Amtliche Schriften. Mit einem Essay von Klaus Hermsdorf. Berlin: Aufbau Verlag.
Herrmann, J. Berenike. 2017. In a test bed with Kafka. Introducing a mixed-method approach to digital stylistics. DHQ: Digital Humanities Quarterly 11. Available online: https://www.digitalhumanities.org/dhq/vol/11/4/000341/000341.html (accessed on 27 February 2025).
Jahraus, Oliver. 2008. Kafka und die Literaturtheorie. In Kafka Handbuch: Leben-Werk-Wirkung. Edited by Bettina von Jagow and Oliver Jahraus. Göttingen: Vandenhoeck & Ruprecht, pp. 304–16.
Kafka’s Century. 2024. Times Literary Supplement. No. 6322. Available online: https://reader.exacteditions.com/issues/116453/page/1 (accessed on 27 February 2025).
Kafka, Franz. 1912. Betrachtung. Leipzig: Rowohlt.
Kafka, Franz. 1919. Ein Landarzt. Leipzig: Kurt Wolff.
Kafka, Franz. 1925. Der Process. Edited by Max Brod. Berlin: Die Schmiede.
Kafka, Franz. 1926. Das Schloss. Edited by Max Brod. Berlin: Die Schmiede.
Kafka, Franz. 1927. Der Verschollene. Edited by Max Brod. Berlin: Die Schmiede.
Kafka, Franz. 1946. Gesammelte Schriften, 2nd ed. Edited by Max Brod. New York: Schocken, 5 vols.
Kafka, Franz. 1999. Tagebücher 1912–1914. Vol. 2. Edited by Hans-Gerd Koch. Frankfurt: Fischer.
Kissinger, Henry A., Eric Schmidt, and Daniel Huttenlocher. 2022. The Age of AI. And Our Human Future. New York: Back Bay Books.
Kobs, Jörgen. 1970. Kafka: Untersuchungen zu Bewußtsein und Sprache Seiner Gestalten. Bad Homburg: Athenäum.
Koelb, Clayton. 2006. Will the Real Franz Kafka Please Stand Up? In A Companion to the Works of Franz Kafka. Edited by James Rolleston. Rochester: Camden, pp. 27–33.
Kundera, Milan. 1991. Rescuing Kafka from the Kafkologists. Translated by Barbara Wright. Times Literary Supplement 24: 3–5.
Le Siècle de Kafka. 1984. Paris: Centre Georges Pompidou.
Mann, Thomas. 1949. Dem Dichter zu Ehren: Franz Kafka und ‘Das Schloss’. Der Monat 2: 66–70.
Moretti, Franco. 2005. Graphs, Maps, Trees. Abstract Models for Literary History. London and New York: Verso.
Neumann, Gerhard. 1968. Umkehrung und Ablenkung. Franz Kafkas ‘Gleitendes Paradox’. Deutsche Vierteljahrsschrift für Literaturwissenschaft und Geistesgeschichte 42: 702–44.
Neumann, Gerhard. 2011. Verfehlte Anfänge und Offenes Ende: Franz Kafkas Poetische Anthropologie. Frankfurt: Siemens Stiftung.
Oschmann, Dirk. 2010. Kafka als Erzähler. In Kafka Handbuch: Leben, Werk, Wirkung. Edited by Manfred Engel and Bernd Auerochs. Stuttgart: Metzler, pp. 438–49.
Politzer, Heinz. 1962. Franz Kafka. Parable and Paradox. Ithaca: Cornell University Press.
Qian, Chen, Tianchang He, and Rao Zhang. 2017. Deep Learning Based Authorship Identification. Stanford University. Available online: https://web.stanford.edu/class/archive/cs/cs224n/cs224n.1174/reports/2760185.pdf (accessed on 17 April 2024).
Reuß, Roland. 1995. Lesen, was gestrichen wurde: Für eine historisch-kritische Kafka-Ausgabe. In Franz Kafka: Historisch-Kritische Ausgabe Sämtlicher Handschriften, Drucke und Typoskripte. Edited by Roland Reuß and Peter Staengle. Frankfurt: Stroemfeld, vol. 1, pp. 9–24.
Robertson, Ritchie. 2018. Style. In Franz Kafka in Context. Edited by Carolin Duttlinger. Cambridge: Cambridge University Press, pp. 62–69.
Salgaro, Massimo. 2023. Statistics, Stylometry, and Sentiment Analysis in German Studies. Göttingen: Vandenhoeck & Ruprecht.
Sartre, Jean-Paul. 1999. Anti-Semite and Jew. October 87: 24–26.
Savoy, Jacques. 2020. Machine Learning Methods for Stylometry: Authorship Attribution and Author Profiling. Cham: Springer.
Strathausen, Carsten. 2024a. Introduction. In Kafka’s Drawings. Special issue. The Germanic Review 99: 135–43.
Strathausen, Carsten. 2024b. Adapting Kafka. Word & Image 40: 15–24.
Student. 1908. The Probable Error of a Mean. Biometrika 6: 1–25.
Thieberger, Richard. 1979a. Sprache. In Kafka Handbuch. Edited by Hartmut Binder. Berlin: Kröner, vol. 2, pp. 177–203.
Thieberger, Richard. 1979b. Das Schaffen in den ersten Jahren der Krankheit. In Kafka Handbuch. Edited by Hartmut Binder. Berlin: Kröner, vol. 2, pp. 350–78.
Thompson, Mark Christian. 2016. Kafka’s Blues: Figurations of Racial Blackness in the Construction of an Aesthetics. New York and London: Routledge.
Troscianko, Emily T. 2018. Reading Kafka. In Franz Kafka in Context. Edited by Carolin Duttlinger. Cambridge: Cambridge University Press, pp. 282–91.
Wagner, Benno. 2011. Lightning No Longer Flashes: Kafka’s Chinese Voice and the Thunder of the Great War. In Franz Kafka: Narration, Rhetoric, and Reading. Edited by Jakob Lothe, Beatrice Sandberg and Ronald Speirs. Columbus: Ohio State University Press, pp. 58–80.
Walser, Martin. 1968. Beschreibung Einer Form. München: Hanser.
Woods, Michelle. 2013. Kafka Translated: How Translators Have Shaped Our Reading of Kafka. New York: Bloomsbury.

Figure 1. Classification accuracy for Brod’s edition of Kafka.

Figure 2. Detailed variation in linguistic traits across the three corpora.

Table 1. Comparison of linguistic traits across the three corpora^12.

Linguistic trait	Kafka Core	Brod’s Corpus	Brod’s Edition of Kafka
Words per sentence	18.4	14.67	18.17
Punctuation per word	0.183	0.215	0.191
Subjunctive I per word	0.00328	0.00342	0.00424
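
The first two traits in Table 1 are simple ratios over a text. The following Python sketch shows one way such ratios could be computed. The sentence splitter, the word pattern, and the punctuation set are assumptions made here for illustration; they are not taken from the article and need not match the authors' procedure, so values computed this way will not necessarily reproduce the table.

```python
import re
import string

# Assumption: a sentence ends at ".", "!" or "?" followed by whitespace.
# This naive rule will mis-split abbreviations such as "Josef K. wurde ...".
SENTENCE_END = re.compile(r"(?<=[.!?])\s+")

# Assumption: ASCII punctuation plus a few German quotation marks.
PUNCTUATION = set(string.punctuation) | set("„“”»«…")


def split_sentences(text):
    """Split text into non-empty sentence strings."""
    return [s for s in SENTENCE_END.split(text.strip()) if s]


def count_words(text):
    """Count runs of word characters (umlauts and ß are included)."""
    return len(re.findall(r"\w+", text))


def words_per_sentence(text):
    """Average number of words per sentence; 0.0 for empty input."""
    sentences = split_sentences(text)
    return count_words(text) / len(sentences) if sentences else 0.0


def punctuation_per_word(text):
    """Number of punctuation marks divided by number of words."""
    words = count_words(text)
    marks = sum(1 for ch in text if ch in PUNCTUATION)
    return marks / words if words else 0.0


if __name__ == "__main__":
    sample = "Er kam. Sie ging, ohne Gruß!"
    print(words_per_sentence(sample), punctuation_per_word(sample))
```

A corpus-level figure could be obtained by applying these functions to the concatenated text of each corpus, although averaging per-document values instead would give different numbers.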

© 2025 by the authors. Licensee MDPI, Basel, Switzerland. This article is an open access article distributed under the terms and conditions of the Creative Commons Attribution (CC BY) license (https://creativecommons.org/licenses/by/4.0/).
