A Register-Based Study of Interior Monologue in James Joyce’s Ulysses

Abstract: While fictional orality (spoken language in fictional texts) has received some attention in the context of quantitative register studies at the interface of linguistics and literature, only a few attempts have been made so far to apply the quantitative methods of register studies to interior monologues (and other forms of inner speech or thought representation). This article presents a case study of the three main characters of James Joyce's Ulysses whose thoughts are presented extensively in the novel, i.e., Leopold and Molly Bloom and Stephen Dedalus. Using quantitative, corpus-based methods, we compare the thoughts of these characters to fictional direct speech and to (literary and non-literary) reference texts. We show that the interior monologues of Ulysses span a range of non-narrative registers with varying degrees of informational density and involvement. The thoughts of one character, Leopold Bloom, differ substantially from that character's speech. The relative heterogeneity across characters is taken as an indication that interior monologue is used as a means of perspective-taking and implicit characterization.


Introduction
Studies on the linguistic aspects of spoken language in literary works have a long pedigree, and they continue to be of interest to scholars working at the interface of literary studies and linguistics (see, for instance, DeVito 1963, p. 354; Burton 1980; Akinnaso 1982, pp. 121-25; Erzgraeber and Goetsch 1987; Goetsch 1985, 1987; Fludernik 1986, 2005; Mace 1987; Bishop 1991; Erzgraeber 1998; Thomas 2012; Bublitz 2017; Egbert and Mahlberg 2020; Jucker 2021, among others). Goetsch (1985) used the term 'fingierte Mündlichkeit' (fictional orality) for this mode. The relevant studies typically focus on two comparisons: the differences between authentic speech and its fictionalization, and the contrast between narrative and spoken passages of literary works.
The discussion of fictional orality has often revolved around the mimesis debate, i.e., the extent to which fictional speech succeeds in emulating natural conversation (or even whether it intends to) (e.g., Bishop 1991, pp. 59-61; Leech and Short 1991, pp. 159-66; Fludernik 1993, p. 281; Thomas 2012, pp. 15-16). There is general consensus among critics that, for aesthetic reasons but also because of the different circumstances of production underlying the two modes of communication, literary dialogue represents at best an approximation of authentic orality and can by no means be considered the equivalent of a transcript of a tape-recorded speech sequence. While fictional orality may, to some extent, strive to emulate real-world conversational behavior, the imperfections of authentic speech and the plethora of phatic elements (interjections, fillers, pauses) found in everyday conversation are not normally represented in fictional orality (there are, of course, exceptions, such as William Gaddis's novel J R). Moreover, there is significant variation among authors, genres, and epochs in the degree of authenticity in fictional orality. In fact, there may be variation among the characters of a novel, as will be demonstrated in this study.
Rather than just emulating natural language conversation, spoken language in literary texts has several functions of its own. Fictional orality supplements the narrative voice, offering additional information, accounts of subjective experience, or personal memories and opinions. It is, thus, an important element of indirect characterization, and in consequence, the respective figures may employ particular registers, focus on 'their' topics, obey or subvert expectations for gendered speech, and/or show recognizable linguistic patterns or repetitive phrases that indicate eccentricities (e.g., Uncle Toby in Laurence Sterne's Tristram Shandy) or ideological obsessions (e.g., Bounderby in Charles Dickens' Hard Times).
Differences between fictional orality and authentic speech primarily concern aspects of linguistic organization that are fairly obvious. On the one hand, they are the necessary consequences of the written format, in which, for example, overlap is hard to represent and contextual information is missing or provided through lexical material. Proper names are therefore used more often to allow for a smooth understanding of turn-taking within groups, and to indicate the addressee without intervention of the narrative voice ('Sorry, Veronica; excuse me.', from Roddy Doyle's novel The Snapper). Spoken language in fictional texts typically features more complex and normatively 'correct' sentences than natural conversation, and especially in older works, the utterances can be of considerable length and occasionally border on monologues.
A new dimension of analysis comes into play when we look at modernist literature. Modernist texts often employ interior monologue or a stream of consciousness (see, for instance, Bickerton 1967; Pascal 1977; Cohn 1978; McHale 1978; Fludernik 1993), so the binary distinction between narration and speech is not applicable, and the perspective has to be extended to include the representation of thought processes. Broadening the perspective in this way also raises some new questions. As in the case of fictional orality, we may ask to what extent the interior speech of the characters varies from one figure to another; we can, moreover, compare the private thoughts of a given character to that character's public speech. Again, such differences can be analyzed in terms of their literary functions, e.g., as a means of perspective-taking and indirect characterization.
The present study investigates the interior monologue in James Joyce's novel Ulysses in comparison to the characters' speech and the narrative voice. Joyce made ample use of this technique and is widely regarded as having perfected it. While Joyce did not invent interior monologue (he always pointed to Édouard Dujardin and his novel Les lauriers sont coupés as his source of inspiration), this technique plays a particularly prominent role in Ulysses. Of course, earlier authors had informed their readers about the thoughts and mental lives of their protagonists, but this usually happened in narrative form, or in the form of free indirect thought. In consequence, the linguistic representation of thoughts did not markedly differ from the rest of the narration. Such sentences are grammatically complete and coherent, and while associations do in part lead to rapid shifts in focus, the trains of thought can still be followed with relative ease, e.g., in Jane Austen's Emma and, later, Virginia Woolf's To the Lighthouse. Joyce, by contrast, aims at a simulation of thought processes, and in consequence, his interior monologue differs both from narration and from simulated orality. William James wrote in The Principles of Psychology:
As we take, in fact, a general view of the wonderful stream of our consciousness, what strikes us first is this different pace of its parts. Like a bird's life, it seems to be made of an alternation of flights and perchings. The rhythm of language expresses this, where every thought is expressed in a sentence, and every sentence closed by a period. (James 1890, p. 243)
This alternation of flights and perchings is recognizable in Joyce's stream of consciousness, and it can at times be difficult to follow the mental jumps and the rapid succession of seemingly incoherent associations.
But while James still considered thoughts to correspond to linguistic units (like sentences), in Ulysses they are often fragmented and grammatically incomplete; there is an abundance of ellipses, and, as will be shown, particular linguistic elements are over- or underrepresented.
A considerable part of the text (71,499 out of 216,680 tokens) consists of the interior speech of the three main characters, Stephen Dedalus (11,826 tokens), Leopold Bloom (34,617 tokens), and Molly Bloom (25,055 tokens). We can thus compare their interior speech, and for (Leopold) Bloom and Stephen (Dedalus), we can also compare their interior monologues to their direct speech. This is not possible for Molly (Bloom), as she speaks only a few words directly in Chapter 4 ('Calypso'). (1), (2), and (3) below show the first segments of interior monologue by Stephen, Bloom, and Molly, respectively (the chapter and the line number in the Gabler Edition (Joyce 1986) are given in parentheses).
(3) Yes because he never did a thing like that before as ask to get his breakfast in bed with a couple of eggs since the City Arms hotel when he used to be pretending to be laid up with a sick voice doing his highness to make himself interesting for that old faggot Mrs Riordan that he thought he had a great leg of and she never left us a farthing all for masses for herself and her soul greatest miser ever was actually afraid to lay out 4d for her methylated spirit telling me all her ailments she had too much old chat in her about politics and earthquakes and the end of the world . . . (18.1-8)
Joyce's experiments were highly influential, and subsequent authors, e.g., John Dos Passos and William Faulkner, adapted the interior monologue to their specific works and agendas. The linguistic analysis of this mode as distinct from narration and simulated orality is thus an important contribution to the field of literary linguistics, providing a better understanding of the individual literary work and of the conceptualization of literary language and techniques in general.
Our analysis focuses on two main questions:
1. Does Joyce distinguish between the characters' interior speech or thought representations, and what (if any) are the linguistic features underlying these distinctions?
2. Are there recognizable linguistic differences between the (direct) speech and thought (interior speech) of the figures?
Of course, we are not the first to ask such questions. Stephen, Bloom, and Molly are quite different in their interests, and their minds not only wander through their respective memories but also reflect on very different topics. Stephen and Bloom have been characterized as having a poetic/philosophical mind (Stephen), as opposed to a scientific/musical mind (Bloom). This general distinction can easily be confirmed by a look at the contents of their thoughts (nouns, proper names, etc.), which has been done comprehensively and convincingly (see, for instance, Steinberg 1973; Houston 1989; Wales 1992).
In this study, we are not primarily interested in what the characters think, but in how they think. In other words, we are dealing with the distribution of linguistic 'features' in the sense of multidimensional register studies, as advocated by Biber (1988) and in subsequent work. In this tradition, register is regarded as a multidimensional concept comprising 'dimensions', which are mathematically defined as latent variables underlying the distribution of linguistic features in (parts of) texts, and which are assumed to be symptomatic of specific (situational, functional) varieties. We use the term 'genre' for types of text that are defined in terms of the 'external' circumstances of production. Genres are hierarchically structured. Higher-level genre categories comprise 'fiction' as opposed to 'non-fiction', and at a more specific level, we can distinguish, for instance, news reports from academic texts within non-fiction. Texts belonging to a given genre may exhibit internal register heterogeneity. For example, a prose text may contain direct speech. We use the term 'mode' for such lower-level varieties defined in terms of the circumstances of production. The traditional 'rhetorical modes' or 'modes of discourse' concern the type of information conveyed ('narrative', 'report', 'description', 'information', 'argument'; see Smith 2003). We will use the term 'mode' to stand in for 'mode of production' in this study, subsuming the narrative voice, reported speech, and interior monologue.
The (externally defined) genres and modes have linguistic reflexes. In particular, they are associated with (multinomial) distributions of specific types of linguistic expressions (words, categories), summarized under the term 'features' in multidimensional register analysis. Unlike keyword analysis (Culpeper 2009), Biber-style register analysis abstracts away from lexical content, focusing on the structural design features of a variety. Biber (1988) used 67 features, grouped into 16 major classes. The choice of features is comprehensively discussed and motivated, with pointers to relevant earlier literature (Biber 1988, Appendix II). While originally developed for non-fictional language, multidimensional register analysis has also been applied to literary texts, e.g., by Egbert (2012) and Egbert and Mahlberg (2020).
While following the general spirit of Biber-style quantitative register analysis, our study is also informed by the qualitative work in the tradition of Koch and Oesterreicher (1986), which addresses very similar issues from a slightly different point of view. Koch and Oesterreicher (1986) have reanalyzed the distinction between spoken and written language by distinguishing between the code (phonic, graphic) and the conception (language of immediacy and language of distance; see Werner 2021 for a recent study that deals with performed language).
Beyond the specific goal of investigating the interior monologues of three characters in one of the most important works of modernism, we intend to show how linguistics and literary studies can benefit from, and cross-fertilize, each other. It is important to note that quantitative register analyses of literary texts can only inform and complement, not replace, the hermeneutic analyses traditionally carried out in literary studies. There are two main types of insights for literary studies from such enterprises, which we may call 'confirmatory' and 'exploratory'. Empirical studies can be confirmatory in the sense that they may provide objective and replicable evidence for literary analyses. They can be exploratory in the sense that they may bring to light new aspects of a work that had previously gone unnoticed, in particular, features that usually escape readers' conscious attention. From a linguistic point of view, the benefit of studying interior monologue lies in expanding our knowledge of the manifold ways in which language varies according to the situational or functional context, which is the general objective of register studies as a sub-discipline of linguistics.
In Section 2, we introduce the data and methods used for our study. Section 3 presents the results, which are discussed in Section 4. Section 5 contains the conclusions.

Materials and Methods
Our research is based on a corpus of literary texts compiled and annotated for the purpose of a project on fictional orality, in which Ulysses (Gabler edition) was included (Joyce 1986). A list of the texts is provided in Appendix A. For the present analysis, the following thirteen chapters of Ulysses are relevant, as they contain either direct speech or interior speech, or both: We did not include Ch. 15 ('Circe') in our analysis, even though it is written in dramatic form and thus contains a lot of speech. The hallucinatory language is simply too artificial for our purposes.
The data were manually annotated for speakers by adding XML-style tag pairs to the raw text, as illustrated in (4). The annotations for thoughts were created independently by D. Vanderbeke and T. Mészáros. Cases of disagreement were resolved by discussion.
(4) <buck>The aunt thinks you killed your mother.</buck> <narr>he said.</narr> <buck>That is why she will not let me have anything to do with you.</buck> <step>Someone killed her,</step> <narr>Stephen said gloomily.</narr>
In order to estimate degrees of variability in the structural make-up of character/mode combinations, we identified larger segments of Bloom's and Stephen's speech and thoughts (segments that were not interrupted by more than thirty words of another mode) and treated them as sets of sub-samples. Molly's soliloquy was segmented by treating yes as a separator. These 'segment samples', as we call them, allowed us to determine significance levels for comparisons between character/mode combinations and other texts.
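The following Python sketch illustrates how annotations in the format of (4) could be turned into segment samples. It is an illustration under stated assumptions, not the pipeline used in this study: tags are assumed not to nest, tokens are approximated by whitespace splitting, the separator yes is assumed to be dropped from Molly's segments, and all function names are hypothetical.

```python
import re
from typing import List, Tuple

# Matches a <tag>...</tag> pair as in example (4); assumes tags do not nest.
TAG_RE = re.compile(r"<(\w+)>(.*?)</\1>", re.DOTALL)

Span = Tuple[str, List[str]]


def parse_annotated(text: str) -> List[Span]:
    """Return (tag, tokens) pairs in document order (whitespace tokenization)."""
    return [(m.group(1), m.group(2).split()) for m in TAG_RE.finditer(text)]


def segment_samples(spans: List[Span], target: str, max_gap: int = 30) -> List[List[str]]:
    """Merge spans of `target` into segments; an interruption by more than
    `max_gap` tokens of other tags closes the current segment."""
    segments: List[List[str]] = []
    current: List[str] = []
    gap = 0
    for tag, tokens in spans:
        if tag == target:
            if current and gap > max_gap:
                segments.append(current)
                current = []
            current.extend(tokens)
            gap = 0
        else:
            gap += len(tokens)
    if current:
        segments.append(current)
    return segments


def split_on_yes(tokens: List[str]) -> List[List[str]]:
    """Split a token list at every 'yes' (case-insensitive, punctuation stripped);
    the separator itself is dropped (an assumption)."""
    segments: List[List[str]] = []
    current: List[str] = []
    for tok in tokens:
        if tok.lower().strip(".,;:!?") == "yes":
            if current:
                segments.append(current)
            current = []
        else:
            current.append(tok)
    if current:
        segments.append(current)
    return segments


example = ("<buck>The aunt thinks you killed your mother.</buck> <narr>he said.</narr> "
           "<buck>That is why she will not let me have anything to do with you.</buck> "
           "<step>Someone killed her,</step> <narr>Stephen said gloomily.</narr>")
spans = parse_annotated(example)
print(len(segment_samples(spans, "buck")))             # 1: the 2-token gap is below 30
print(len(segment_samples(spans, "buck", max_gap=1)))  # 2
```

With the thirty-word threshold, Buck's two utterances in (4) form a single segment, because only the two-token narrator span ("he said.") separates them.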
Register studies require a reference corpus in order to create a multidimensional space within which the data under analysis can be situated. We used the British National Corpus (BNC) for this purpose, a standard corpus with fine-grained genre classifications that distinguishes 46 categories. The data were also annotated per token, in two ways. First, the entire corpus, including the BNC, was lemmatized and annotated for parts of speech with the state-of-the-art Python package stanza. We used the tagset of the Penn Treebank (PTB) with 36 tags, which are listed in Appendix B. The PTB tags represent a standard in much of theoretical and computational linguistics and are therefore useful for comparison with other datasets.
The PTB tags contain rather general information, subsuming not only major content words but also function words in major classes. For example, personal pronouns are represented by the tag 'PRP'. For register studies, however, it is useful to differentiate between different types of pronouns (I, you, she, etc.). The same applies to prepositions and other classes of function words. We therefore created a representation of the texts that abstracts away from content words while keeping individual function words. We call this type of representation a 'structural skeleton'. By generalizing over content words, we wanted to make sure that we measure register features of the genres and modes under analysis, not topicality as reflected in lexical material (lexical material plays a central role in keyword analysis; see, for instance, Fischer-Starcke 2009; see Culpeper 2009 for a discussion of the potential of keywords, part-of-speech tags, and semantic categories). Nouns, proper nouns, verbs, adjectives, and adverbs were replaced by their part-of-speech tags in the structural skeletons. For illustration, consider Table 1. The top row shows the raw text, the second row contains the text annotated with the PTB tags, and the third row shows the structural skeleton that remains if the specific content words (lemmata) are replaced with the appropriate tag. The corpus in this format is made available in the Supplementary Materials (see the Data Availability Statement). The second type of structural annotation was carried out with the 'Multidimensional Analysis Tagger' (MAT). The MAT replicates the tagger used by Biber (1988), which was developed specifically for register research. It is based on the Stanford tagger and applies replacement rules to obtain results that are largely equivalent to those of the tagger used by Biber (1988). The tagger not only assigns part-of-speech tags to tokens but also categorizes specific elements semantically.
For example, it assigns tokens to semantic classes such as 'private verb', 'public verb', and 'suasive verb', and it distinguishes different types of modals (possibility modals, necessity modals, predictive modals).
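The structural skeleton described earlier in this section can be sketched in a few lines of Python. The sketch assumes the input is a list of (word, lemma, PTB tag) triples, such as the text, lemma, and xpos attributes of stanza's Word objects. Which verbs count as function words is our assumption: the lemmas be, have, and do (and the negator not) are kept, in line with the auxiliaries discussed in Section 3.2.1. The study's exact replacement rules may differ.

```python
from typing import List, Tuple

# PTB tag prefixes treated as content words: nouns (NN, NNS, NNP, NNPS),
# verbs (VB, VBD, VBG, VBN, VBP, VBZ), adjectives (JJ, JJR, JJS), adverbs (RB, RBR, RBS).
CONTENT_PREFIXES = ("NN", "VB", "JJ", "RB")
# Lemmas kept as function words even though their tags match (our assumption).
KEEP_LEMMAS = {"be", "have", "do", "not"}


def skeleton(tagged: List[Tuple[str, str, str]]) -> List[str]:
    """Replace content words by their PTB tag; keep function words (lowercased, except 'I')."""
    out: List[str] = []
    for word, lemma, xpos in tagged:
        if lemma.lower() in KEEP_LEMMAS:
            out.append(lemma.lower())
        elif xpos.startswith(CONTENT_PREFIXES):
            out.append(xpos)
        else:
            out.append(word if word == "I" else word.lower())
    return out


tagged = [("He", "he", "PRP"), ("is", "be", "VBZ"), ("not", "not", "RB"),
          ("gloomy", "gloomy", "JJ"), (",", ",", ","), ("I", "I", "PRP"),
          ("walked", "walk", "VBD"), ("to", "to", "IN"), ("Sandymount", "Sandymount", "NNP")]
print(skeleton(tagged))  # ['he', 'be', 'not', 'JJ', ',', 'I', 'VBD', 'to', 'NNP']
```

Because only lemmas and tags are consulted, the output contains no lexical content words, which is the property the skeleton is meant to guarantee.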
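The MAT moreover determines dimension scores for texts, for the following dimensions of register variation:
1. Involved vs. informational production.
2. Narrative vs. non-narrative concerns.
3. Explicit vs. situation-dependent reference.
4. Overt expression of persuasion.
5. Abstract vs. non-abstract information.
6. Online informational elaboration.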
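Biber-style dimension scores are computed by standardizing normalized feature frequencies and summing the z-scores of the features that load on a dimension, subtracting those with negative loadings. The sketch below illustrates this logic with two hypothetical features and invented reference statistics; the MAT's actual feature set and reference statistics are not reproduced here.

```python
from typing import Dict, Sequence


def dimension_score(counts: Dict[str, int], n_tokens: int,
                    ref_mean: Dict[str, float], ref_sd: Dict[str, float],
                    positive: Sequence[str], negative: Sequence[str]) -> float:
    """Biber-style score: z-standardize per-1,000-word rates against reference
    statistics, add z-scores of positively loading features, subtract negative ones."""
    def z(feature: str) -> float:
        rate = 1000.0 * counts.get(feature, 0) / n_tokens
        return (rate - ref_mean[feature]) / ref_sd[feature]
    return sum(z(f) for f in positive) - sum(z(f) for f in negative)


# Hypothetical two-feature "dimension" with invented reference statistics.
score = dimension_score(
    counts={"first_person_pronoun": 20, "noun": 70}, n_tokens=500,
    ref_mean={"first_person_pronoun": 20.0, "noun": 180.0},
    ref_sd={"first_person_pronoun": 10.0, "noun": 40.0},
    positive=["first_person_pronoun"], negative=["noun"])
print(score)  # 3.0
```

A text rich in first-person pronouns and poor in nouns thus receives a high score on such an 'involvement' dimension, which is how Dimension 1 separates involved from informational production.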
Nini (2019) points out that the MAT is not fully equivalent to the original Biber tagger. Validating the MAT on the basis of the LOB corpus (also used by Biber 1988), he found that despite some differences (specifically in the scores determined for Dimension 3), it is reliable and generalizes beyond the data used by Biber (1988). Still, it must be kept in mind that the MAT's scores deviate somewhat from those assigned by the Biber tagger, so small differences in dimension scores must be treated with care.
For our quantitative analyses, we relied on methods that are commonly used in literary register studies. In order to locate a text in multidimensional register space, we used the dimension scores of the MAT (which are derived from factor analysis). For comparisons of genres and modes, we applied the Mann-Whitney U test, as the data were not normally distributed. In order to identify linguistic features that are characteristic of a specific genre, mode, or character/mode combination, we determined adjusted Pearson residuals. An adjusted Pearson residual is the difference between the observed and the expected frequency, divided by the standard error of the residual; its computation involves the observed frequency O, the expected frequency E, and the sample size n. As is standard, we regard a structural marker as over- or underrepresented (at α = 0.05, whose two-sided critical value of 1.96 is conventionally rounded to 2) if its adjusted residual is higher than 2 or lower than −2.
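The sketch below computes adjusted residuals in one common formulation for a contingency table of markers (rows) by samples (columns): r = (O − E) / sqrt(E (1 − R/n)(1 − C/n)), where R and C are the row and column totals and n is the grand total. This formulation is our assumption and may not match the exact formula used in the study. The marker counts in the usage example are invented.

```python
import math
from typing import Dict


def adjusted_residuals(a: Dict[str, int], b: Dict[str, int]) -> Dict[str, float]:
    """Adjusted Pearson residuals of sample `a` in a markers x 2 contingency table.

    r = (O - E) / sqrt(E * (1 - R/n) * (1 - C/n)), with O the observed count of a
    marker in `a`, E = R * C / n its expected count, R the marker's row total,
    C the token total of `a`, and n the grand total. The residuals for `b` are the
    same values with the opposite sign."""
    markers = sorted(set(a) | set(b))
    col_a = sum(a.get(m, 0) for m in markers)
    n = col_a + sum(b.get(m, 0) for m in markers)
    result: Dict[str, float] = {}
    for m in markers:
        observed = a.get(m, 0)
        row = observed + b.get(m, 0)
        expected = row * col_a / n
        se = math.sqrt(expected * (1 - row / n) * (1 - col_a / n))
        result[m] = (observed - expected) / se if se > 0 else 0.0
    return result


# Invented counts: thoughts (a) vs. speech (b).
res = adjusted_residuals({"I": 30, "NN": 70}, {"I": 10, "NN": 90})
over = [m for m, r in res.items() if r > 2]    # over-represented in `a`
under = [m for m, r in res.items() if r < -2]  # under-represented in `a`
print(over, under)  # ['I'] ['NN']
```

For a 2 × 2 table, the adjusted residual equals the square root of the chi-square statistic (here 12.5) with the appropriate sign, which offers a quick sanity check.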

Results
We start in Section 3.1 by locating the modes of interest (interior monologue, fictional orality, and the narrative voice) in multidimensional register space. These results are based on the tags obtained with the Multidimensional Analysis Tagger (MAT), and the data from Ulysses are compared to the other literary texts of our sample and to the texts from the BNC. In Section 3.2, we zoom in on the (fictional) speech and interior monologues of Leopold Bloom and Stephen Dedalus, and the interior monologue of Molly Bloom. This section is based on the PTB tags assigned by the stanza package.

Thoughts and Speech in Multidimensional Register Space
The dimension scores for the five character/mode combinations according to the MAT are shown in Figure 1, in the way Biber-style dimension scores are 'traditionally' displayed (see, for instance, Biber 1989; Crosthwaite and Cheung 2019). The top-left plot shows all character/mode combinations. The other plots show one character/mode combination together with the BNC genre that is most similar in terms of the dimension scores (measuring similarity as Euclidean distance in multidimensional space):
• Bloom/thoughts: news script
• Bloom/speech: lectures on humanities and arts subjects
• Stephen/thoughts: poetry
• Stephen/speech: (auto)biographies
• Molly/thoughts: live sport commentaries and discussions
While some of these associations between characters and genres may appear peculiar, they are certainly not unreasonable if we keep in mind that we are comparing grammatical structures, not vocabulary or topics. Bloom's thoughts and speech feature elements of written-to-be-spoken genres (news scripts and lectures). Stephen's thoughts resemble poetry, and his speech is similar to the language of biographies. Molly's thoughts resemble unplanned monological spoken language, as far as structural organization is concerned.
Figure 1 shows that most of the variance between the character/mode combinations is found along Dimension 1 of the multidimensional register space. The lines for the other dimensions are relatively flat. If we compare speech and thought representations to the narrative voice, Dimension 2 is the most distinctive one. The two plots in Figure 2 show the Dimension 1 and 2 scores for the five character/mode combinations, along with fictional speech and the narrative voice from all literary works of our corpus (left: all values; right: mean values; for Ulysses, the chapters were treated separately, given their considerable stylistic variability).
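The genre matching described above can be sketched as a nearest-neighbor search over six-dimensional score vectors. The score values below are invented placeholders, not the MAT scores shown in Figure 1; math.dist (Python 3.8+) computes the Euclidean distance.

```python
import math
from typing import Dict, Sequence, Tuple


def nearest_genre(profile: Sequence[float],
                  genres: Dict[str, Sequence[float]]) -> Tuple[str, float]:
    """Return the genre whose dimension-score vector is closest (Euclidean) to `profile`."""
    best = min(genres, key=lambda g: math.dist(profile, genres[g]))
    return best, math.dist(profile, genres[best])


# Invented six-dimensional score vectors (Dimensions 1-6), for illustration only.
genres = {
    "news script": [4.0, -2.0, 1.0, 0.0, 0.5, -0.5],
    "poetry": [-10.0, 0.0, 3.0, -1.0, 2.0, 0.0],
}
print(nearest_genre([5.0, -2.0, 1.0, 0.0, 0.5, -0.5], genres))  # ('news script', 1.0)
```

Since all six dimensions enter the distance with equal weight, a close match can hide a sizable difference on a single dimension, which is one reason the matches should be read alongside the per-dimension plots.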
The plots give us an idea of the distances between different modes or character/mode combinations. They show that on Dimensions 1 and 2, thought aligns with speech (in literary texts) insofar as the relevant data points cluster around zero.
The upper polygon in the left plot in Figure 2 encloses the written material from the texts of our literary corpus. Unsurprisingly perhaps, this material exhibits relatively high values on Dimension 2, which measures 'narrative concerns'. The polygon at the bottom encloses the spoken material from the corpus (dialogues). As is to be expected, the spoken material from the novels exhibits higher values for Dimension 1 (measuring 'involvement') than the narrative voices, and lower values for Dimension 2. (Note that the top-left spike of the LIT/spoken polygon represents Chapter 4 of Ulysses, where Bloom and Molly are introduced; most other novels or chapters have much lower Dimension 2 values for fictional orality.)
With respect to the interior monologues of the novel in comparison to the characters' speech, the following observations stand out when inspecting the plots in Figure 2:
1. Speech and thought representation in Ulysses are generally non-narrative, as is reflected in their low Dimension 2 values (see also Figure 1).
2. Both speech and thought representations occupy a large range of values for Dimension 1, showing varying degrees of involvement and informativity.
More specifically, we can make the following observations about the characters under analysis:
3. Molly's thoughts and Bloom's speech are located well within the region of values covered by spoken language in other fictional texts, whereas Stephen's speech and thoughts, as well as Bloom's thoughts, are located outside of that region.
4. Stephen's thoughts and speech are very close to each other, whereas Bloom's thoughts and speech are quite far apart, being distinguished mainly by Dimension 1.
Before discussing these results in Section 4, we now turn to a more fine-grained analysis of the structural patterns observed in the character/mode combinations of interest.

Comparative Analysis of Speech and Thoughts in Ulysses
In this section, we zoom in on the three characters, comparing their thoughts to each other, and to direct speech where available (Bloom, Stephen). We start with Stephen in Section 3.2.1 because his thoughts constitute an extreme case of informational language, which we can use as a point of reference for Bloom (Section 3.2.2) before we compare both of these characters to Molly (Section 3.2.3).

Stephen's Thoughts and Speech
The markers that are significantly over-represented (with an adjusted residual greater than 2) are shown in Table 2. There is very little verbal material even in the right columns (showing speech). Both modes exhibit a heavily nominal style, though Stephen's speech is slightly more verbal than his thoughts. The main difference between speech and thoughts lies in the type of referential expressions used. In his thoughts, Stephen uses the (deictic) first-person pronouns I and my, and the second-person pronoun you, more than in his speech, where the (anaphoric) third-person singular pronouns he and it are more prominent. Moreover, singular and plural nouns (and pronouns) figure more prominently in his thoughts, as do gerunds and adjectives. The absence of verbal elements from the list of over-represented markers, and the absence of elements creating cohesion, are striking. 'Verbless' subjects often render Stephen's reflections static and tableau-like, which is in keeping with his aesthetic theory developed in 'Proteus'. While Stephen's thoughts seem to be organized around the noun phrase as their main structural unit, his speech exhibits traces of nominal and stative verbal predication, as reflected in the copula be and the auxiliaries may and have. Moreover, there are symptoms of the rhetorical mode of description, e.g., the relative pronouns who(m) and which.
Stephen's speech also features a relatively large number of prepositions and conjunctions, in comparison to his thoughts. Figure 3 shows a comparative bigram graph for Stephen's thoughts and speech. The arrows indicate over-representation in either thoughts (red) or speech (blue), in comparison to the other mode. The cluster at the center illustrates the nominal patterns that are frequent in his thoughts: singular and plural nouns co-occurring with other nouns, in combination with the preposition of and past participles (VBN). The (blue) patterns showing the conditional distribution of elements in speech point to more verbal structures, e.g., VBD → for and I → VBP.
The relatively incoherent nature of Stephen's thoughts is probably not unrelated to the over-representation of plural nouns. These structural markers are symptoms of a generalizing attitude, making reference to categories rather than instances of those categories, as can be observed in (6) above and in the following examples:
(9) Young shouts of moneyed voices in Clive Kempthorpe's rooms. (1.165)
(10) Like him was I, these sloping shoulders, this gracelessness. My childhood bends beside me. Too far for me to lay a hand of comfort there once or lightly. Mine is far and his secret as our eyes. Secrets, silent, stony sit in the dark palaces of both our hearts: secrets weary of their tyranny: tyrants, willing to be dethroned. (2.168-172)
The highly impersonal type of reference that characterizes Stephen's thoughts also drives the over-representation of plural pronouns. They are often used for collective reference, e.g., to cows (in (11)), pupils (in (12)), or, implicitly, the police (in (13)).
(11) Crouching by a patient cow at daybreak in the lush field, a witch on her toadstool, her wrinkled fingers quick at the squirting dugs. They lowed about her whom they knew, dewsilky cattle. (1.400-403)
(12) In a moment they will laugh more loudly, aware of my lack of rule and of the fees their papas pay. (2.28-29)
(13) Yes, used to carry punched tickets to prove an alibi if they arrested you for murder somewhere. (3.179-180)
As was seen above, Stephen's thoughts often center around himself, as is reflected in the relatively frequent occurrence of the pronoun I (cf. the interior dialogue in (14)). The pronoun you is also over-represented, often used to refer to Stephen himself or used impersonally; cf. (15). Note also that some of Stephen's self-references are associated with past-tense verbs (in memories; cf. (16), 3.194-196).
The most typical features of Stephen's thoughts, the generalizing reference and the lack of cohesion, are symptoms of a type of reflection that is often a direct response to immediate sensory input. This is compatible with a low frequency of the distal demonstrative determiner that, which is significantly underrepresented in Stephen's thoughts and speech in comparison to Bloom's thoughts and speech, as will be seen in Section 3.2.2. Examples where Stephen uses the proximal demonstrative this (rather than that) in contexts of perception are given in (17).
To sum up, Stephen's thoughts exhibit a heavily nominal style, with large numbers of plural nouns and pronouns. This style is symptomatic of Stephen's often impersonal, generalizing reference and his abstract thinking. The striking lack of cohesive elements, reflected in the rarity of conjunctions and anaphoric pronouns, among other features, is compatible with the high degree of information density pointed out above.
It is also what renders Stephen's thoughts similar to the language of poetry, where anaphoric pronouns are relatively rare and definite descriptions are relatively frequent. The high information density in Stephen's monologue could also be part of a narrative strategy that aims at a simulation of rapid thought processes.
Stephen's thoughts are very similar to his speech, stylistically speaking. In fact, a comparison of the segment samples shows that Stephen's thoughts and speech differ in only one of Biber's (1988) six dimensions, Dimension 1, according to a Mann-Whitney U test (p = 0.037). There is no significant difference in any of the other dimensions. As was shown above, the differences between Stephen's thoughts and speech in Dimension 1 are mainly due to types of reference. It is probably natural for people to think about themselves more than they speak about themselves (and Joyce was probably aware of this when creating Stephen's thought representations). From this point of view, even the difference along Dimension 1 may not be primarily a matter of style, but of perspective and topicality: self-reference and egocentric thinking vs. a more outward-directed attitude in speech. This difference is illustrated in (20)-(21) (thoughts) and (23)
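For readers who want to replicate comparisons of segment samples, here is a standard-library sketch of a two-sided Mann-Whitney U test using the normal approximation with a tie correction. It omits the continuity correction, so p-values can differ slightly from library implementations such as scipy.stats.mannwhitneyu; the function names are our own, and the segment scores in the usage example are invented.

```python
import math
from typing import Dict, List, Sequence, Tuple


def average_ranks(values: Sequence[float]) -> List[float]:
    """1-based ranks; tied values receive the mean of the ranks they span."""
    order = sorted(range(len(values)), key=lambda i: values[i])
    ranks = [0.0] * len(values)
    i = 0
    while i < len(order):
        j = i
        while j + 1 < len(order) and values[order[j + 1]] == values[order[i]]:
            j += 1
        for k in range(i, j + 1):
            ranks[order[k]] = (i + j) / 2 + 1
        i = j + 1
    return ranks


def mann_whitney_u(x: Sequence[float], y: Sequence[float]) -> Tuple[float, float]:
    """Return (U for x, two-sided p) via the normal approximation with tie
    correction and without continuity correction."""
    n1, n2 = len(x), len(y)
    pooled = list(x) + list(y)
    n = n1 + n2
    u1 = sum(average_ranks(pooled)[:n1]) - n1 * (n1 + 1) / 2
    counts: Dict[float, int] = {}
    for v in pooled:
        counts[v] = counts.get(v, 0) + 1
    ties = sum(t ** 3 - t for t in counts.values())
    sigma = math.sqrt(n1 * n2 / 12 * ((n + 1) - ties / (n * (n - 1))))
    if sigma == 0:
        return u1, 1.0
    z = (u1 - n1 * n2 / 2) / sigma
    return u1, math.erfc(abs(z) / math.sqrt(2))


# Invented Dimension 1 scores for two sets of segment samples.
u, p = mann_whitney_u([1.0, 2.0, 3.0], [4.0, 5.0, 6.0])
print(u, p)
```

The normal approximation is only reasonable for moderately large samples; with very few segments, an exact test would be the safer choice.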

Bloom's Thoughts and Speech
The adjusted Pearson residuals for Bloom's speech and thoughts are shown in Table 3. There is a straightforward separation into markers associated with nominal style at the top of the left column (prevalent in thoughts) and markers typical of verbal style in the right column (associated with Bloom's speech). Among the most important structural markers that are over-represented in Bloom's thoughts in comparison to his speech, we find elements of nominal syntax (nouns, proper names, adjectives, gerunds, etc.), the pronoun she (reflecting Bloom's thoughts about Molly), past-tense verbs (reflecting memories), and the demonstrative determiner that. The list of structural markers that are under-represented in Bloom's thoughts, and hence over-represented in his speech, includes first- and second-person pronouns (I, me, you), the interjection yes, the modals will and can, the demonstrative pronoun that, and the demonstrative determiner this. Some of these features are typical elements of conversation; the modals and demonstratives will be discussed below.
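The adjusted (standardized) Pearson residuals used in Tables 3-5 can be illustrated with a short sketch. The version below implements the textbook formula for a two-way contingency table, r_ij = (O_ij - E_ij) / sqrt(E_ij (1 - n_i+/N)(1 - n_+j/N)), with rows standing for modes (e.g., Bloom's thoughts vs. his speech) and columns for structural markers. It is an illustration under these assumptions rather than a reproduction of the study's 'py/get_residuals.py', whose details may differ; the function name and the toy counts are hypothetical.

```python
import math


def adjusted_residuals(table):
    """Adjusted Pearson residuals for a two-way contingency table.

    `table` is a list of rows (e.g. one row per mode), each a list of counts
    per structural marker. Assumes at least two rows and two columns and
    that all row and column totals are non-zero.
    """
    row_totals = [sum(row) for row in table]
    col_totals = [sum(col) for col in zip(*table)]
    n = sum(row_totals)
    residuals = []
    for i, row in enumerate(table):
        out = []
        for j, observed in enumerate(row):
            expected = row_totals[i] * col_totals[j] / n
            variance = expected * (1 - row_totals[i] / n) * (1 - col_totals[j] / n)
            out.append((observed - expected) / math.sqrt(variance))
        residuals.append(out)
    return residuals


# Hypothetical counts: rows = (thoughts, speech), columns = two markers.
toy = [[10, 20], [30, 40]]
for row in adjusted_residuals(toy):
    print([round(r, 3) for r in row])  # prints [-0.891, 0.891] then [0.891, -0.891]
```

Residuals beyond roughly ±1.96 are commonly read as marking over- or under-representation at about the 5% level for a single cell, although with many markers some correction for multiple comparisons may be advisable.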
A comparative bigram graph for Bloom's thoughts and speech is shown in Figure 4. Bigrams associated with thoughts (in red) include nominal ones, such as the combination of the definite determiner the and plural nouns (NNS), but there are also verbal patterns, e.g., we → VBP. The blue edges, signaling over-representation in speech, are more pronounced and include elements pointing to interactive speech (e.g., you → VB, you → VBP, and what → I). This is in accordance with the impression given by Figure 2: Bloom's interior monologues are distinguished from his speech by low values on Dimension 1, indicating low degrees of involvement and hence relatively low frequencies of features prominently associated with this dimension, such as first- and second-person pronouns.
So far, we have considered Bloom's thoughts in comparison to his speech. The picture changes considerably if we shift perspective and compare Bloom's thoughts to Stephen's thoughts. The adjusted residuals for this comparison are shown in Table 4. Symptoms of nominal style, such as proper names (NNP), plural nouns (NNS), and the preposition of, are more prevalent in Stephen's thoughts. In Bloom's thoughts, base forms of verbs (VB) are over-represented, which corresponds to the relatively high incidence of modal auxiliaries (could, must, might, would).
More precise observations can be made by inspecting the comparative bigram graph in Figure 5. Stephen often combines nouns with other nouns, either with a Saxon genitive ('s) or with the preposition of. Sequences of a definite article and a noun are more typical of Bloom's thoughts, pointing to a higher degree of cohesion. Another observation worth noting is the important role of adverbs (RB), which are not only relatively frequent in Bloom's thoughts (cf. Table 4) but also figure prominently in bigrams following singular or plural nouns (see Figure 5).
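The comparative bigram graphs (Figures 4-6) rest on the idea that adjacent pairs of structural markers can be counted in two modes and tested for over-representation. The sketch below shows one plausible way to derive the edges of such a graph: each bigram is tested in a 2 × 2 table (this bigram vs. all other bigrams, mode A vs. mode B), and the sign of the adjusted residual indicates the mode in which it is over-represented. The threshold, the choice of test, and all function names are assumptions made for illustration; the study's own scripts in the folder 'py' may select edges differently, and drawing the graph itself (the study used graph-tool) is omitted.

```python
import math
from collections import Counter


def bigram_counts(tags):
    """Count adjacent pairs (bigrams) in a sequence of structural markers."""
    return Counter(zip(tags, tags[1:]))


def adjusted_residual_2x2(a, b, c, d):
    """Adjusted Pearson residual for cell `a` of the 2 x 2 table [[a, b], [c, d]]."""
    n = a + b + c + d
    row1, col1 = a + b, a + c
    expected = row1 * col1 / n
    variance = expected * (1 - row1 / n) * (1 - col1 / n)
    return (a - expected) / math.sqrt(variance) if variance > 0 else 0.0


def comparative_edges(tags_a, tags_b, threshold=1.96):
    """Map each bigram with |residual| >= threshold to its residual.

    Positive values: over-represented in mode A; negative: in mode B.
    """
    counts_a, counts_b = bigram_counts(tags_a), bigram_counts(tags_b)
    total_a, total_b = sum(counts_a.values()), sum(counts_b.values())
    edges = {}
    for bigram in set(counts_a) | set(counts_b):
        a, c = counts_a[bigram], counts_b[bigram]
        # Rows: mode A, mode B; columns: this bigram, all other bigrams.
        r = adjusted_residual_2x2(a, total_a - a, c, total_b - c)
        if abs(r) >= threshold:
            edges[bigram] = r
    return edges
```

The resulting dictionary maps (marker, marker) pairs to signed residuals, which could then be drawn as a directed graph with edges coloured by sign, analogous to the red/blue convention of the figures.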
As has been pointed out, there are elements of dialogue in Bloom's thoughts, despite their mostly nominal organization. Relevant examples are given in (26).
Bloom's inner dialogues and self-corrections testify to a high degree of epistemic uncertainty. 12 In Bloom's thoughts, epistemic uncertainty is once again reflected in the use of modals. The modal must is particularly typical of Bloom's thoughts and is significantly over-represented in comparison to both Bloom's speech and Stephen's thoughts and speech. It occurs in both deontic and epistemic contexts. In deontic uses of must, Bloom's inner voice often has a reminding or self-admonishing function, as illustrated in (29)-(31). While epistemic must, though expressing uncertainty in comparison with an indicative sentence, conveys a relatively high degree of certainty, epistemic could is more tentative. It is significantly over-represented in Bloom's thoughts in comparison to all other speech and thought combinations. Could is sometimes used by Bloom to make a suggestion to himself, as in the following examples:
The comparatively high degree of interactiveness in Bloom's thoughts, in comparison to Stephen's thoughts, is also reflected in the use of demonstratives. The demonstrative that, used as a prenominal determiner (e.g., that Father Farley), is significantly over-represented in this mode in comparison to both Bloom's speech and Stephen's thoughts and speech. As a determiner, that often appears in a 'recognitional' use (cf. Diessel 1999; Enfield 2003). In recognitional use, a demonstrative refers to an entity that is presupposed as shared knowledge; i.e., the author of the proposition relies on the (implied) addressee's ability to identify the intended referent. This can be observed in (38), where the temporal adverbial that time refers to an earlier experience that is accessible to the character (in 'normal' conversation it would have to be accessible to the addressee). 13 The same familiarity requirement holds for that Norwegian captain in (39) and for that Capel street library book in (40).
(38) Creaky wardrobe. No use disturbing her. She turned over sleepily that time. (4.73-74)
(39) Chap you know just to salute bit of a bore. His back is like that Norwegian captain's.
Some of the observations made above can be related to the dimension scores obtained with the MA Tagger (bearing in mind the caveat from Section 2 regarding the imperfect but reasonable match between the tags assigned by the MAT and those of the Biber tagger):
• Bloom's thoughts exhibit little involvement in comparison to Bloom's speech, but more involvement than Stephen's thoughts, as reflected in their significantly higher scores for Dimension 1 (p = 0.022).
• They are located at the lower end of narrativity but are more narrative than Stephen's thoughts, with higher scores for Dimension 2 (p < 0.016).
• They use more situation-dependent reference than Stephen's thoughts, as reflected in lower scores for Dimension 3 (p < 0.001).
• They display more features of persuasion (i.e., higher values on Dimension 4) than Stephen's thoughts, or, perhaps more appropriately in the context of interior monologue, of deliberation (p = 0.002).
• They exhibit more traces of online informational elaboration than Stephen's thoughts (i.e., higher values on Dimension 6; p = 0.002).
As this list shows, there are substantial differences between Stephen's and Bloom's thoughts. In the next section, we will see that Bloom's thoughts tend in the direction of Molly's, whose interior monologues exhibit even higher Dimension 1 scores.
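As background to the dimension scores used above: in Biber's (1988) procedure, which the MAT is designed to replicate, the normalized frequency of each feature (per 1,000 words) is standardized against the mean and standard deviation of a reference corpus, and a text's score on a dimension is the sum of the z-scores of the features with salient positive loadings minus the sum of the z-scores of those with salient negative loadings. The sketch below illustrates only this arithmetic. The feature names, frequencies, and reference values are hypothetical placeholders, and the MAT's actual feature lists and reference statistics are not reproduced here.

```python
def z_score(value, mean, sd):
    """Standardize a normalized frequency against reference-corpus statistics."""
    return (value - mean) / sd


def dimension_score(freqs, norms, positive, negative):
    """Biber-style dimension score (illustrative sketch).

    freqs:    feature -> frequency per 1,000 words in the text
    norms:    feature -> (mean, standard deviation) in a reference corpus
    positive: features with salient positive loadings on the dimension
    negative: features with salient negative loadings on the dimension
    """
    pos = sum(z_score(freqs[f], *norms[f]) for f in positive)
    neg = sum(z_score(freqs[f], *norms[f]) for f in negative)
    return pos - neg


# Hypothetical values, for illustration only.
freqs = {"feature_a": 30.0, "feature_b": 15.0}
norms = {"feature_a": (20.0, 10.0), "feature_b": (25.0, 5.0)}
score = dimension_score(freqs, norms, ["feature_a"], ["feature_b"])
print(score)  # prints 3.0: (+1.0) - (-2.0)
```

Scores computed in this way for individual segments are what a test such as the Mann-Whitney U test then compares between groups.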

Molly's Thoughts
As Table 5 shows, Molly's thoughts are characterized by a high incidence of the personal pronouns I and he (and related markers such as reflexive and possessive forms), reflecting her thoughts about herself, Bloom, and her lover Blazes Boylan (and other men). The stream-of-consciousness-like style is reflected in the frequency of conjunctions and subjunctions, and of 'heavy' prepositions (disyllabic ones, or monosyllabic ones with at least three sounds), e.g., with, like, into, and about. As Molly thinks about the day past, past-tense verbs figure prominently. The relative over-representation of nominal elements in Stephen's and Bloom's thoughts, reflected in the high residuals for the tags NN, NNP, and NNS, corresponds to an under-representation in Molly's thoughts. Molly essentially avoids nouns and adjectives, and there are hardly any genitives or prepositional phrases headed by of. Other 'light' prepositions (with only two sounds), such as to and in, are also under-represented. The comparative bigram graph in Figure 6 reveals an interesting pattern: while nouns are significantly under-represented in Molly's thoughts, as pointed out above, two bigrams are heavily over-represented, NN → he and NN → I. There are two main reasons for the frequency of these patterns in Molly's thoughts. First, due to the absence of punctuation, there are often sequences of an object and the subject of the following sentence or clause, as in (44) (flower he). Second, Molly makes relatively frequent use of 'bare' object relative clauses, as in (45).
In sum, Molly's thoughts are much more similar to fictional orality than either Stephen's or Bloom's thoughts, but in terms of their structural make-up they are much closer to Bloom's thoughts than to Stephen's.

Discussion
We started by asking two questions in Section 1, repeated here for convenience:

1. Does Joyce distinguish between the characters' interior speech or thought representations, and what (if any) are the linguistic features underlying these distinctions?
2. Are there recognizable linguistic differences between the (direct) speech and thoughts (interior speech) of the figures?
The answer to question 1 is very clear: There are marked differences between the three characters. On one end of the scale, there is Stephen, with a heavily nominal style, high information density, a low level of cohesion, a high level of elaboration, and comparatively few signs of online processing constraints. On the other end of the scale, Molly has a primarily verbal style, with lower information density, a higher degree of cohesion (e.g., through anaphoric pronouns and conjunctions), and obvious online production constraints. Bloom is located in between in this respect, though overall much closer to Stephen than to Molly.
An answer to question 2 is more difficult to provide. Joyce does distinguish quite categorically between Leopold Bloom's speech and thoughts, but the stylistic differences between Stephen's speech and thoughts are very minor. A significant difference in the dimension scores was observed only in Dimension 1, and it was mainly due to the type of reference (self-reference vs. anaphoric reference). The observation that Stephen's thoughts center on himself, while in his speech he tends to refer to others, probably does not even reflect a primarily stylistic difference. By contrast, Bloom's speech and thoughts differ significantly in five of Biber's six dimensions. His thoughts are less involved, more narrative, and more context-dependent; they feature more elements of deliberation; and they show more traces of online production under time constraints.
The difference in the degrees of similarity between Bloom's and Stephen's speech and thought processes could be seen as a means of implicit characterization. In Stephen's case, speech and thought are rather similar, as the reader in fact learns very early on, when Stephen unabashedly complains about Haines to Buck Mulligan at the beginning of the novel. In Bloom's case, by contrast, there is a marked gap between public face and private thoughts. Such discrepancies between Bloom's thoughts and speech become apparent, for instance, when he greets the pub owner Larry O'Rourke:
We can thus notice a tension in Bloom: there is a desire to please, but his rather polite and positive external appearance hides a sometimes less comforting and agreeable internal perspective. We also observed a certain type of epistemic uncertainty in his thoughts, reflected in the use of modals and frequent self-corrections, which is often due to the kind of 'theory formation' characteristic of Bloom: he states a concept or preliminary hypothesis, which is then rejected and replaced by a more accurate one. This is also reflected in the relatively high frequency of no as an interjection, as shown in Table 5.
Our quantitative findings based on thoughts and speech have also confirmed the familiar characteristics often attributed to Bloom and Stephen-Bloom's careful nature and polite demeanour vs. Stephen's egocentrism and arrogance, and Bloom's tendency to generalize and to speculate about various matters vs. Stephen's focus on himself.
On the basis of the results reported above, we can now ask a more general question: Is 'interior monologue' a mode that can be identified as such by using the tools of register analysis? 14 As should have become clear, this question is not easy to answer on the basis of the material from Ulysses, as there are marked differences between the characters. Interior monologue typically exhibits few elements of conversational interaction (though there is some self-talk), and thus tends to be distinguished from speech (though less so in the case of Stephen, who refrains from using linguistic politeness markers and phatic communication in his speech, which renders his speech quite thought-like). For the same reason, thought representations tend to be informationally denser than speech: the social dimension of speech is missing. However, as was seen in Molly's case, interior monologue may still primarily resemble spoken language.
On the basis of the (limited) evidence available to us in James Joyce's Ulysses, our answer to question 2 is that the form of the thought and interior speech representations is not primarily determined by the design features of any given mode; rather, it is a matter of choice, and it is used for at least two important poetological functions, perspectivization and implicit characterization. Seeing the fictional world through the eyes of a character, without the burden of social communication (as in direct speech), opens up a multitude of perspectives. Peeping into a character's mind moreover tells us something about that character him/herself, e.g., when we think of Bloom's inner deliberation and discrepancies between thoughts and speech. Molly is primarily accessed through her own thoughts.

Conclusions
The present study is part of an interdisciplinary effort to use the tools of quantitative linguistic analysis (as widely applied in corpus linguistics and register studies) for a better understanding of literary texts. We compared interior monologue with fictional orality in J. Joyce's novel Ulysses because this mode has not received much attention in quantitative studies, and because in Ulysses, one of the most important literary works of modernism, Joyce makes extensive use of that technique and exhibits a level of sophistication that is remarkable even today. Needless to say, the study of interior monologues from a register point of view should be extended in the future by comparing our results to those obtained on the basis of other texts to further understand this understudied mode.
At the most specific level, our study has shown that interior monologue in Ulysses, while clearly distinct from the narrative voice, is not linguistically homogeneous. In particular, the thoughts of the characters under analysis (Leopold Bloom, Molly Bloom, and Stephen Dedalus) exhibit varying degrees of involvement, as measured by Dimension 1 of a multidimensional register analysis. One character, Stephen Dedalus, shows a very low level of involvement and a high degree of information density, whereas another character, Molly Bloom, is located at the opposite end of the spectrum, and her thoughts resemble spoken language. The third character, Leopold Bloom, is located in between.
Representing the thoughts or interior monologues of characters obviously requires considerable creativity on the part of the author. There is no consensus as to whether, or to what extent, thoughts can be thought without being verbalized. In any case, the author does not have access to other people's thoughts. While the speech of characters can be modeled on observed behavior (and it is well known that Joyce did use real-world models for his characters), no such emulation is possible for the representation of thoughts. Authors can, of course, reproduce their own inner voices. As Stephen is widely regarded as Joyce's alter ego (also in A Portrait of the Artist as a Young Man), it is likely that his interior monologues to some extent represent the author's thoughts. This may, to a lesser extent, also apply to Bloom, who exhibits features of the older James Joyce (Bloom is 38 years old in the novel; Joyce was 36 when the novel began to be serialized in The Little Review in 1918). For Molly, it is conceivable that Joyce was to some extent inspired by letters from his wife Nora, who, like Molly in the representation of her thoughts, did not use punctuation marks.
In Section 1, we made a distinction between 'confirmatory' and 'exploratory' results of quantitative register studies applied to literary texts. Most of our results have been confirmatory, providing empirical evidence for some fairly well-known aspects of the novel, such as properties of the characters. However, the linguistic analysis also highlighted specific elements of the characters and their representations that were not immediately obvious, particularly in the case of Leopold Bloom. For instance, while the use of the particle yes by Molly Bloom is very prominent in her soliloquy, it is not widely known that her husband Leopold Bloom displays a significant preference for no in his interior monologues, compared to the thoughts of Molly and Stephen (cf. Table 5). In fact, after must and she, no is the third most prominent word in Bloom's thoughts. Whatever the exact implications of that observation may be, the observation itself is certainly not trivial, particularly as Joyce is known to have been extremely careful and precise in the linguistic construction of his figures and to have paid attention to minuscule details.
Finally, we wish to point out that the methods we have used are rather conventional within a quantitative register framework. We used frequency distributions of structural markers as input to quantitative analyses, as is customary in register studies. The texts were treated as bags of words, or rather, bags of structural markers. This means that we focused on global statistical distributions rather than on linear sequences of elements. Given that methods taking linear order into account are becoming increasingly important in computational linguistics, and that ever more complex language models are being developed, it seems reasonable to explore ways of applying such models to questions of literary interest like the one addressed in this study.
Data Availability Statement: The original texts constituting the literary corpus cannot be published for reasons of copyright. The structural skeletons on which the analysis is based are contained in the TSV file 'data/LitCorpSkeleton.tsv'. The MAT dimension scores underlying the analyses are provided in the files 'data/Dimensions_BNC_for_MAT.csv', 'data/Dimensions_MAT_by_speaker.csv', and 'data/Dimensions_MAT_chunks_new.csv'. The adjusted Pearson residuals were determined with the script 'py/get_residuals.py'. The results are contained in the folder 'results'. The plots in Figure 1 were generated with the script 'R/plots_and_stats.R', using the R packages tidyverse (Wickham et al. 2019), stringr (Wickham 2022), ggplot2 (Wickham 2016), ggrepel (Slowikowski 2021), and ggforce (Pedersen 2022). The bigram models shown in Figures 3-6 were created with the Python scripts in the folder 'py', using the graph-tool package for Python (https://graph-tool.skewed.de/, accessed on 2 January 2023).
9 See Nini (2019) and https://sites.google.com/site/multi-dimensionaltagger, accessed on 2 January 2023.
In order to maintain a certain degree of comparability with earlier work in register studies, we used Biber's original model (as annotated by the MAT) rather than, for instance, the model of Egbert (2012) or the 'enhanced' model of Xiao (2009).
10 https://nlp.stanford.edu/software/tagger.shtml, accessed on 2 January 2023.
11 The MAT also assigns to each text the closest text type, based on the corpus material used for Biber's (1988) original study (the LOB corpus). The genres of the LOB corpus are, however, more coarse-grained than those of the BNC. For Bloom's thoughts, the tagger identifies 'general narrative exposition' as the closest text type, and 'involved persuasion' for Bloom's speech. 'General narrative exposition' is also the text type closest to Stephen's thoughts and speech. Molly's thoughts are grouped with 'involved persuasion'.
12 It may be noted that the dialogical form then governs Chapter 17 ('Ithaca'). Focusing on Bloom in his own home, the chapter is presented as a catechism consisting entirely of questions and answers. Despite the pseudo-scientific language of this chapter, the result is once again epistemic uncertainty, and the most important nuggets of useful knowledge are frequently buried in a form of literary logorrhea garbed as precise factual information.
13