The Neglected Group: Cognitive Discourse Markers as Signposts of Prosodic Unit Boundaries

Majhenič, Simona; Beras, Mitja; Križaj, Janez

doi:10.3390/languages10070159

by Simona Majhenič ¹,*, Mitja Beras ² and Janez Križaj ³

¹ Faculty of Electrical Engineering and Computer Science, University of Maribor, 2000 Maribor, Slovenia

² Faculty of Mechanical Engineering, University of Maribor, 2000 Maribor, Slovenia

³ Faculty of Electrical Engineering, University of Ljubljana, 1000 Ljubljana, Slovenia

* Author to whom correspondence should be addressed.

Languages 2025, 10(7), 159; https://doi.org/10.3390/languages10070159

Submission received: 14 April 2025 / Revised: 17 June 2025 / Accepted: 25 June 2025 / Published: 27 June 2025

(This article belongs to the Special Issue Current Trends in Discourse Marker Research)

Abstract

The present paper examines and compares the role of cognitive discourse markers (DMs), such as uhm, like, or I mean, and a set of prosodic parameters as indicators of prosodic boundaries. Cognitive DMs are traditionally not studied as a separate DM group on par with the ideational, sequential, rhetorical, or interpersonal group. However, as they reflect the speaker’s mental processes during speech production, they offer an exceptional glimpse into how speakers construct their verbalisations. Along with the analysis of DMs, prosodic parameters, including pitch and intensity reset, speech rate change, and pauses, were automatically annotated to determine how well they overlapped with the manually annotated prosodic boundaries. To accommodate the natural variability in speech, the parameters were evaluated using relative comparison methods. Among the prosodic parameters, pauses were found to overlap most often with the manually annotated prosodic boundaries. Cognitive DMs with the functions of realising new information, restructuring, and emphasis indeed proved to be relevant boundary indicators; however, the group of cognitive DMs as a whole fell behind the groups of sequential and rhetorical DMs, which overlapped most frequently with the manually annotated prosodic boundaries.

Keywords: cognitive discourse markers; prosodic units; prosodic unit boundary

1. Introduction

In spontaneous speech, fluidity is an illusion (Chafe, 1994, p. 57). Every speaker, whether consciously or unconsciously, navigates moments of hesitation, reformulation, or transition—points at which language production slows down and the underlying cognitive mechanisms become most transparent. This is where cognitive discourse markers (C-DMs) emerge, filling gaps in speech with elements like I mean, uhm, you know, uh-hum, so, etc.

Traditionally, some of these elements (such as uh, uhm, or like) were seen as undesirable speech phenomena and were labelled as notorious ‘fillers’ or placeholders. Accordingly, they are often excluded from DM research, as their roles in discourse organisation are, perhaps, not as apparent as those of the typical well, now, however, and, etc. More recent research (Maschler, 2009; Womack et al., 2012; Aijmer, 2013; Tottie, 2019; Tonetti Tübben & Landert, 2022), however, shows that they serve as cognitive scaffolding that helps speakers manage discourse transitions and signal how speech unfolds and how it should be interpreted. Consequently, some authors (Bazzanella, 2006; L. Yang, 2006; Maschler, 2009) explore them as a separate group of DMs due to their unique, subtle functions, such as speech planning or information processing.

Past research has already confirmed the link between (general) DMs and prosodic boundaries (Maschler, 2009; Selting et al., 2009; Degand & Simon, 2009; Cabedo, 2014; Morel & Vladimirska, 2014; Majhenič et al., 2022). This comes as no surprise, as DMs help structure and organise our speech, and prosody, similarly, signals speech segments. This research, in contrast, aims to take a closer look at those DMs that reflect the speaker’s cognitive state. Since C-DMs reflect speech planning, information processing, hesitation, and similar phenomena characteristic of spontaneous spoken language, which may slow down or disrupt the verbalisation process, they should be linked to prosodic boundaries. When a speaker encounters a point of increased cognitive demand (whether processing, hesitating, or reformulating a thought), both prosodic and verbal cues facilitate processing. Prosodically, this may manifest as pauses, variations in pitch or intensity, or lengthening. Discursively, one of the linguistic forms in which this cognitive effort can manifest is the C-DM.

This study explores the extent to which C-DMs (a group of elements that usually take a back seat in DM research) and prosodic unit boundaries overlap. If a DM truly acts as a cognitive marker and if the verbalisations are spontaneous, then the alignment of C-DMs with prosodic boundaries should not be incidental, as C-DMs reflect the cognitive effort of the speaker’s mental processing. This processing can manifest prosodically as deceleration or the occurrence of silence (among other patterns), which are common characteristics of prosodic unit breaks. By examining the prosodic realisation of C-DMs across different discourse settings, this research explores the intersection between cognitive processing and prosodic structuring of spoken language. The present paper, therefore, seeks to answer the following research question:

RQ: Are cognitive DMs the group of DMs that overlap most frequently with manually annotated prosodic unit boundaries?

Moreover, as the literature on prosodic units lists several phenomena that are linked to prosodic unit boundaries (almost notoriously, intonational contours or pitch reset), the present paper will also look into which of the most commonly noted prosodic parameters overlap with the manually annotated prosodic boundaries. To shed light on how the results of the prosodic parameters as signposts of prosodic unit boundaries compare to DMs as boundary signposts, we will strive to answer the second research question:

RQ: Which prosodic unit boundary parameter overlaps most frequently with manually annotated prosodic boundaries?

By doing so, we hope to gain insight into how we segment speech. Prosodic units are understood to reflect the segmentation of speech into cognitively manageable chunks (Du Bois et al., 1992/1993; Chafe, 1994). Recent work supports this view: for instance, Ots and Taremaa (2023) demonstrate that both native and non-native listeners rely on prosodic cues to segment unfamiliar speech into perceptual chunks, underscoring the role of prosody in cognitive chunking. Similarly, developmental research by Wellmann et al. (2023) shows that even infants are sensitive to these cues, suggesting a foundational cognitive mechanism for speech segmentation. C-DMs, such as reformulations, hesitations, or restructuring markers, signal moments of increased cognitive load, self-monitoring, or interactive management. Their co-occurrence at prosodic boundaries may indicate points at which speakers segment speech according to both cognitive processing demands and interactional needs. Ultimately, the alignment between C-DMs and prosodic boundaries may reveal insights into how humans organise spoken language, offering a glimpse into the cognitive underpinnings of real-time discourse production.

The paper is structured as follows: in the second section, we present the theoretical background, focusing on prosodic units and their boundaries and the intersection of DMs and prosodic units. The third section outlines our methodological approach, detailing the materials and the annotation process. In the fourth section, we present our results. These are then discussed and evaluated in the fifth section. Our concluding thoughts are given in the sixth section. An appendix (Appendix A) with the applied DM annotation scheme is provided at the end of the paper.

2. Theoretical Background

2.1. Prosodic Units

The literature review on prosodic units reveals that the conceptualisation of these units remains somewhat ambiguous. Several different terms are used to describe what is the same phenomenon, i.e., intonation unit (Du Bois, 1991; Du Bois et al., 1992/1993; Chafe, 1994; Selting, 1996; Degand & Simon, 2009; Degand et al., 2014; Inbar et al., 2023), intonational phrase (Selting et al., 2009; Biron et al., 2021), intonational group (Cabedo, 2014), tone unit (Degand & Simon, 2009), speech paragraph (Farrús et al., 2016), oral paragraph (Morel & Vladimirska, 2014), elementary discourse unit (Kibrik et al., 2020), prosodic unit (Izre’el et al., 2020; Beňuš, 2021). Nevertheless, there is considerable variation in how researchers define its scope. While some scholars propose a more fine-grained segmentation of speech (Mertens & Simon, 2013; Cabedo, 2014; Degand et al., 2014; Inbar et al., 2023), others adopt a broader perspective (Izre’el & Mettouchi, 2015; Farrús et al., 2016; Beňuš, 2021), leading to a lack of consensus and limited overlap in the delineation of the so-called intonation units, hereinafter prosodic units (PUs).

Du Bois (1991) and Du Bois et al. (1992/1993) defined intonation units as minimal stretches of speech marked by a coherent pitch contour, bounded by prosodic cues such as pausing, pitch reset, and final lengthening. These units function as the basic building blocks of spoken discourse and provide insight into how speakers package information in real time (Du Bois, 1991). A similar approach is taken by Chafe (1994). According to Chafe (1994), the chunks of speech (i.e., intonation units) correspond to chunks of cognitive output by the speaker. As one unit is supposed to carry no more than one idea (Chafe, 1994), it is seemingly structured ideally to enhance the addressee’s comprehension. Moreover, Chafe (1994) wrote that intonation units often fit syntactic units. This would make them relatively easy to identify and delimit. However, an idea unit can also consist of only a few words or tokens, such as the confirming uh-hum, or it may be suspended in the middle, in which case it hardly constitutes a syntactic unit. Moreover, as Chafe (1994) stated, we can identify prosodic units without speaking the language in question, relying only on acoustic cues. This view is partially shared by Izre’el and Mettouchi (2015), Izre’el (2020), and Ots and Taremaa (2023); however, it remains an interpretive claim that has not been uniformly supported across all languages or contexts. Nevertheless, it suggests that the content of the verbalisation is not a primary prosodic unit delimiter. Therefore, it seems wiser to define prosodic units without regard to the semantic content.

Since prosodic units can mirror idea units, it is not surprising that they are often discussed alongside basic discourse units (cf. Izre’el et al., 2020). Nonetheless, they are not the same entity. While basic discourse units can be delimited by syntax, speech acts, non-verbal behaviour, and/or prosody (Izre’el et al., 2020), prosodic units are not constrained by syntactic rules or semantic content, but rather by prosodic elements (Inbar et al., 2023).

The following section will, therefore, present the most commonly used parameters of prosodic unit boundaries according to the literature overview.

Prosodic Unit Boundaries

Prosodic units are associated most commonly with a coherent or holistic intonational contour which can contain short or somewhat longer pauses (Erman, 1987). This phenomenon is specified further by some authors as the declination of pitch along the unit (Farrús et al., 2016; Morel & Vladimirska, 2014; Biron et al., 2021; Izre’el & Mettouchi, 2015; Izre’el, 2020; Kibrik et al., 2020; Beňuš, 2021; Chafe, 1994) (although this pattern is reserved generally for declarative sentences). Another pitch-related parameter is pitch reset, i.e., the sudden change in pitch (Selting, 1996; Barth-Weingarten, 2013; Degand & Simon, 2009; Steen, 2005; Cabedo, 2014; Degand et al., 2014; Mertens & Simon, 2013; Biron et al., 2021; Izre’el & Mettouchi, 2015; Mithun, 2020; Izre’el, 2020). The less studied parameters include laryngalisation or the infamous vocal fry (creaky voice) (Chafe, 1994; Selting et al., 2009; Barth-Weingarten, 2013; Raso et al., 2020; Izre’el et al., 2020), although it can also be an excluded parameter (e.g., Degand et al., 2014), and intensity variation (loudness), i.e., intensity reset (Chafe, 1994; Farrús et al., 2016; Barth-Weingarten, 2013; Cabedo, 2014; Morel & Vladimirska, 2014; Izre’el & Mettouchi, 2015; Kibrik et al., 2020; Mithun, 2020; Raso et al., 2020; Izre’el et al., 2020), especially in the presence of parenthetical structures (Izre’el, 2020). In terms of tempo, authors note an acceleration at the beginning of the unit or a deceleration at the end of it (Chafe, 1994; Selting, 1996; Selting et al., 2009; Barth-Weingarten, 2013; Biron et al., 2021; Izre’el & Mettouchi, 2015; Izre’el, 2020; Kibrik et al., 2020; Izre’el et al., 2020). Similarly, syllable lengthening is studied as a boundary indicator (Selting, 1996; Selting et al., 2009; Barth-Weingarten, 2013; Degand & Simon, 2009; Cabedo, 2014; Morel & Vladimirska, 2014; Degand et al., 2014; Izre’el & Mettouchi, 2015; Beňuš, 2021). 
The more obvious boundary parameters are pauses or the absence of speech (Selting, 1996; Farrús et al., 2016; Selting et al., 2009; Barth-Weingarten, 2013; Cabedo, 2014; Morel & Vladimirska, 2014; Degand et al., 2014; Izre’el & Mettouchi, 2015; Beňuš, 2021). Pauses are considered a strong boundary indicator (Barnwell, 2013; Farrús et al., 2016; Cabedo, 2014; Kibrik et al., 2020) and are relatively easy to detect; however, Chafe (1994) cautioned that they can occur within the unit. The length of the pause also seems to correlate with the strength of the boundary (Mithun, 2020; Mertens & Simon, 2013). However, when speakers take turns in spontaneous conversations, speech often overlaps and pauses cannot be detected (Selting, 1996; Barth-Weingarten, 2013). Moreover, according to the literature overview, they are a seldom studied boundary parameter (cf. Inbar et al., 2023; Beňuš, 2021).

Automatic detection of prosodic boundaries examines most commonly the parameters pitch, intensity, duration, syntax, and pause (in that order), whereby pitch is used twice as often as pauses (for a detailed overview of automatic prosodic segmentation, see Biron et al. (2021)). Studies that implement the automatic detection of prosodic units usually specify the absolute thresholds for the individual parameters. For instance, Degand and Simon (2009) defined major prosodic boundaries as a syllable with a 5-semitone higher pitch value than the average value of the neighbouring syllables, or a more than 4-semitone intra-syllabic rise (similarly, Mertens and Simon (2013)). Similarly, pauses are set at an absolute threshold of 100 ms (Izre’el & Mettouchi, 2015), 200 ms (Degand & Simon, 2009; Mertens & Simon, 2013), 250 ms (Degand et al., 2014), or 300 ms (Biron et al., 2021). In contrast, a relative threshold seems to be the norm for syllable elongation, where syllables are supposed to be once (Mertens & Simon, 2013), twice, or three times longer (Degand & Simon, 2009; Mertens & Simon, 2013; Degand et al., 2014).
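The absolute thresholds cited above can be made concrete with a short sketch. The semitone interval between two frequencies is the standard 12·log2(f1/f2); everything else below (the function names, the choice to average the neighbouring syllables in Hz, and reading "5-semitone higher" as "at least 5 semitones") is an illustrative assumption and not the implementation used in the cited studies.

```python
import math

def semitones(f_hz, ref_hz):
    """Interval between two frequencies in semitones (12 per octave)."""
    return 12.0 * math.log2(f_hz / ref_hz)

def exceeds_pitch_threshold(syll_hz, left_hz, right_hz, min_st=5.0):
    """True if the syllable lies at least `min_st` semitones above the mean
    pitch of its two neighbours. Averaging in Hz is an assumption."""
    neighbour_mean = (left_hz + right_hz) / 2.0
    return semitones(syll_hz, neighbour_mean) >= min_st

# Absolute pause thresholds in milliseconds, as cited in the text
# (one study named per threshold for brevity).
PAUSE_THRESHOLDS_MS = {
    "Izre'el & Mettouchi (2015)": 100,
    "Degand & Simon (2009)": 200,
    "Degand et al. (2014)": 250,
    "Biron et al. (2021)": 300,
}

def studies_counting_as_pause(silence_ms):
    """Names of the cited studies whose threshold a silence would reach."""
    return [name for name, limit in PAUSE_THRESHOLDS_MS.items()
            if silence_ms >= limit]

# A hypothetical 220 ms silence: a pause under some thresholds, not others.
print(studies_counting_as_pause(220))
```

The same silence is thus a boundary cue under one threshold and not under another, which illustrates why a fixed cut-off is a consequential design decision.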

Among the non-prosodic parameters, syntax (Chafe, 1994), semantic content (Izre’el et al., 2020), as well as certain words can be considered. Here lies the intersection of prosodic units and DMs.

2.2. Discourse Markers and Prosody

In Chafe (1994), words that represent prosodic boundary indicators often coincide with DMs. When constituting their own unit, they form the so-called regulatory intonation units, as they regulate the information flow. With the example of well and mhm, Chafe (1994) illustrated how DMs can form their own unit. The observation that DMs form their own unit is shared by several authors (Carter & McCarthy, 2006; Chafe, 1994; Cuenca, 2013). Others note that they can also be integrated in the prosodic unit, be it at the beginning or at the end (Schiffrin, 1987; Fraser, 1990; Altenberg, 1987; Bazzanella, 2006); nevertheless, the initial position seems to be more frequent (O’Grady, 2017; Romero-Trillo, 2018; Matzen, 2004). Consequently, DMs are used as prosodic unit boundary indicators (Selting et al., 2009; Degand & Simon, 2009; Cabedo, 2014; Morel & Vladimirska, 2014) and are also leveraged for the automatic detection of boundaries (Biron et al., 2021; Mertens & Simon, 2013; Degand et al., 2014).

Despite the canonical ‘prosodically independent’ feature, there are studies that report DMs that are integrated in the middle of the prosodic unit (O’Grady, 2017; Maschler, 2009; Elordieta & Romera, 2002; Majhenič et al., 2022). Rather than discarding these DMs as anomalies, Maschler (2009) studied them as the alternative form of DMs, namely, the non-prototypical DMs. Non-prototypical DMs occur prosodically integrated and are not flanked by pauses. Moreover, studies have also shown that the same DM can take both forms (cf. Majhenič et al., 2022). DMs of the non-prototypical type, therefore, clearly cannot serve as boundary indicators. Based on these insights, relying solely on a predefined list of DMs as indicators of prosodic unit boundaries is not a robust approach.

Some authors, therefore, decided which DMs cannot constitute boundary indicators. Degand and Simon (2009), Mertens and Simon (2013), and Degand et al. (2014) excluded the hesitation marker euh (also when used for final lengthening). Similarly, Biron et al. (2021) excluded single-unit turns such as oh yeah and um-hum. Although Biron et al. (2021) did not specify the tokens as DMs, their research has shown that the token and is the most common token in the first position. As it occurs at the beginning of units, it is very likely to function as a DM. Moreover, the tokens but, yeah, so, well, and oh, which can all function as DMs, are among the ten most frequent tokens in the first position. A striking observation is that, while Biron et al. (2021) intentionally excluded oh yeah and um-hum from boundary indicators, the token oh is the tenth most frequent token in the first position within the manually annotated material. This finding indicates the importance of not excluding non-lexicalised tokens within the potential boundary tokens (or DMs, for that matter).

2.2.1. Cognitive Discourse Markers in Previous Research

While the token oh from the previous section might seem redundant and irrelevant, it can indeed reveal underlying cognitive processes, offering insight into the speaker’s moment-to-moment processing (Womack et al., 2012). This function is tied uniquely to the so-called cognitive discourse markers.

Although filled pauses (e.g., uh, uhm) are the archetypal C-DMs, they have a communicative role, and one cannot simply dismiss them as notorious fillers. The so-called ‘crutch words’ can help us interpret the thought processes while the person is speaking (Bazzanella, 2006). For instance, the frequent and dense use of such items in spontaneous speech can indicate to the listener that the speaker is having processing issues during their verbalisations (Hieke, 1981), for example, due to syntactically complex structures, or that novel information is about to be uttered (Arnold et al., 2004). Moreover, they occur at more significant frame shifts (Swerts, 1998) and during uncertainty (Brennan & Williams, 1995; Womack et al., 2012). The significance of such C-DMs is underscored further by the research of Womack et al. (2012), which revealed a functional difference between uh and uhm: uhm indicates a greater cognitive effort. Similarly, uhm is more likely to indicate hesitation (Rehbein, 2015; Maschler, 2009).

Cognitive DMs can, therefore, be set apart from the ideational, sequential, rhetorical, or interpersonal ones (cf. Crible & Degand, 2019) in regard to their semantic component. As the previous paragraph illustrates, they are semantically bleached (L. Yang, 2006), which is why it is hard to assign a semantic component to them. As a result, they pose additional difficulties in translation. Their meaning is usually deduced from the context and the prosody (L. Yang, 2006). For instance, the DM oh can be understood as acknowledgment, surprise, or astonishment, depending on the prosody (L. Yang, 2006). Similar to the use of fillers, speakers do not seem to utter C-DMs intentionally. The absence of deliberate intent seems characteristic of C-DMs. Textual and, to some extent, interpersonal markers are mostly deployed strategically, e.g., to signal turn taking or interpersonal stance, while C-DMs surface at points of greater cognitive effort. Their occurrence in spontaneous speech, however, does not seem to be incidental: while the speaker might not be using them deliberately, the listener does benefit from their use. Research has shown that listeners remember the words that follow uh or uhm better (Fox Tree, 2001).

Most authors distinguish textual and interactive DMs (Maschler, 2009; Collet, 2019), but only a few (Bazzanella, 2006; L. Yang, 2006; Maschler, 2009) add the group of C-DMs. This, however, does not mean that researchers disregard all items we see as C-DMs. Crible and Degand (2019), for instance, included both hesitations and filled pauses, but these items are nested under the sequential DMs and are not functionally classified further. Conversely, Maschler’s (2009) three-pronged DM classification distinguished the textual, the interpersonal, and the cognitive realm. While the main function of textual DMs is to structure discourse, and the main function of the interpersonal is to manage the speaker–hearer relationship, C-DMs operate primarily on the speaker’s internal cognitive level. Cognitive DMs are, therefore, highly metalinguistic, and enable the speakers to comment on their verbalisations, to signal uncertainty, or to indicate the need to reformulate. Maschler (2009) exemplified C-DMs with the Hebrew DM ke’ilu (meaning as if and corresponding roughly to the English like), which lets the listener know that the previous utterance needs to be changed, and markers such as uhm, which reveal momentary hesitation or indecision during formulation. Accordingly, Maschler (2009) specifies the cognitive group further as the C-DMs that pertain to information processing (uh, um-hum), realising new information (oh, aha), and rephrasing (I mean, like).

2.2.2. Interface of Cognitive Markers and Prosodic Boundaries

As the previous section indicates, C-DMs appear without our intention and usually at points of greater cognitive effort. It stands to reason that points of greater cognitive effort coincide with the absence of speech, deceleration, or falling intensity, as we need time to gather our thoughts or are uncertain of what to say next. These phenomena, notably, are the same parameters that delimit prosodic units. Consequently, the occurrence of C-DMs at prosodic unit boundaries, at least in spontaneous speech, seems logical.

3. Methodology

3.1. Materials: Corpus ROG

The present research is based on the audio recordings of the Spoken Slovene corpus ROG (Verdonik et al., 2024). The corpus contains 39,000 tokens, or 5 h of speech. It spans multiparty conversations (e.g., panels, round table COVID-19 discussions, festive openings), dialogues (e.g., radio interviews, everyday conversations between friends, everyday family conversations), and monologues (e.g., cooking recipes, personal reminiscing). All the speakers are native Slovene speakers aged approximately 18 to 90, and the speeches were recorded between 2020 and 2022. The speeches are balanced with respect to gender and geographical region. Some speeches, such as conversations between family and friends, are highly spontaneous; others, such as interviews and panel discussions, are semi-spontaneous; and some are prepared (formal openings, interviews).

3.2. Prosodic Unit Annotation

The recordings were transcribed manually in standardised and conversational Slovene. The standardised transcription represents the normative, dictionary-based form of each word and is used primarily for automatic linguistic processing (lemmatisation, part-of-speech tagging, etc.). In contrast, the conversational transcription reflects the actual spoken realisation, including dialectal variation, phonological reduction, and non-standard usage. The speaker turns were annotated. The token segmentation, lemmatisation, and POS annotation were performed automatically. The process began with forced alignment using the Montreal Forced Aligner (McAuliffe et al., 2017), which produced time-aligned segmentation at multiple levels: words, phonemes, and syllables. The alignment process utilised a custom Slovene pronunciation dictionary and an acoustic model trained specifically for Slovene speech. Following segmentation, the texts were processed through a pipeline that included tokenisation, lemmatisation, and part-of-speech tagging. The resulting TextGrid files contained multiple tiers of annotation that served as the foundation for all subsequent parameter calculations.

The annotation of prosodic units was performed with the speech analysis software Praat (Boersma & Weenink, 2019). According to the literature overview, the following prosodic unit boundary parameters were explored: pitch reset, intensity reset, speech rate change, and pauses. All the parameters were computed based on the tiers derived from the initial forced alignment, with the exception of speaker turns, which relied on speaker information obtained from corpus metadata. Pause detection was performed by analysing the word-level segmentation, where any interval with an empty or whitespace-only label was considered a pause. Pitch reset, intensity reset, and speech rate change were evaluated at the syllable level using the syllable segmentation produced during forced alignment. Rather than using fixed threshold values, these parameters were evaluated using relative comparison methods. For pitch reset, a syllable’s mean pitch was compared with the average of its adjacent neighbours. Similarly, intensity reset analysis compared a syllable’s mean intensity with its surrounding context, while speech rate change detection identified significant lengthening of syllables relative to neighbouring syllables. This relative approach ensured that prosodic variations were assessed within their local context rather than against absolute thresholds, accommodating the natural variability in speech. For all four parameters, if the specific criterion was met (i.e., a pitch reset, intensity reset, speech rate change, or pause was detected), the segment was assigned the value POS (positive); otherwise, it received NEG (negative).
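The two detection principles described above, empty word-tier intervals as pauses and relative comparison of a syllable with its adjacent neighbours, can be sketched as follows. This is a minimal illustration under stated assumptions, not the authors' pipeline: the function names, the tuple layout of the intervals, the treatment of edge syllables, and in particular the `ratio` of 1.2 are hypothetical, since the text does not report the exact comparison criterion.

```python
from statistics import mean

def detect_pauses(word_intervals):
    """Return (start, end) for every word-tier interval whose label is
    empty or whitespace-only, which the text treats as a pause.
    Intervals are assumed to be (start_s, end_s, label) tuples."""
    return [(start, end) for start, end, label in word_intervals
            if not label.strip()]

def relative_reset(values, ratio=1.2):
    """Tag each syllable POS if its value deviates from the mean of its two
    adjacent neighbours by more than the factor `ratio`, else NEG.
    `ratio` is a placeholder; values are assumed to be positive."""
    tags = []
    for i, value in enumerate(values):
        if i == 0 or i == len(values) - 1:
            tags.append("NEG")  # assumption: edge syllables are not evaluated
            continue
        context = mean([values[i - 1], values[i + 1]])
        changed = value > context * ratio or value < context / ratio
        tags.append("POS" if changed else "NEG")
    return tags

# Hypothetical word tier and per-syllable mean pitch values (Hz).
words = [(0.0, 0.4, "ja"), (0.4, 0.9, ""), (0.9, 1.3, "pač"), (1.3, 1.35, " ")]
pitch = [200, 205, 260, 210, 208]
print(detect_pauses(words))
print(relative_reset(pitch))
```

The same comparison could be applied to mean intensity; for speech rate change, which the text describes as lengthening relative to neighbouring syllables, a one-sided test on syllable durations would presumably be the closer analogue.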

Following the machine annotations, human annotators segmented the speeches into prosodic units. Linguistics students were given a few pre-annotated recordings with examples of prosodic units and comments on delimiting prosodic units, such as the pitch and intensity contours and the presence of audible breaths, pauses, decelerations, and accelerations. After this initial phase, they were instructed to identify prosodic units based on the auditory impression and the data from the sonogram. They were allowed to see the automatically evaluated parameters, i.e., whether a given criterion was met; however, they were cautioned not to rely on these tags, as the automatic detection might be flawed (Beňuš, 2021) or give false positives. The student annotations were reviewed and corrected by a linguist with experience in prosodic analysis.

3.3. Discourse Marker Annotation

Discourse markers were annotated in the final stages of our research. Maschler’s (2009) classification was modified and implemented to distinguish between the three main groups: the textual, the interpersonal, and the C-DMs.

Even though Maschler’s three-pronged classification seems different from most classifications, it has parallels with Redeker’s (2000, 2006), Crible’s (2016), and Crible and Degand’s (2019) classifications. The ideational structures from Redeker’s (2000, 2006) parallel component model and Crible’s (2016) ideational DMs correspond to Maschler’s (2009) referential markers, nested within the textual realm, as they both describe the semantic relations of the extra-linguistic world. Redeker’s (2000, 2006) sequential structures and Crible’s (2016) and Crible and Degand’s (2019) sequential markers are parallel to Maschler’s (2009) structural markers, also nested within the textual realm, as both perform the discourse structuring, digressing, and opening or closing function. The most obvious parallels are between Crible’s (2016) interpersonal markers and Maschler’s (2009), as both pertain to the speaker–addressee relation. Redeker’s (2000, 2006) rhetorical structures or Crible’s (2016) and Crible and Degand’s (2019) rhetorical markers pertain to the speaker’s metacomments and always concern their subjective relation to the verbalisation. These markers were partially represented within Maschler’s (2009) interpersonal markers. Including Maschler’s (2009) cognitive realm, this adaptation brings us to a five-fold classification.

This adapted classification was consolidated further so that Crible’s (2016) ideational, sequential, and rhetorical markers were nested within the textual markers, while the interpersonal and the cognitive realm remained separate. This adaptation reflects the functional distinction between DMs that primarily contribute to discourse-level organisation (textual), those that manage speaker–hearer relations (interpersonal), and those that reflect internal cognitive processing (cognitive). Moreover, as Crible and Degand’s (2019) classification is functionally very fine-grained, we included the individual functions in Maschler’s (2009) corresponding realms, or, in line with Crible and Degand’s (2019) terminology, ‘domains’. Maschler’s (2009) cognitive functions were retained; however, we added Crible and Degand’s (2019) hesitation function (or ‘punctuation’, in their classification, within the sequential markers). In line with Crible and Degand (2019), we allowed for the functions to cross domains, as we suspect that some of the interpersonal functions overlap with the cognitive domain. The adapted annotation scheme is represented in Figure 1.

The annotation of DMs was carried out in several steps. First, a predefined list of potential Slovenian DMs was made, and the tokens that matched the DMs on the list were assigned the tag ‘POS’ (positive) in a new Praat tier. Then, a linguist went through the recordings manually, confirmed whether each tagged token was indeed a DM, and added missing items that were identified as DMs. In the third step, two student linguists classified the DMs, i.e., assigned them a domain and a function. They were first given several pre-annotated Praat files to familiarise themselves with the annotation process. The annotators were then instructed to listen to the speech first and then classify the DM. They were encouraged to listen to the recordings as often as they needed to make their annotation choice easier. Each DM was considered in its spoken context, taking into account the surrounding utterances, communicative intentions, and speaker turns. The annotation process involved context-sensitive functional labelling, following an adapted functional annotation drawn from Maschler (2009) and Crible and Degand (2019). Finally, a linguist reviewed and corrected the annotations.
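The first, list-based step of this procedure can be illustrated with a small sketch. It is an assumption-laden illustration rather than the tool used in the study: the function name, the longest-match strategy for multi-word candidates, the case-insensitive comparison, and the empty label for non-matching tokens are all hypothetical, and the candidate list below contains only a few items that the paper itself cites as Slovenian DMs.

```python
def preannotate(tokens, candidates):
    """Return one tag per token: 'POS' if the token is part of a candidate
    DM from the predefined list, otherwise '' (an assumed empty label).
    Multi-word candidates are matched first, so the longest match wins."""
    entries = sorted((c.lower().split() for c in candidates),
                     key=len, reverse=True)
    lowered = [t.lower() for t in tokens]
    tags = [""] * len(tokens)
    i = 0
    while i < len(tokens):
        for entry in entries:
            n = len(entry)
            if lowered[i:i + n] == entry:
                tags[i:i + n] = ["POS"] * n
                i += n
                break
        else:
            i += 1  # no candidate starts at this token
    return tags

# Hypothetical token sequence; candidates taken from examples in the paper.
candidates = ["v bistvu", "ne vem", "pač", "ne"]
tokens = ["v", "bistvu", "ne", "vem", "kaj", "pač"]
print(list(zip(tokens, preannotate(tokens, candidates))))
```

Such a matcher only proposes candidates. Whether a tagged token actually functions as a DM still has to be decided in context, which is what the manual confirmation step provides.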

4. Results

Altogether, 3796 DMs were identified. The majority, i.e., 55%, were C-DMs, which amounted to 2086 (e.g., eee ‘uh’, pač ‘like’ or ‘kinda’ or ‘now’ or ‘well’ or ‘just’, ja ‘right’ or ‘yeah’, ne vem ‘dunno’, v bistvu ‘actually’, mislim ‘I mean’, aja ‘oh’). As Figure 2 shows, they were followed by the textual domain, containing 1103 DMs. The textual domain comprises three subdomains, which include 497 sequential DMs (e.g., in ‘and’, zdaj ‘now’, no ‘well’ or ‘so’, potem ‘then’ or ‘next’, torej ‘so’, prvič ‘first’, dobro ‘okay’), 394 rhetorical DMs (e.g., ja ‘right’ or ‘yeah’, ampak ‘but’, tako da ‘therefore’, in tako naprej ‘and so on’, se pravi ‘that is’ or ‘so’, skratka ‘in short’, kajti ‘because’), and 212 ideational DMs (e.g., potem ‘then’, ampak ‘but’, vendar ‘however’, pa ‘but’). The smallest group was the interpersonal DMs, which accounted for 601 DMs (e.g., ne ‘right?’, v bistvu ‘actually’, pravzaprav ‘actually’, no ‘well’, veš ‘y’know’, glej ‘look’, zdaj ‘now’).

Figure 3 illustrates the share of overlap for each domain and subdomain. Among the DM domains, the textual domain overlapped with prosodic boundaries most often, as 58% of its DMs overlapped with a prosodic boundary. The cognitive domain was a close second, with a 54% overlap rate. The fewest matches were found for the interpersonal domain, where only 41% of the DMs overlapped with the manually annotated prosodic boundaries. However, there are great differences among the individual subdomains of the textual domain. While the sequential subdomain had a 71% and the rhetorical subdomain a 68% overlap rate, the ideational subdomain lagged behind significantly, as merely 12% of its DMs overlapped with prosodic boundaries.
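As a quick consistency check, the overlap rate of the textual domain can be approximated as the count-weighted mean of its three subdomain rates. The sketch below uses only the counts and rounded percentages reported in the text, so the result is approximate; the variable names are ours.

```python
# Counts and overlap rates as reported in the text. The rates are rounded
# percentages, so the weighted mean is only an approximation.
subdomains = {
    "sequential": (497, 0.71),
    "rhetorical": (394, 0.68),
    "ideational": (212, 0.12),
}

total = sum(n for n, _ in subdomains.values())
overlapping = sum(n * rate for n, rate in subdomains.values())
textual_rate = overlapping / total

print(total)                   # 1103
print(round(textual_rate, 3))  # 0.586
```

The weighted mean of roughly 0.59 agrees with the reported 58% within the rounding of the inputs, and it shows how the small ideational subdomain pulls the domain-level rate well below the rates of the sequential and rhetorical subdomains.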

4.1. Automatic Segmentation Parameters

The comparison of the manual and the automatic annotations of prosodic unit boundaries based on pitch reset, intensity reset, speech rate reduction, and pauses revealed that pauses were the parameter that showed the highest match with the manually annotated prosodic boundaries. As shown in Figure 4, pauses had more than twice as many matches as any of the remaining parameters. Pitch and intensity reset showed similar results, albeit with a match rate of only approximately 28%, while speech rate reduction had the lowest match rate, as only 23% of the automatic annotations overlapped with the manually annotated prosodic boundaries. Taken together, the results suggest that, while pauses are a robust indicator of boundary placement, a hybrid approach that integrates multiple acoustic parameters is necessary for more accurate prosodic segmentation, particularly in spontaneous speech.
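A match rate of this kind, i.e., the share of automatic annotations that coincide with a manual boundary, could be computed along the following lines. This is a minimal sketch and not the procedure used in the study: the function name `match_rate`, the tolerance window of 0.05 s, and the boundary times are assumptions introduced for illustration.

```python
from bisect import bisect_left


def match_rate(auto_times, manual_times, tolerance=0.05):
    """Share of automatic boundaries lying within `tolerance` seconds
    of some manual boundary. Both inputs are times in seconds.

    The tolerance value is a hypothetical choice, not taken from the paper.
    """
    if not auto_times:
        return 0.0
    manual = sorted(manual_times)
    hits = 0
    for t in auto_times:
        i = bisect_left(manual, t)
        # The nearest manual boundary is at index i - 1 or i.
        neighbours = manual[max(i - 1, 0):i + 1]
        if any(abs(t - m) <= tolerance for m in neighbours):
            hits += 1
    return hits / len(auto_times)


# Invented boundary times for one recording.
manual = [1.20, 2.85, 4.10, 6.40]
auto_pause = [1.22, 2.82, 5.00, 6.41]
rate = match_rate(auto_pause, manual)
print(rate)  # 0.75
```

Note that the denominator is the number of automatic annotations, in line with the wording above; dividing by the number of manual boundaries instead would measure how many manual boundaries a parameter recovers, which is a different quantity.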

4.2. Cognitive Discourse Markers

Within the cognitive domain, six functions were identified: hesitation, restructuring, realising new information, hedging, emphasis, and information processing. As Figure 5 shows, the functions of realising new information, emphasis, and restructuring co-occurred with prosodic boundaries most often. The functions of hedging and information processing did not demonstrate a clear preference for overlapping with prosodic unit boundaries, since their match rate was approximately 50%. The lowest overlap was found for the function of hesitation, where the rate was slightly under 40%. Nonetheless, these data should be viewed in light of the absolute number of DMs in each function, since some functions were disproportionately larger than others, as can be seen in Figure 5. For instance, the information processing function was represented altogether by 1626 DMs, while the emphasis function was the smallest and represented altogether by 13 DMs.

4.2.1. Hesitation

Hesitation markers represent a moment of uncertainty during speech production. They reflect the speaker’s mental effort arising from the need to maintain conversational flow while managing some degree of uncertainty. In our material, several DMs were found to function as hesitation C-DMs. They include: pač ‘like’ or ‘kinda’ or ‘now’ or ‘well’ or ‘just’, pravzaprav ‘actually’, eee ‘uh’, eem ‘uhm’, v bistvu ‘actually’, zdaj ‘now’, kaj jaz vem ‘dunno’, no ‘well’, tako ‘so’ or ‘like’, ne vem ‘dunno’, ma ‘like’ or ‘now’ or ‘well’.

The following example illustrates the use of the Slovene C-DM ne vem ‘dunno’ as a hesitation device co-occurring with a prosodic boundary. In the transcript excerpts, the pipe character indicates prosodic unit boundaries, and the numbers in brackets refer to the DM at hand. The excerpt below is taken from a conversation between two older acquaintances talking about hiking adventures. The speaker is reminiscing about a specific trail with several possible descent options. Trying to recollect one hike, he struggles to remember where they descended. During this effort, he paused for 0.47 s, inhaled audibly, and then, before uttering the third option, inserted the DM ne vem ‘dunno’. The DM was uttered phonologically reduced and accelerated (see Figure 6), which may suggest the degree of unplanned use that is associated with C-DMs. With the DM (1), he signalled uncertainty to the listener, who could thus perceive his hesitation and, in turn, respond appropriately.

pol smo pa sestopili ponavadi v Vrata ali v Krmo | (1) ne vem a na Pokljuko nazaj … | Enkrat smo šli celo v Kot

‘then we usually descended towards Vrata or Krma | (1) dunno maybe back to Pokljuka … | once we even went to Kot’

4.2.2. Restructuring

Restructuring is perhaps the most straightforward cognitive function. In the material, it is represented by several different DMs, including: eee ‘uh’, mislim ‘I mean’, ne vem ‘dunno’, no ‘well’, pač ‘like’ or ‘kinda’ or ‘now’ or ‘well’ or ‘just’, pravzaprav ‘actually’, se pravi ‘that is’ or ‘so’, tako ‘like’ or ‘so’ or ‘kinda’, torej ‘so’ or ‘therefore’, v bistvu ‘actually’.

The DM mislim (literally, ‘I think’) functions similarly to the English I mean when used as a device to reformulate one’s utterance. The example below is taken from a highly spontaneous conversation between two young close friends discussing TV shows. The female friend seems to be disappointed that the most attractive actors are usually gay and seems to conclude her thought as she slows down. Then, however, she suddenly corrects herself and modifies the previous statement, letting her friend know that the fact that they are absolutely unattainable makes them even more appealing to her. This correction is represented fully by the rest of the utterance containing the DM (2); however, the DM mislim signals to the listener that whatever was said before was not fully accurate. It was uttered phonologically reduced and extremely accelerated (see Figure 7), suggesting that the speaker did not plan to use it, which we see as a characteristic of C-DMs. Moreover, it was preceded by a short pause of 0.38 s containing an audible breath (a pause that may also serve a self-repair function by delaying projected talk) and was uttered significantly louder than the previous unit, both of which are characteristic of a prosodic boundary.

zakaj je vsaki drugi hot igralec gej | (2) mislim ne saj to je v bistvu ga dela to še bolj hot | ker je tako nedostopen

‘why is every other hot actor gay | (2) I mean no that’s actually that makes him even hotter | because he is so unattainable’

4.2.3. Realising New Information

The realisation of new information is signalled with multiple different DMs in the corpus ROG, i.e., aha ‘aha’, aja ‘oh’, ja ‘yeah’, ja ja ‘yeah yeah’, a ‘oh’, aaa ‘oh’ or ‘ah’, e ‘uh’, torej ‘so’, se pravi ‘so’ or ‘that is’, mhm ‘um-hum’, mmm ‘um-hum’, čaki ‘wait’ or ‘hold on’, okej ‘okay’. Their use reflects the moment the speaker comes to understand something, a situation similar to the lightbulb moment. The following example illustrates how such a DM can occur at the start of a prosodic unit, not as an independent unit but as a device merged with the rest of the unit.

The example is taken from a conversation between two young acquaintances talking about their musical past, the music lessons they took, and how they experienced them. The first person explains how hard it was to learn to play the accordion, as one cannot simply transfer the knowledge one gained from learning to play the piano. At this very point, the other person seems to reflect on the information that he had to teach himself how to play the accordion without the help of a teacher. This recognition is reflected by the use of the DM (3). It is relatively long in duration (0.15 s) (see Figure 8), which highlights the realisation function further.

A: jaz sem se še to sam nekaj po notah učil pa | mi je šlo počasi, ker je na harmoniki | pač ni to tako kot na klavirju

B: (3) aaa nisi imel učitelja da bi te naučil

A: ja sem imel samo pol sem | on | učitelj je bil iz Avstrije | pa nisem mogel hoditi tolikokrat tja | pa še drago je bilo pa vse

‘A: I was even learning a bit by myself from the sheet music and | it was going slowly because on the accordion | it’s just not the same as on the piano’

‘B: (3) ah you didn’t have a teacher to teach you’

‘A: yeah I did have one but then I | he | the teacher was from Austria | and I couldn’t go there that often | and it was also expensive and everything’

4.2.4. Hedging

The function of mitigation or hedging was recognised within the cognitive domain because it expresses the speaker’s mental state when they perceive some sort of uncertainty or wish to distance themselves from the content due to their processing of alternative interpretations, risks, or level of truth. Compared to the previous function, fewer tokens were found to be used as hedging C-DMs, i.e., v bistvu ‘actually’, pač ‘like’ or ‘kinda’ or ‘now’ or ‘well’ or ‘just’, zdaj ‘now’, pravzaprav ‘actually’, tako ‘so’ or ‘like this’, eem ‘uhm’, no ‘well’.

Example (4) illustrates the hedging function. The excerpt is taken from a self-interview, where a middle-aged male person describes his favourite park. He describes a botanical garden, with all its natural beauties and occasional exhibitions, but adds that it is also a pleasant place to walk around and to spend time with the family. It seems that, for a brief moment, for some reason, he was struggling with how to finish his thought about what else one can do in the park besides taking walks. This is where the brief pause of 0.23 s occurred (see Figure 9). He used the DM (4) before concluding with a generalisation that it is a place to spend some time and just relax, so he does not provide a specific activity. The DM seems to be used as a mitigating device with which the speaker tries not to raise the listener’s expectations.

prijeten park za sprehajati je in | (4) tako za preživljanje | na izi bi lahko rekli | za preživljanje nekega na izi družinskega popoldneva

‘it’s a nice park to take a walk and | (4) like for spending time | to relax one could say | for spending a simple family afternoon’

4.2.5. Emphasis

In this work we recognise the function of emphasis as a cognitive function, as it involves the speaker’s mental processing of assigning importance to certain parts of the utterance that convey their intended meaning the most. With emphasis C-DMs, speakers, knowingly or not, reveal which part of the utterance they prioritise. This function is represented by the following tokens, i.e., ja ‘yeah’ or ‘right’ or ‘well’ or ‘why’ or ‘now’, ma ‘well’ or ‘like’, pa ‘now’ or ‘just’ or ‘after all’ or ‘why’, pač ‘like’ or ‘kinda’ or ‘now’ or ‘well’ or ‘just’, pravzaprav ‘actually’.

The following example is taken from a conversation between two senior female friends reminiscing about the past. The first one is encouraging the other to tell the story of how she met her husband. The other friend talks about how she was going home from work and tried to ignore two people walking towards her. One of them was her friend who, as she walked by, greeted her loudly and emphatically to get her attention (as it turns out, the person next to the friend was her husband). This emphasis is conveyed by the DM ja (5), which is very challenging to translate, as it has several translations besides the literal yes and is semantically almost void. In this case, it seems to emphasise the speaker’s astonishment at her friend’s reaction. The ja (5) seems merely to express the emphasis, as the pleasant surprise of seeing her friend is already conveyed by the remaining tokens of the utterance. This emphasis is highlighted further by the increase in volume and the brief pause of 0.30 s preceding the DM (see Figure 10), which, in turn, constitute the prosodic boundary.

4.2.6. Information Processing

The DMs uh and uhm are perhaps the DMs most commonly associated with the cognitive domain. While they can also function as hesitation and restructuring markers, they are most prominent in the information processing function. Besides uh and uhm, the following DMs have also been found to function as information processing C-DMs: in eee ‘and uh’, in em ‘and uhm’, in ‘and’, ja ja ‘right right’, ja ‘yeah’, mmm ‘mmm’, okej ‘okay’, pač ‘like’ or ‘kinda’ or ‘now’ or ‘well’ or ‘just’, pravzaprav ‘actually’, se pravi ‘that is’ or ‘so’, tako da ‘so’, torej ‘so’, čakaj ‘wait’. As C-DMs, they reflect the speaker’s mental effort while speaking. The information processing function is set apart from functions such as hesitating, restructuring, realising new information, emphasising, or hedging, as it pertains only to the instances where none of the mentioned functions fit and only the mental activity is in the foreground. Consequently, it is reserved for cases where the speaker encounters momentary challenges during their speech production.

Such is the example below where a middle-aged male speaker talks about his likes, including his favourite series and vacationing spots. In this excerpt, he explains what makes the series Game of Thrones special. After presenting his arguments, he tries to sum up the ending but seems to struggle a bit. This is indicated by the multiple uhs. The first one (6) is of a longer duration (0.82 s) (see Figure 11), suggesting processing, or maybe planning issues, i.e., the speaker is trying to remember how the series ended. It is followed by acceleration over the next unit, which indicates that the speaker gathered his thoughts and verbalised the rest. The next uh (7) is brief (0.21 s) (see Figure 12), pointing towards a smaller processing issue, perhaps due to the issue of retrieving the correct form of the phrase zmaga nad zlim ‘good triumphs over evil’, while the last one (8) is, again, of longer (0.61 s) duration. As the transcript shows, they all occurred at prosodic unit boundaries. The first (6) and the last (8) uh correspond to prosodic unit-final lengthening, while the second uh (7) is characteristic of the unit-initial acceleration.

na koncu | eee (6) | zgodba se zaključi [tako] da vseeno dobro | (7) eee zmaga nad (8) eee | zlim in zgodba se zaključi

‘so in the end | uh (6) | the story ends [so] that good | (7) uh triumphs over (8) uh | evil and the story ends’

5. Discussion

One of the main characteristics of DMs is their discourse structuring role (Redeker, 2006; Fischer, 2014). Research suggests that they also perform prosodic structuring, as DMs often occur at the beginning or end of prosodic units (Schiffrin, 1987; Fraser, 1990; Altenberg, 1987; Bazzanella, 2006). The present study corroborates this characteristic, as, except for the interpersonal domain and the ideational textual subdomain, the majority of DMs overlapped with the manually annotated prosodic boundaries (see Figure 3).

While the prosodic segmenting or ‘sequencing’ role of DMs seems straightforward for sequential (textual) DMs, C-DMs are also a relevant indicator when it comes to the delimitation of prosodic units. As our findings show, the cognitive domain follows the textual domain closely in terms of overlapping with the prosodic boundaries, with overlap rates of 54% and 58%, respectively. There are, however, significant differences between the individual functions of the cognitive domain. The greatest overlap with the manually annotated prosodic boundaries was found for the function of realising new information, i.e., almost 85% (see Figure 5). These results seem natural, as this function represents the cognitive state when one becomes aware of new information and comprehends its significance, which should occur at the beginning of a unit. The functions of emphasis and restructuring had a similar overlap rate of roughly 76%. Restructuring behaves similarly to realising new information, as one suddenly realises that the previous utterance requires modification, making it an ideal candidate to occur at unit onset. Likewise, emphasis represents the burst of energy one feels and expresses once a thought is formed, which is also likely to be verbalised at the beginning of a unit. One would expect hesitation or hedging markers to behave similarly; however, the results do not corroborate this assumption. Hedging was found to match prosodic unit boundaries in only every other case and hesitation even less often, in just over one-third of cases. It therefore seems that these functions are often represented by what Maschler (2009) termed non-prototypical DMs. Such is the case for the DM pač ‘like’ or ‘kinda’ or ‘now’ or ‘well’ or ‘just’, which can function as either a hedging or a hesitation marker and occurs predominantly prosodic-unit-internally (in 72% of all occurrences).

In contrast to the functions of hedging and hesitation, we expected the information processing function to present with a lower overlap rate, as the function is represented primarily by the infamous markers eee ‘uh’ and eem ‘uhm’. These are usually prosodically subtle (e.g., level pitch contour, low intensity) and, thus, not ideal candidates for the unit-initial position. Nevertheless, they reached an overlap rate of 52% (see Figure 5), suggesting that speakers indeed often use the information processing markers at the beginning of the prosodic unit. To the plethora of functions of uh and uhm, including hedging (Aijmer, 2002; Tonetti Tübben & Landert, 2022), sarcasm, and irony (Rehbein, 2015; Tottie, 2019), the present study adds their prosodic segmenting role when used as C-DMs.

Although these results highlight the relevance of considering cognitive functions such as realising new information, restructuring, and emphasis as prosodic unit boundary indicators, the following limitations must be considered. The absolute sample sizes of the individual functions differ by orders of magnitude, since the information processing group contains more than a hundred times more cases than the emphasis group, and, partially, the hedging function (see Figure 5). The restructuring function also has a relatively small sample size. Moreover, the corpus used in this study contains speech with varying levels of spontaneity, which can affect the use of DMs. This applies particularly to the restructuring, realising new information, and information processing functions. Regarding the information processing C-DM eee ‘uh’, its characteristic level pitch contour makes its role as a prosodic unit boundary indicator challenging. It can be interpreted as an independent prosodic unit, particularly if flanked by pauses (see Section 4.2.6 example 6), or as a unit-initial token if presenting with a more dynamic pitch contour and shorter duration (see Section 4.2.6 example 7). Furthermore, it can present with a mixture of both characteristics (see Section 4.2.6 example 8), which makes it more challenging to annotate as a potential prosodic boundary. With this in mind, it seems reasonable that some studies exclude such items from the list of prosodic boundary indicators (cf. Degand and Simon (2009), Mertens and Simon (2013), Degand et al. (2014)).
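The effect of the unequal sample sizes can be illustrated with a confidence interval for each overlap rate. The sketch below uses the Wilson score interval, which is our choice for the illustration and not an analysis reported in the study. The group sizes (13 and 1626) come from the text, whereas the numbers of overlapping DMs (10 and 846) are hypothetical back-calculations chosen to be consistent with the reported rates of roughly 76% and 52%.

```python
from math import sqrt


def wilson_interval(successes, n, z=1.96):
    """Approximate 95% Wilson score interval for a proportion."""
    p = successes / n
    denom = 1 + z**2 / n
    centre = (p + z**2 / (2 * n)) / denom
    half = z * sqrt(p * (1 - p) / n + z**2 / (4 * n**2)) / denom
    return centre - half, centre + half


# Hypothetical counts: 10 of 13 is about 77% (emphasis),
# 846 of 1626 is about 52% (information processing).
emphasis = wilson_interval(10, 13)
processing = wilson_interval(846, 1626)

emphasis_width = emphasis[1] - emphasis[0]
processing_width = processing[1] - processing[0]
print(emphasis_width > processing_width)  # True
```

Under these assumed counts, the interval for the emphasis function is around 40 percentage points wide, while the interval for the information processing function is around 5 percentage points wide, which is why the high overlap rate of the emphasis function should be read with caution.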

In addition to testing C-DMs as signposts of prosodic unit boundaries, the present paper examined how the automatically annotated prosodic parameters overlapped with the manually annotated prosodic boundaries. The most often noted parameter of pitch reset did not yield the expected results, as fewer than a third of the automatic annotations matched the manual boundaries (see Figure 4). A similar, albeit slightly lower, result was achieved for the intensity reset parameter. The lowest overlap was found for the speech rate change parameter. This finding is noteworthy, as the acceleration–deceleration pattern is reported by several authors (e.g., Barth-Weingarten, 2013; Biron et al., 2021; Izre’el & Mettouchi, 2015; Izre’el, 2020; Kibrik et al., 2020; Izre’el et al., 2020), and Biron et al. (2021) report that speech rate increases two-fold at unit onset. Our study found pauses to be the most relevant prosodic unit boundary parameter, as 63% of the automatic annotations overlapped with the manual annotations (see Figure 4). The results are in line with X. Yang et al. (2014), who explored how listeners perceive intonational boundaries in Mandarin Chinese, and Ots and Taremaa (2023), who compared the manual prosodic boundary annotations of native and non-native speakers and found pauses to be the most salient boundary parameter. However, as this differs from the work by Biron et al. (2021), it must be noted that the present work adopted a different methodological approach. Rather than setting fixed threshold values for pauses, we compared the values of the neighbouring syllables, enabling even brief pauses to be detected. Such an approach is more flexible, since not only do people have different speaking rates, but, within the same recording, a person can change their speaking rate depending on, for instance, their mood or cognitive effort.
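The idea of a relative rather than a fixed pause threshold can be sketched as follows. This is only one possible reading of comparing a gap with its neighbouring syllables and should not be taken as the exact procedure used in the study: the function name `relative_pauses`, the factor `ratio=0.5`, and all timings are assumptions made for illustration.

```python
def relative_pauses(syllables, ratio=0.5):
    """Return (gap_start, gap_end) for silent gaps that are long relative
    to the two flanking syllables.

    syllables: list of (start_s, end_s) tuples in temporal order.
    ratio: hypothetical factor; a gap counts as a pause when it lasts at
           least `ratio` times the mean duration of its two neighbours.
    """
    pauses = []
    for (s1, e1), (s2, e2) in zip(syllables, syllables[1:]):
        gap = s2 - e1
        local_mean = ((e1 - s1) + (e2 - s2)) / 2
        if gap > 0 and gap >= ratio * local_mean:
            pauses.append((e1, s2))
    return pauses


# Invented timings: a fast stretch followed by a slow stretch.
fast = [(0.00, 0.10), (0.10, 0.20), (0.28, 0.38)]  # 0.08 s gap, short syllables
slow = [(1.00, 1.30), (1.38, 1.68)]                # 0.08 s gap, long syllables
found = relative_pauses(fast + slow)
print(found)  # [(0.2, 0.28), (0.38, 1.0)]
```

In this toy input, the same 0.08 s gap is flagged as a pause within the fast stretch but not within the slow one, whereas any single fixed threshold would treat both gaps identically.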

6. Conclusions

The present research shows that C-DMs, a traditionally overlooked group of DMs, can indeed serve as signposts of prosodic unit boundaries. Nevertheless, the results show that sequential (e.g., and, now, next) and rhetorical (e.g., right, therefore, and so on) DMs co-occurred with the manually annotated prosodic unit boundaries more frequently. There is, however, great variation among C-DMs in terms of indicating prosodic unit boundaries. The rate at which C-DMs overlap with the manually annotated prosodic unit boundaries depends largely on the function of the C-DM. The highest match rate, almost 85%, was established for C-DMs in the function of realising new information (e.g., aha, oh, um-hum). The functions of restructuring (e.g., uh, I mean, dunno) and emphasis (e.g., now, well, like) were also found to be relevant boundary indicators, with an overlap of roughly 76%. In contrast, hedging, hesitation, and information processing C-DMs did not prove to be reliable boundary indicators. Moreover, the present study examined which automatically annotated prosodic parameter overlapped most frequently with the manual annotations. Pauses were found to be the most relevant parameter, with a 63% match rate, while the parameters of pitch reset, intensity reset, and speech rate change fell significantly behind, with match rates below 30%.

Author Contributions

Conceptualization, S.M.; Methodology, S.M. and J.K.; Formal analysis, S.M. and J.K.; Data curation, S.M. and J.K.; Writing—original draft, S.M. and M.B.; Writing—review & editing, S.M., M.B. and J.K.; Visualization, S.M. All authors have read and agreed to the published version of the manuscript.

Funding

This research was funded by the research project Basic Research for the Development of Spoken Language Resources and Speech Technologies for the Slovenian Language, grant number J7-4642 (funded by the Slovenian Research and Innovation Agency).

Informed Consent Statement

Informed consent was obtained from all subjects involved in the study.

Data Availability Statement

The raw data supporting the conclusions of this article will be made available by the authors on request.

Acknowledgments

We would like to thank the anonymous reviewers for their careful reading and insightful comments, which helped improve the paper. Any remaining errors or oversights are entirely our own.

Conflicts of Interest

The authors declare no conflicts of interest.

Abbreviations

The following abbreviations are used in this manuscript:

DM(s)	Discourse marker(s)
C-DM(s)	Cognitive discourse marker(s)

Appendix A

Table A1 presents the list of discourse marker functions used in the annotation process. The function labels (abbreviated) and their descriptions are based on existing functional taxonomies proposed in Crible (2016), Crible and Degand (2019), and Maschler (2009). As outlined in the methodology, functions were not bound to domains; rather, contextual interpretation determined their assignment.

Table A1. Discourse marker functions used in the annotation process.

Abbreviation	Function Label	Description (based on Crible (2016), Crible and Degand (2019), and Maschler (2009))
AD	Addition	Adds discourse-new information, often extending the discourse.
AG	Agreeing and Confirming	Signals speaker agreement or confirmation, either with self or the interlocutor.
CA	Cause	Introduces a causal explanation or reason for the prior segment.
CG	Closing	Closes a discourse unit or interactional segment.
CL	Conclusion	Summarises or draws a conclusion from the preceding discourse.
CM	Comment	Provides a metalinguistic or reflective comment.
CQ	Consequence	Introduces a logical or epistemic result of the prior discourse content.
CT	Contrast	Marks contrast between two ideas or discourse segments.
DG	Disagreement	Expresses dissent or a differing opinion.
EL	Elliptical	Marks an elliptical construction, typically resuming or reframing discourse.
EM	Emphasis	Highlights or intensifies a particular point in the discourse.
EN	Enumeration	Lists or orders items or ideas in sequence.
HD	Hedging	Marks speaker uncertainty, approximation, or lack of full commitment to the proposition.
HS	Hesitation	Signals planning difficulties, pauses, or speaker hesitation.
MC	Maintaining Contact	Manages interactional contact with the interlocutor (e.g., phatic or floor-holding functions).
MO	Monitoring	Indicates speaker control over the discourse, including self-monitoring.
MT	Motivation	Provides a subjective or rhetorical justification for what follows.
OG	Opening	Opens a new discourse segment, topic, or interaction.
OP	Opposition	Presents an opposing viewpoint or contradicts previous discourse.
PI	Processing Information	Indicates ongoing cognitive processing or effort to plan/formulate upcoming discourse.
QT	Quoting	Introduces (pseudo-)reported speech or quotations.
RI	Realising New Information	Signals sudden awareness, recognition, or cognitive insight.
RR	Restructuring	Indicates reformulation, self-repair, or reorganisation of a previous utterance.
SC	Specification	Provides more detailed or specific information (e.g., examples, elaboration).
TE	Temporal	Marks chronological order or progression in discourse.
TS	Topic Shift	Indicates a change in topic, usually initiating a new discourse segment.
TU	Topic Resuming	Returns to a previously suspended topic or ongoing discourse thread.
UC	Encouraging to Continue	Encourages the interlocutor to proceed or keep speaking.

References

Aijmer, K. (2002). English discourse particles. John Benjamins Publishing. [Google Scholar]
Aijmer, K. (2013). Understanding pragmatic markers: A variational pragmatic approach. Edinburgh University Press. [Google Scholar]
Altenberg, B. (1987). Prosodic patterns in spoken English. Studies in the correlation between prosody and grammar for text-to-speech conversion. Lund University Press. [Google Scholar]
Arnold, J. E., Tanenhaus, M. K., Altmann, R. J., & Fagnano, M. (2004). The old and thee, uh, new: Disfluency and reference resolution. Psychological Science, 15(9), 578–582. [Google Scholar] [CrossRef] [PubMed]
Barnwell, B. (2013). Perception of prosodic boundaries by untrained listeners. In Units of talk–Units of action (pp. 125–166). John Benjamins Publishing Company. [Google Scholar]
Barth-Weingarten, D. (2013). From “intonation units” to cesuring—An alternative approach to the prosodic-phonetic structuring of talk-in-interaction. In B. Szczepek Reed, & R. Geoffrey (Eds.), Units of talk–Units of action (Series “studies in language and social interaction”, pp. 91–124). Benjamins. [Google Scholar]
Bazzanella, C. (2006). Discourse markers in Italian: “Compositional” meaning. In K. Fischer (Ed.), Approaches to discourse particles (pp. 449–464). Elsevier. [Google Scholar]
Beňuš, Š. (2021). Investigating spoken English: A practical guide to phonetics and phonology using Praat. Palgrave Macmillan. [Google Scholar]
Biron, T., Baum, D., Freche, D., Matalon, N., Ehrmann, N., Weinreb, E., Biron, D., & Moses, E. (2021). Automatic detection of prosodic boundaries in spontaneous speech. PLoS ONE, 16, e0250969. [Google Scholar] [CrossRef]
Boersma, P., & Weenink, D. (2019). Praat: Doing phonetics by computer. Computer program (Version 6.4.27). Available online: https://www.fon.hum.uva.nl/praat/ (accessed on 14 April 2025).
Brennan, S. E., & Williams, M. (1995). The feeling of another’s knowing: Prosody and filled pauses as cues to listeners about the metacognitive states of speakers. Journal of Memory and Language, 34(3), 383–398. [Google Scholar] [CrossRef]
Cabedo, A. (2014). On the delimitation of discursive units in colloquial Spanish: Val. Es. Co. application model. Discourse segmentation in Romance languages (pp. 157–183). John Benjamins. [Google Scholar]
Carter, R., & McCarthy, M. (2006). Cambridge grammar of English: A comprehensive guide; spoken and written English grammar and usage. Cambridge University Press. [Google Scholar]
Chafe, W. (1994). Discourse, consciousness, and time: The flow and displacement of conscious experience in speaking and writing. University of Chicago Press. [Google Scholar]
Collet, C. (2019). Graham ranger: Discourse markers: An enunciative approach (314p). Palgrave Macmillan. [Google Scholar]
Crible, L. (2016). Towards an operational category of discourse markers: A definition and its model. In A. Sanso, & C. Fedriani (Eds.), Discourse markers, pragmatics markers and modal particles: New perspectives. John Benjamins. [Google Scholar]
Crible, L., & Degand, L. (2019). Domains and functions: A two-dimensional account of discourse markers. Discours. Available online: https://journals.openedition.org/discours/9997 (accessed on 14 April 2025).
Cuenca, M.-J. (2013). The fuzzy boundaries between discourse marking and modal marking. In L. Degand, B. Cornillie, & P. Pietrandrea (Eds.), Discourse markers and modal particles: Categorization and description (pp. 191–216). John Benjamins. [Google Scholar]
Degand, L., & Simon, A. C. (2009). On identifying basic discourse units in speech: Theoretical and empirical issues. Discours. Revue de Linguistique, Psycholinguistique et Informatique. A Journal of Linguistics, Psycholinguistics and Computational Linguistics, (4). [Google Scholar] [CrossRef]
Degand, L., Simon, A. C., Tanguy, N., & Van Damme, T. (2014). Initiating a discourse unit in spoken French. Discourse Segmentation Romance Languages, 250, 243–273. [Google Scholar]
Du Bois, J. W. (1991). Transcription design principles for spoken discourse research. Pragmatics, 1(1), 71–106. [Google Scholar] [CrossRef]
Du Bois, J. W., Schuetze-Coburn, S., Cumming, S., & Paolino, D. (1993). Discourse transcription. University of California. (Original work published 1992). [Google Scholar]
Elordieta, G., & Romera, M. (2002, April 11–13). Prosody and meaning in interaction: The case of the Spanish discourse functional unit entonces “then”. Speech Prosody 2002 (pp. 263–266), Aix-en-Provence, France. [Google Scholar]
Erman, B. (1987). Pragmatic expressions in English. Almqvist & Wiksell. [Google Scholar]
Farrús, M., Lai, C., & Moore, J. (2016, May 31–June 3). Paragraph-based prosodic cues for speech synthesis applications. Speech Prosody (pp. 1143–1147), Boston, MA, USA. [Google Scholar]
Fischer, K. (2014). Discourse Markers. In K. P. Schneider, & A. Barron (Eds.), Pragmatics of discourse (pp. 271–294). Berlin De Gruyter Mouton. [Google Scholar]
Fox Tree, J. E. (2001). Listeners’ uses of um and uh in speech comprehension. Memory & Cognition, 29(2), 320–326.
Fraser, B. (1990). An account of discourse markers. International Review of Pragmatics, 1(2009), 293–320.
Hieke, A. E. (1981). A content-processing view of hesitation phenomena. Language and Speech, 24(2), 147–160.
Inbar, M., Genzer, S., Perry, A., Grossman, E., & Landau, A. N. (2023). Intonation units in spontaneous speech evoke a neural response. Journal of Neuroscience, 43(48), 8189–8200.
Izre’el, S. (2020). The basic unit of spoken language and the interfaces between prosody, discourse and syntax: A view from spontaneous spoken Hebrew. In In search of basic units of spoken language (pp. 77–106). John Benjamins Publishing Company.
Izre’el, S., Mello, H., Panunzi, A., & Raso, T. (2020). In search of basic units of spoken language. A corpus-driven approach. John Benjamins Publishing Company.
Izre’el, S., & Mettouchi, A. (2015). Representation of speech in CorpAfroAs: Transcriptional Strategies and Prosodic Units. In A. Mettouchi, M. Vanhove, & D. Caubet (Eds.), Corpus-based studies of lesser-described languages: The CorpAfroAs corpus of spoken AfroAsiatic languages (Studies in corpus linguistics, 68, pp. 13–41). Benjamins.
Kibrik, A. A., Korotaev, N. A., & Podlesskaya, V. I. (2020). The Moscow approach to local discourse structure: An application to English. In In search of basic units of spoken language (pp. 367–382). John Benjamins Publishing Company.
Majhenič, S., Rojc, M., & Mlakar, I. (2022). Neprototipni diskurzni označevalec zdaj. Slavia Centralis, 15(2), 27–44. Available online: https://journals.um.si/index.php/slaviacentralis/article/view/2446 (accessed on 14 April 2025).
Maschler, Y. (2009). Metalanguage in Interaction. Hebrew discourse markers. John Benjamins Publishing Company.
Matzen, L. E. (2004). Discourse markers and prosody: A case study of so. LACUS Forum XXX: Language, Thought and Reality, (30), 75–94.
McAuliffe, M., Socolof, M., Mihuc, S., Wagner, M., & Sonderegger, M. (2017). Montreal Forced Aligner: Trainable Text-Speech Alignment Using Kaldi. Interspeech, 2017, 498–502.
Mertens, P., & Simon, A. C. (2013, September 11–13). Towards automatic detection of prosodic boundaries in spoken French. In Prosody-discourse interface conference 2013 (IDP-2013), Leuven, Belgium (pp. 81–87). FranItalCo.
Mithun, M. (2020). Basic units of speech segmentation. In In search of basic units of spoken language (pp. 349–358). John Benjamins Publishing Company.
Morel, M. A., & Vladimirska, J. (2014). Intonation and gesture in the segmentation of speech units: The discursive marker vraiment: Integration, focalisation, formulation. In Discourse segmentation in Romance languages (pp. 185–218). John Benjamins Publishing Company.
O’Grady, G. N. (2017). “I think” in televised political debate. International Review of Pragmatics, 9(2), 269–303.
Ots, N., & Taremaa, P. (2023). Chunking an unfamiliar language: Results from a perception study of German listeners. In F. Schubö, S. Zerbian, S. Hanne, & I. Wartenburger (Eds.), Prosodic boundary phenomena (pp. 87–117). Language Science Press.
Raso, T., Barbosa, P. A., Cavalcante, F. A., & Mittmann, M. M. (2020). Segmentation and analysis of the two English excerpts: The Brazilian team proposal. In In search of basic units of spoken language (pp. 309–326). John Benjamins Publishing Company.
Redeker, G. (2000). Coherence and structure in text and discourse. In H. Bunt, & W. Black (Eds.), Abduction, belief and context in dialogue: Studies in computational pragmatics (pp. 233–264). John Benjamins Publishing.
Redeker, G. (2006). Discourse markers as attentional cues at discourse transitions. In K. Fischer (Ed.), Approaches to discourse particles (pp. 339–358). Elsevier.
Rehbein, I. (2015, June 5). Filled pauses in user-generated content are words with extra-propositional meaning. In Second workshop on extra-propositional aspects of meaning in computational semantics (ExProM 2015), Denver, Colorado (pp. 12–21). Association for Computational Linguistics.
Romero-Trillo, J. (2018). Prosodic modeling and position analysis of pragmatic markers in English conversation. Corpus Linguistics and Linguistic Theory, 14(1), 169–195.
Schiffrin, D. (1987). Discourse markers. Cambridge University Press.
Selting, M. (1996). On the interplay of syntax and prosody in the constitution of turn-constructional units and turns in conversation. Pragmatics, 6(3), 371–388.
Selting, M., Auer, P., Barth-Weingarten, D., Bergmann, J. R., Bergmann, P., Birkner, K., Couper-Kuhlen, E., Deppermann, A., Gilles, P., Günthner, S., Hartung, M., Kern, F., Mertzlufft, C., Meyer, C., Morek, M., Oberzaucher, F., Peters, J., Quasthoff, U., Schütte, W., … Uhmann, S. (2009). Gesprächsanalytisches Transkriptionssystem 2 (GAT 2). Online-Zeitschrift zur verbalen Interaktion, 10, 353–402.
Steen, G. (2005). Basic discourse acts: Towards a psychological theory of discourse segmentation. In F. Ruiz de Mendoza Ibanez, & M. Sandra Pena Cervel (Eds.), Cognitive linguistics: Internal dynamics and interdisciplinary interaction (pp. 283–312). Mouton de Gruyter.
Swerts, M. (1998). Filled pauses as markers of discourse structure. Journal of Pragmatics, 30(4), 485–496.
Tonetti Tübben, I., & Landert, D. (2022). Uh and Um as pragmatic markers in dialogues: A Contrastive perspective on the functions of planners in fiction and conversation. Contrastive Pragmatics, 4(2), 350–381.
Tottie, G. (2019). From pause to word: Uh, um and er in written American English. English Language and Linguistics, 23(1), 105–130.
Verdonik, D., Ljubešić, N., Rupnik, P., Dobrovoljc, K., & Čibej, J. (2024, September 19–20). Izbor in urejanje gradiv za učni korpus govorjene slovenščine ROG. In S. Arhar Holdt, & T. Erjavec (Eds.), Language technologies and digital humanities: Proceedings of the conference, Ljubljana, Slovenia (1st ed., pp. 469–484). Institute of Contemporary History. ISBN 978-961-7104-40-0.
Wellmann, C., Holzgrefe-Lang, J., Truckenbrodt, H., Wartenburger, I., & Höhle, B. (2023). Developmental changes in prosodic boundary cue perception in German-learning infants. In F. Schubö, S. Zerbian, S. Hanne, & I. Wartenburger (Eds.), Prosodic boundary phenomena (pp. 119–156). Language Science Press.
Womack, K., McCoy, W., Alm, C. O., Calvelli, C., Pelz, J. B., Shi, P., & Haake, A. (2012, July 13). Disfluencies as extra-propositional indicators of cognitive processing. Workshop on Extra-Propositional Aspects of Meaning in Computational Linguistics (pp. 1–9), Jeju, Republic of Korea.
Yang, L. (2006). Integrating prosodic and contextual cues in the interpretation of discourse markers. In K. Fischer (Ed.), Approaches to discourse particles (pp. 265–298). Elsevier.
Yang, X., Shen, X., Li, W., & Yang, Y. (2014). How Listeners Weight Acoustic Cues to Intonational Phrase Boundaries. PLoS ONE, 9(7), e102166.

Figure 1. Adapted DM classification.

Figure 2. Representation of DMs by domain and subdomain.

Figure 3. Overlapping share by DM domain and subdomain.

Figure 4. Overlapping share by prosodic unit boundary parameter.

Figure 5. Overlapping share by cognitive domain functions.

Figure 6. Praat screenshot with the highlighted C−DM ne vem ‘dunno’ in the hesitation function, the pitch contour in blue, the intensity contour in green, and the individual tiers with their values.

Figure 7. Praat screenshot with the highlighted C−DM mislim ‘I mean’ in the restructuring function, the pitch contour in blue, the intensity contour in green, and the individual tiers with their values.

Figure 8. Praat screenshot with the highlighted C−DM aaa ‘ah’ in the realising new information function, the pitch contour in blue, the intensity contour in green, and the individual tiers with their values.

Figure 9. Praat screenshot with the highlighted C−DM tako ‘like’ in the hedging function, the pitch contour in blue, the intensity contour in green, and the individual tiers with their values.

Figure 10. Praat screenshot with the highlighted C−DM ja ‘why’ in the hedging function, the pitch contour in blue, the intensity contour in green, and the individual tiers with their values.

Figure 11. Praat screenshot with the highlighted C−DM eee ‘uh’ in the information processing function, the pitch contour in blue, the intensity contour in green, and the individual tiers with their values.

Figure 12. Praat screenshot with the highlighted C−DMs e ‘uh’ in the information processing function, the pitch contour in blue, the intensity contour in green, and the individual tiers with their values.

© 2025 by the authors. Licensee MDPI, Basel, Switzerland. This article is an open access article distributed under the terms and conditions of the Creative Commons Attribution (CC BY) license (https://creativecommons.org/licenses/by/4.0/).

