Childhood apraxia of speech (CAS) is a motor speech disorder that affects the planning and programming of speech (
ASHA, 2007;
Shriberg et al., 2012). The primary symptoms of CAS include inconsistent speech sound errors, speech segmentation, and errors in prosody (
ASHA, 2007;
McNeil et al., 1997;
Murray et al., 2015b), all of which can impact the intelligibility and naturalness of speech. Among prosodic errors, lexical stress has been identified as a critical area of difficulty. Children with CAS exhibit equal and/or excess stress on syllables or incorrectly shift stress onto nearby syllables (
Odell & Shriberg, 2001;
Shriberg et al., 2011). However, prosody encompasses more than lexical stress alone—it operates across multiple levels of the prosodic hierarchy, from the syllable to the intonational phrase (
Nespor & Vogel, 2007;
Selkirk, 1980). While lexical stress patterns have been extensively studied, little research has examined how children with CAS produce broader prosodic patterns, such as intonation, or how these interact with other timing-based features, such as segmentation. Understanding these interactions provides insight into how speech motor planning deficits manifest across prosodic levels and may ultimately aid in refining differential diagnosis and treatment of CAS. Thus, the central aim of this study is to explore how intonation, and more specifically declination, relates to segmentation in CAS, and how these features may differ between children with CAS and typically developing (TD) peers.
1.1. Childhood Apraxia of Speech and Motor Programming
Numerous studies have investigated the diagnostic features of CAS, showing that they resemble those of apraxia of speech (AOS) but have distinct developmental manifestations, particularly in lexical and phrasal stress (
Kent & Rosenbek, 1983;
Maas et al., 2008;
Odell & Shriberg, 2001;
Seddoh et al., 1996;
Shriberg et al., 1997). Models of motor programming describe CAS as involving breakdowns in the processes that encode an abstract linguistic (phonological) message into a temporally organized sequence of motor commands (
Maas et al., 2008). These breakdowns result in increased inter-segment durations between speech sounds and syllables (i.e., segmentation), distortions of speech sounds, and dysprosody (
ASHA, 2007;
Seddoh et al., 1996). According to the Directions Into Velocities of Articulators (DIVA) model, such deficits reflect impaired feedforward and feedback control mechanisms that govern both segmental timing and suprasegmental modulation (e.g., pitch, lexical stress) (
Guenther, 2016). Thus, prosodic and timing disruptions in CAS may share a common underlying planning deficit, linking lower-level temporal segmentation with higher-level intonational control.
Most studies use perceptual and/or acoustic characteristics for identifying and diagnosing CAS/AOS, including the examination of syllable segmentation and how feedback from word production affects motor output (
Ballard et al., 2014).
Murray et al. (
2015b) showed that syllable segmentation contributed to the differentiation of children with CAS from those with other speech sound disorders when evaluated alongside other diagnostic features. Similarly,
Iuzzini-Seigel et al. (
2017) found that fifteen of twenty school-aged participants with CAS exhibited syllable segmentation during the Goldman-Fristoe Test of Articulation (GFTA-2) (
Goldman, 2000), whereas only three of ten participants with speech delay exhibited this feature. However, syllable segmentation alone is not fully diagnostic of CAS, and findings may be susceptible to diagnostic circularity because children are sometimes classified as having CAS based partly on its presence. Relatedly,
Ballard et al. (
2016) reported that individuals with AOS have considerable difficulty with multisyllabic words due to the increased motor programming load. In terms of lexical stress, Shriberg and colleagues have identified equal/excess stress as a diagnostic marker of CAS, distinguishing it from speech delay using the Lexical Stress Ratio (LSR) metric (
Shriberg et al., 2003). Together, these findings suggest that both timing and prosodic features are clinically informative, though their diagnostic specificity remains debated. Unlike prior studies that emphasize identifying diagnostic markers, the present study examines how two prosodic features, segmentation and declination, relate to each other in CAS in order to understand their shared motor planning basis.
1.2. Treatments and Theoretical Grounding
Treatment programs for CAS target these underlying motor planning deficits to improve speech fluency and prosodic control. The Rapid Syllable Transition Treatment (ReST) program (
Ballard & Robin, 2021) uses multisyllabic pseudo-words to strengthen motor programming and lexical stress accuracy, while the Nuffield Dyspraxia Programme 3rd Edition (NDP3) (
Williams & Stephens, 2004) employs a structured hierarchy progressing from individual phonemes to multisyllabic real words to increase articulatory accuracy and consistency. In the first randomized controlled trial comparing these two treatments,
Murray et al. (
2015a) found that both approaches yielded improvements from pre- to post-treatment. Although ReST appeared to maintain gains more effectively at the one-month follow-up,
Morgan et al. (
2018) reanalyzed the data, excluding the potentially biased 4-month post-treatment data, and cautioned that treatment effects could not be interpreted in terms of treatment efficacy due to the absence of a no-treatment control group. Therefore, while both ReST and NDP3 show promising clinical potential, causal claims regarding their effectiveness require cautious interpretation until additional controlled studies are available. These treatment outcomes are consistent with theoretical accounts that attribute speech disruptions in CAS to impaired motor planning and programming processes rather than phonological deficits alone (
Ballard et al., 2018;
Maas et al., 2008).
Building on these approaches, Treating Establishment of Motor Program Organization (TEMPO℠) combines pseudo-words and real words to target distortions, segmentation, and lexical stress (
Miller et al., 2021). Based on the principles of motor control, TEMPO℠ stimuli consist of three- to four-syllable sequences with stops or fricatives and strong-weak or weak-strong stress patterns (e.g., pseudo-words:
TAgibu,
taGIbu; real words:
CUcumber,
poTAto). These stimuli were used as baseline probes prior to treatment, offering controlled materials for assessing timing and stress coordination.
Miller et al. (
2021) reported improvements in speech sound accuracy, reduced segmentation, and better lexical stress marking in children with CAS following TEMPO℠. Since the present study analyzes pre-treatment recordings from that dataset, these stimuli provide a consistent foundation to evaluate baseline differences in timing and intonation between children with CAS and TD peers.
Other studies have also investigated lexical stress in CAS, measuring the fundamental frequency (f0, or perceived pitch), duration, and intensity patterns across syllables (
Arciuli & Ballard, 2017;
Ballard et al., 2010,
2012;
Littlejohn & Maas, 2025;
Skinder et al., 2000).
Littlejohn and Maas (
2025) further demonstrated that children with CAS exhibit reduced control of stress timing and pitch contrast during multisyllabic word production, underscoring prosodic planning difficulties.
Shriberg et al. (
2003) introduced the Lexical Stress Ratio (LSR), an acoustic index used to quantify strong-weak stress contrasts, and showed that children with CAS produce reduced stress contrast compared to TD peers. More recent work by
Kopera and Grigos (
2020) found that children with CAS showed reduced kinematic and acoustic contrasts between strong and weak syllables during an imitation task, even when segmental accuracy was adequate, suggesting that prosodic errors are not solely the by-product of speech sound distortions. Cross-linguistic work points in the same direction: Cantonese-speaking preschoolers with CAS show difficulty with f0 variation during tone-sequencing tasks (
Wong et al., 2021,
2024).
Wong et al. (
2024) found that this population produced reduced f0 contrasts in comparison to TD children and children with non-CAS speech sound disorder. These findings indicate that prosodic impairment in CAS may reflect limitations in higher-level speech planning rather than articulation alone. However, nearly all empirical research on prosody in CAS has focused on word-level stress, leaving a gap in understanding of how children with CAS produce phrase-level prosody, including intonation across utterances.
1.3. Segmentation
Segmentation refers to inappropriate lengthening and pausing between syllables or words, commonly seen in CAS and AOS. In adults, longer inter-segment intervals are reported for AOS but not for individuals with conduction aphasia or TD speakers (
Seddoh et al., 1996). Mechanistically, such segmentation in AOS has been linked to deficits in spatiotemporal and kinesthetic aspects of speech motor programming (
Ballard et al., 2014,
2018;
Duffy, 2013). Although
Seddoh et al. (
1996) examined adults with acquired impairment, similar hypotheses of disrupted motor planning have been proposed to account for segmentation behaviors in children with CAS (
Maas et al., 2008;
Wright et al., 2009). Unlike adults with AOS, children with CAS are still developing phonological and prosodic representations, and therefore segmentation in CAS may reflect both motor programming deficits and immature linguistic planning.
Wright et al. (
2009) further proposed that segmentation reflects a working memory “loading delay” when linking syllables together. They described two underlying processes known as integration (INT) and sequencing (SEQ). INT preprograms shorter speech units in advance, while SEQ manages the serial ordering and execution of these preplanned units (
Maas et al., 2008;
Wright et al., 2009). INT has higher processing demands for multisyllabic words, which helps explain the increased number of segmentation errors observed in longer and more complex utterances (
Seddoh et al., 1996;
Wright et al., 2009). This account is conceptually consistent with
Seddoh et al. (
1996), who interpreted excessive pausing in AOS as reflecting difficulty transitioning between motor plans. In both accounts, transition difficulty is not a separate behavior from segmentation, but rather an explanation for why longer inter-segment durations occur.
Although theories differ in whether segmentation arises primarily from motor planning constraints (
Maas et al., 2008), limited working memory capacity (
Wright et al., 2009), or impaired feedforward control (
Ballard et al., 2018;
Miller & Guenther, 2021;
Terband et al., 2009), these accounts converge on the idea that segmentation reflects reduced fluency in speech planning. The present study does not attempt to distinguish among these mechanisms but uses segmentation as an established behavioral marker associated with impaired speech timing in CAS.
A growing body of research has focused on developing ways to detect and quantify segmentation.
Shriberg et al. (
2017a,
2017b) introduced the Pause Marker (PM) as an acoustic-aided perceptual feature to differentiate CAS from speech delay. The PM is defined as an inappropriate between-word pause of at least 150 milliseconds (ms) and has shown strong utility in characterizing one of the core timing-based features of CAS. Longer and more frequent pausing contributes to reduced fluency and naturalness, affecting intelligibility as well as overall prosody.
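The duration criterion described above can be illustrated with a minimal sketch. It assumes that segment boundaries have already been hand-labeled in milliseconds; the `Segment` type, the function names, and the example timings are hypothetical and are not taken from the Pause Marker procedure or from the present study. The sketch applies only the 150 ms duration threshold. It does not capture the perceptual judgment of whether a pause is inappropriate, which the PM also requires.

```python
from dataclasses import dataclass

# Duration threshold reported for the Pause Marker (150 ms).
PAUSE_THRESHOLD_MS = 150.0


@dataclass
class Segment:
    """A labeled stretch of speech (hypothetical structure for illustration)."""
    label: str
    onset_ms: float
    offset_ms: float


def inter_segment_intervals(segments):
    """Return the silent gap (ms) between each pair of consecutive segments."""
    ordered = sorted(segments, key=lambda s: s.onset_ms)
    # Overlapping labels would give a negative gap; clamp those to zero.
    return [max(0.0, nxt.onset_ms - cur.offset_ms)
            for cur, nxt in zip(ordered, ordered[1:])]


def flag_long_pauses(segments, threshold_ms=PAUSE_THRESHOLD_MS):
    """Mark each inter-segment interval that meets or exceeds the threshold."""
    return [gap >= threshold_ms for gap in inter_segment_intervals(segments)]


# Made-up timings for the three syllables of "poTAto".
syllables = [
    Segment("po", 0.0, 180.0),
    Segment("TA", 200.0, 450.0),
    Segment("to", 640.0, 800.0),
]
print(inter_segment_intervals(syllables))  # [20.0, 190.0]
print(flag_long_pauses(syllables))         # [False, True]
```

Keeping the threshold as a parameter makes it straightforward to examine how sensitive a pause count is to the chosen cutoff.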
Importantly, segmentation has implications beyond timing alone. Continued investigation is needed to determine whether children with CAS plan speech one syllable at a time—as segmentation would suggest—or whether they plan larger units but experience breakdowns during execution (
Ladd, 2008;
Wright et al., 2009). This raises a key theoretical question: do prolonged inter-segment intervals interfere with prosodic organization at higher levels of the prosodic hierarchy (
Nespor & Vogel, 2007;
Selkirk, 1980)? Since segmentation disrupts temporal coordination within an utterance, it may also affect suprasegmental features such as intonation, potentially altering how pitch contours like declination unfold across speech. Examining how segmentation relates to intonation in CAS therefore provides a window into how temporal and prosodic planning interact in this disorder.
1.4. Intonation
Prosody is the melody and rhythm of speech expressed through suprasegmental features such as pitch, duration, and loudness. According to the prosodic hierarchy (
Nespor & Vogel, 2007;
Selkirk, 1980), prosody is organized across levels ranging from the syllable to the utterance. From bottom to top, these levels include the syllable, foot, prosodic word, phonological phrase, intonational phrase, and utterance. The present study focuses on two of these levels, segmentation at the syllable level and intonation at the intonational phrase level, to determine how timing-based disruptions may influence higher-level pitch organization in CAS. Although segmentation could be described at the level of the foot when it occurs between syllables in a word, here it is operationalized at the syllable level to align with previous work in CAS that defines segmentation as prolonged pauses either within or between words (
Shriberg et al., 2017a,
2017b).
Lexical stress, the primary marker of word-level prominence in English, has been well studied in CAS and is widely documented as a diagnostic feature that distinguishes CAS from TD children (
Arciuli & Ballard, 2017;
Murray et al., 2015b;
Odell & Shriberg, 2001). For example, strong-weak and weak-strong stress patterns can distinguish certain noun/verb pairs in English (e.g., REcord [noun] vs. reCORD [verb]) (
Ladd, 2008). At higher prosodic levels, accentuation marks informational focus, while intonation organizes pitch movements across phrases and utterances to convey pragmatic, syntactic, and discourse-level meaning (
Shattuck-Hufnagel & Turk, 1996). Acoustically, these changes are reflected in f0, duration, intensity, and segmental quality (
Shattuck-Hufnagel & Turk, 1996). Despite increased research attention to lexical stress in CAS, little is known about whether phrase-level prosody—particularly intonation—is also affected.
A key component of intonation is declination, defined as the overall downward trend of f0 across an utterance, also referred to as
downtrend or
downdrift (
Cohen & ’t Hart, 1967;
Connell & Ladd, 1990). Although declination can be described across different prosodic levels, it is most robustly observed at the intonational phrase level, where it is aligned with natural pause structure (
Dankovičová, 1999;
Hirschberg & Pierrehumbert, 1986). Declination contributes to global prosodic organization by signaling the progression and completion of an intonational phrase (
Ladd, 1984). Competing explanations have been offered regarding its origin: physiological accounts attribute declination to a gradual decrease in subglottal pressure and vocal fold tension over the course of an utterance (
Collier, 1975;
Lieberman, 1967), while linguistic accounts argue that declination is under the speaker’s control and serves communicative functions (
Ohala, 1978). Some theories integrate both views and propose that global declination patterns interact with localized pitch movements (highs and lows) to convey semantic emphasis or discourse structure (
Calhoun, 2010;
Cole, 2015;
Fujisaki, 2004).
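One common way to quantify an overall downward f0 trend is to fit a straight line to the f0 track and read off its slope. The sketch below does this under stated assumptions: f0 has already been extracted as (time, Hz) frames, unvoiced frames are coded as 0 or None, and f0 is converted to semitones relative to the first voiced frame so that slopes are comparable across speakers with different pitch ranges. The function names and the choice of a simple least-squares line are illustrative and are not the measurement procedure of any study cited here.

```python
import math


def hz_to_semitones(f0_hz, ref_hz):
    """Express f0 in semitones relative to a reference frequency."""
    return 12.0 * math.log2(f0_hz / ref_hz)


def declination_slope(times_s, f0_hz):
    """Least-squares slope of f0 over time, in semitones per second.

    Unvoiced frames (0 or None) are skipped. A negative slope indicates
    falling f0 (declination); a slope near zero indicates a flat contour.
    """
    voiced = [(t, f) for t, f in zip(times_s, f0_hz) if f and f > 0]
    if len(voiced) < 2:
        raise ValueError("need at least two voiced frames")
    ref_hz = voiced[0][1]  # assumption: first voiced frame is the reference
    xs = [t for t, _ in voiced]
    ys = [hz_to_semitones(f, ref_hz) for _, f in voiced]
    mean_x = sum(xs) / len(xs)
    mean_y = sum(ys) / len(ys)
    sxx = sum((x - mean_x) ** 2 for x in xs)
    if sxx == 0:
        raise ValueError("voiced frames must span more than one time point")
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    return sxy / sxx


# Synthetic contour that falls by one semitone per second from 220 Hz.
times = [0.0, 0.5, 1.0, 1.5, 2.0]
contour = [220.0 * 2 ** (-t / 12.0) for t in times]
print(round(declination_slope(times, contour), 6))  # -1.0
```

A single regression line ignores local pitch movements such as accents, so it is best read as a coarse summary of the global trend rather than a full description of the contour.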
Declination interacts with pitch reset (or f0 reset), a rise in f0 that occurs at the onset of a new intonational phrase following a pause (
Oliveira, 2003). In English, evidence from connected speech suggests that speakers adjust their initial f0 based on anticipated utterance length, producing steeper declination slopes in shorter utterances and shallower slopes in longer utterances (
Yuan & Liberman, 2014). These findings suggest that aspects of intonation, including declination and pitch reset, may be planned in advance rather than emerging solely from physiological decline (
Prieto et al., 2006). In speech models, this global pitch planning is thought to reflect prosodic planning processes that operate before motor execution (
Levelt, 1993).
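The size of a pitch reset can be sketched as the f0 change, in semitones, from just before a pause to just after it. The example below is a rough illustration under assumptions that do not come from the cited studies: f0 frames are given in Hz on either side of the pause, unvoiced frames are coded as 0 or None, and the last and first three frames are averaged to reduce the influence of a single noisy value. The function names and the three-frame window are hypothetical.

```python
import math


def mean_voiced_f0(frames_hz):
    """Average the voiced frames (f0 > 0) in a list of f0 values in Hz."""
    voiced = [f for f in frames_hz if f and f > 0]
    if not voiced:
        raise ValueError("no voiced frames available")
    return sum(voiced) / len(voiced)


def f0_reset_semitones(pre_pause_hz, post_pause_hz, n_frames=3):
    """f0 change across a pause, in semitones.

    Compares the mean of the last n_frames before the pause with the mean
    of the first n_frames after it. Positive values indicate an upward
    reset; values near zero indicate that f0 continues where it left off.
    """
    before = mean_voiced_f0(pre_pause_hz[-n_frames:])
    after = mean_voiced_f0(post_pause_hz[:n_frames])
    return 12.0 * math.log2(after / before)


# Made-up values: f0 ends at 180 Hz before the pause and restarts at 360 Hz,
# which is a doubling of frequency, i.e., one octave (12 semitones).
pre = [200.0, 190.0, 180.0, 180.0, 180.0]
post = [360.0, 360.0, 360.0, 300.0]
print(f0_reset_semitones(pre, post))  # 12.0
```

Measuring the reset in semitones rather than Hz keeps the value comparable across children whose overall pitch levels differ.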
Typical development research shows that children are sensitive to prosodic structure early in life and use prosodic cues in the input to segment speech and identify phrase boundaries (
Jusczyk, 1999;
Jusczyk et al., 1993). However, little is known about how intonation develops in populations with motor speech disorders such as CAS. In acquired disorders, declination is often disrupted. For example,
Marotta et al. (
2008) found that Italian speakers with Broca’s aphasia exhibited disrupted declination due to excessive pausing, syllable lengthening, and abnormal f0 resetting. Similarly,
Ryalls (
1982) found reduced f0 range across utterances in French speakers with Broca’s aphasia. While these findings are informative and demonstrate that disruptions to timing and fluency can affect global pitch organization in connected speech, they reflect adult disorders and may not generalize to children, who are still developing both phonological and prosodic representations.
Research on declination and f0 reset in developmental disorders, and on their relationship to utterance lengthening and segmentation, is limited. Only one known study has used acoustic measurements to analyze f0 reset, comparing 5- to 13-year-old speakers with language delay and what was then referred to as high-functioning autism (LD-HFA), speakers with Asperger’s syndrome (AS), and TD speakers (
Peppé, 2007). Results showed a greater degree of f0 reset for the LD-HFA group than for the TD peers, but no significant differences from the AS group. This was thought to be partially due to an increased pitch span in the LD-HFA group, signaling what was described as ‘exaggerated’ prosody. However, no published studies have examined f0 declination patterns in children with CAS, despite well-documented timing disruptions and stress errors in this population (
Maas et al., 2008;
Wright et al., 2009). Given that segmentation disrupts temporal cohesion within an utterance, it may interfere with the implementation of global intonational patterns such as declination, which are typically associated with prosodic phrasing and utterance-finality marking (
Hirschberg & Pierrehumbert, 1986).
Previous work has identified differences in both timing control (segmentation) and prosodic structure (lexical stress errors) in CAS (
Maas et al., 2008;
Wright et al., 2009). What remains unclear is whether these disruptions are restricted to syllable-level timing or whether they scale upward to affect higher-order prosodic planning. Theories of speech production differ in their assumptions about the scope of planning. Some models propose look-ahead planning over only a few words (
Levelt, 1993), while others allow for more global organization across an entire utterance (
Kent & Rosenbek, 1983). In this framework, prosodic planning refers to the organization of f0 movements and prominence patterns before speech execution, whereas motor programming involves generating the articulatory commands to implement that plan (
Guenther, 2016;
Levelt, 1993). For children with CAS, who present with impairments in prearticulatory planning, the question arises as to whether they plan intonation globally (like TD speakers) or whether segmentation reflects a breakdown in motor programming that limits forward organization of pitch. Early models propose that segmental and prosodic processes may initially operate independently but interact during online speech planning (
Kent & McNeil, 1987;
Levelt, 1993;
Odell & Shriberg, 2001). Examining declination in CAS therefore provides a test of whether higher-level pitch structure is preserved despite timing disruptions and offers insight into how prosody is represented and controlled in this disorder. This motivation leads to the present study, which examines whether declination differs between children with CAS and TD peers and whether it is influenced by segmentation during connected speech.
1.5. Current Study
In summary, understanding the relationship between declination and segmentation offers insight into online motor programming and the extent to which prosodic planning is preserved in children with CAS. Rather than re-establishing known diagnostic characteristics of CAS, this study focuses on how two prosodic features—segmentation and declination—interact within the same utterances to clarify whether timing disruptions influence higher-level pitch organization.
Building on the dataset from
Miller et al. (
2021), the current study provides an acoustic comparison of intonation and segmentation features in children with CAS and TD children before the children with CAS received the TEMPO℠ treatment. While segmentation and prosodic abnormalities are part of the perceptual profile of CAS, the goal of this study is not to confirm their presence but to quantify the relationship between these features. By examining how segmentation (syllable-level timing) relates to declination (intonational-level f0 organization), this study tests whether timing disruptions in CAS scale upward to affect global prosodic structure. Although this study includes only CAS and TD groups, future work should include children with other speech disorders to determine whether observed patterns are specific to CAS.
Recent work suggests that prosodic difficulties in CAS extend beyond lexical stress.
Kopera and Grigos (
2020) observed reduced strong-weak contrast in children with CAS even when segmental accuracy was preserved, indicating disruption in prosodic planning. Similarly,
Wong et al. (
2024) reported reduced f0 modulation in Cantonese-speaking children with CAS, suggesting limited pitch control across languages. Together, these findings motivate investigation into whether children with CAS show disruptions not only at the word level but also in phrase-level prosody, including declination.
The aims of this study are threefold: (1) to analyze segmentation, measured by inter-segment durations between and within words, in children with CAS compared to TD children; (2) to compare the degree to which the two groups exhibit declination across declarative utterances; and (3) to examine whether segmentation and declination are related in either group.
First, we hypothesize that segmentation will be a strong differentiating variable between these two populations. Previous work has found increased inter-segment duration between syllables of multisyllabic words produced by speakers with AOS and CAS (
Seddoh et al., 1996;
Wright et al., 2009). Based on this literature, we predict that children with CAS will present with significantly longer inter-segment durations between and within words compared to TD children. Additionally, due to the structure of the words utilized from TEMPO℠, we hypothesize differences in the degree of segmentation based on lexical status (pseudo-word vs. real word) and primary stress location (first vs. second syllable). Real words are expected to produce shorter pauses since they are practiced more often and are retrieved faster from the lexicon, resulting in less disruption (
Guenther, 2016;
Levelt, 1993). Words with second-syllable lexical stress are also expected to show greater segmentation because of increased planning demands, due in part to the coordination of the preceding unstressed syllables (
Cutler & Carter, 1987;
Guenther, 2016;
Maas et al., 2008).
Second, we predict that children with CAS will demonstrate reduced declination compared to TD children, reflected by flatter f0 slopes. If children with CAS engage in reduced prosodic preplanning of utterance-level pitch contours, consistent with speech motor control accounts of reduced feedforward control (
Guenther, 2016), they may be less able to produce global f0 decline. Previous studies have documented f0 modulation differences in CAS (
Wong et al., 2024), but declination has not yet been explored.
Third, we hypothesize that there will be a relationship between declination and segmentation. Specifically, longer inter-segment durations are expected to disrupt the continuity of f0 decline, potentially increasing opportunities for f0 reset and reducing the magnitude of declination across the utterance. Alternatively, if declination remains stable even in the presence of segmentation, this may suggest that prosodic planning is preserved despite timing disruptions. Testing the relationship between segmentation and declination allows us to examine whether temporal and prosodic control operate independently or interactively in CAS. Understanding whether timing and prosody interact in CAS may also inform future intervention approaches targeting both segmentation and prosodic control.