Article

The C/D Model and the Effect of Prosodic Structure on Articulation

by
Donna Mae Erickson
Haskins Laboratories, New Haven, CT 06825, USA
Languages 2025, 10(12), 298; https://doi.org/10.3390/languages10120298
Submission received: 4 September 2025 / Revised: 21 November 2025 / Accepted: 26 November 2025 / Published: 30 November 2025
(This article belongs to the Special Issue Research on Articulation and Prosodic Structure)

Abstract

The Converter/Distributor (C/D) model, as proposed by Fujimura, is theoretically grounded on articulatory observations of X-ray microbeam (XRMB) data showing that utterance syllable prominence patterns “dictate” the size, timing, and phrasing of articulatory movements. This paper briefly addresses some key differences between the C/D model and Articulatory Phonology (AP) before describing some of the basic components of the C/D model: the phonological prosodic input to the model; the Converter, which outputs, among other things, descriptions of syllable prominence patterns, prosodic boundaries, and syllable edge features; and the Distributor, which enlists “elemental gestures” to articulatorily implement feature sets. Examples from previous research inspired by the C/D model illustrate how articulatory events, i.e., patterns of jaw lowering, account for the temporal organization of spoken language, and how second language speakers tend to carry over their first language patterns of jaw lowering. Some applications of the C/D model are discussed, including first and second language acquisition, clinical applications, and new insights into prosodic phonology. The final section summarizes some of the strengths of the C/D model as well as the yet-to-be-investigated aspects of the model.

1. Introduction

Studies of speech articulation based on measured kinematic data, e.g., X-Ray Microbeam (XRMB), Electromagnetic Articulography (EMA), ultrasound, palatograms, MRI, are relatively new, at least compared with acoustic studies of spoken language. The first two, XRMB (e.g., Fujimura et al., 1973; J. R. Westbury et al., 1994) and EMA (e.g., https://www.de/), involve tracking pellets/sensors on the lips, tongue, and lower incisor tooth (to track mandible movement) to see how the various articulators move while speaking; these methods have been especially relevant for developing articulatory models of speech organization, e.g., Articulatory Phonology (AP) (Browman & Goldstein, 1992), the DIVA model (Directions Into Velocities of Articulators) (Guenther, 1994), the Fujimura C/D model (Fujimura, 2000; Erickson, 2024a; Fujimura & Williams, 2015), and the Geppeto model (Perrier, 2014). Other more recently introduced models include the XT/3C model (Turk & Shattuck-Hufnagel, 2020; Turk et al., 2025), the Segmental Articulatory Phonetics model (Svensson Lundmark, 2023, 2025; Svensson Lundmark & Erickson, 2024), and the Articulatory Prosody model (Erickson & Niebuhr, 2023; Erickson, 2024b).
The goal of this paper is to focus on the C/D model, a model that has not received as much attention as other articulatory models. Thus, this paper can be seen as a tutorial on some of the basic tenets of the C/D model; it also includes examples from articulatory studies illustrating some of these tenets, specifically, the importance of the mandible in producing syllables as well as its role in organizing strings of syllables to output a spoken utterance. The merit of the C/D model as an articulatory model is that it can account for the effects of prosody on the temporal organization of speech articulation in a way that other models cannot (e.g., Section 2.2.1). However, it is a complicated model that has not yet been fully tested experimentally. A basic tenet, as mentioned above, is that the mandible plays an essential role in producing syllables and in organizing syllables into phrasal units.
The C/D model, as proposed by Fujimura (e.g., Fujimura, 2000), is theoretically grounded on articulatory observations of X-ray microbeam (XRMB) data that show that utterance syllable prominence patterns, as manifested by the amount of jaw lowering per syllable, “dictate” the size, timing and phrasing of articulatory movements. In this regard, the C/D model stands out from other acoustic and articulatory models, which purport that prosodic information is “suprasegmental” (e.g., Lehiste, 1970). To amplify: speech is often analyzed as a series of consonant and vowel segments, with prosodic (“non-segmental”) information, e.g., duration, intensity, F0, added onto (above) the segmental information. This contrasts with the C/D model point of view, in which phoneme segments per se do not exist; rather, the articulatorily defined syllable is the concatenative unit, generated by the phonological prosodic specifications of the utterance (e.g., Section 2.2.1). Thus, the model posits that prosody is the underpinning of articulation. In this way, the C/D model is unique. To date, no other articulatory model starts with prosody, i.e., has phonological/prosodic information as its input. One of the goals of this paper is to encourage further experimental studies to examine and document the importance of prosodic structure on articulatory events.
Some caveats are mentioned: our interpretations of the model are based on working within the model’s framework, and as such, there will be simplifications (e.g., Section 2.2.1). Moreover, an underlying assumption of the model is that strength of articulation rather than timing of articulation is the governing principle of temporal organization of speech; thus, there is no attempt in the current version of the model to connect articulation to specific measurable time points in the acoustic signal.
The C/D model was first proposed by Osamu Fujimura in 1991, in a paper presented at the 12th International Congress of Phonetic Sciences, entitled “Prosodic effects on articulatory gestures—A model of temporal organization” (Fujimura et al., 1991). In subsequent publications, he continued to elaborate and revise the C/D model (Fujimura, 1994, 2000, 2002, 2008). (See also publications in Erickson & Imaizumi, 2015; Erickson & Kawahara, 2015; also Erickson, 2024a). The C/D model was designed to explain the temporal organization of speech, based on intensive examination of articulatory patterns observed in X-Ray Microbeam articulatory data. Trained as a physicist, Fujimura proposed “a model to tackle a complex system that has aspects of discrete—symbolic—information processing and physical movement as well as sound production at the end” (Reiner Wilhelms-Tricarico, p.c.).
In order to highlight some of the novel points of the C/D model with regard to prosody, a brief comparison with AP is presented here; AP is currently the model most widely used by linguists to describe speech articulation. Both AP and the C/D model describe the articulatory kinematics of speech. However, their approaches differ radically in terms of the underlying (timing) framework. The framework for AP is a coupled-oscillator model, Task Dynamics (TD) (e.g., Saltzman & Munhall, 1989), inspired by work on coordinated arm motion by Saltzman and Kelso (1987). We briefly mention here that the application of TD to speech articulation remains an open question: arm movement is a jointed system, in contrast to speech articulation; the tongue is a soft-tissue articulator, and the jaw joint has reduced degrees of freedom compared to the ball-and-socket joints of the arms.
AP describes articulatory gestures as the smallest phonological units; these are organized sequentially as second-order differential equations of the TD coupled-oscillator model to account for the production of consonant and vowel sequences. In the original interpretation of AP, the timing of gestural onsets is coordinated as being either in-phase or anti-phase within the TD framework; the timing of the gestures is tightly connected with their acoustic output, a series of consonants and vowels that make up the spoken utterance. In the traditional AP model, there is no acknowledgment per se of prosody; prosody, especially prominence, is a by-product of how the gestures coordinate. Later versions of AP, e.g., Saltzman et al. (2008), propose gestural planning and modulation oscillators in order to account for suprasegmental aspects of prosody, e.g., phrasing in an utterance. Also see Byrd and Krivokapić (2021) for handling prosody in AP with timing-modulating gestures.
As for the C/D model, the framework is the abstract phonological/prosodic structure, as represented by an augmented metrical tree specifying syllable stress levels (i.e., syllable magnitudes) (e.g., Fujimura, 2000); the smallest phonological unit is the syllable. In this sense, timing is relative to the other syllable members of the utterance; there is no mention of timing in terms of absolute measurable durations of syllable units, nor of segments within a syllable, e.g., C–V timing relationships. In the C/D model, the magnitude of the syllable affects the magnitude/strength of the “segmental” articulators, and also, very importantly, the strength of the various boundaries (e.g., word and phrase boundaries) in the spoken utterance. Thus, in the C/D model, strength of articulation is the organizing principle; it is this strength that affects the timing of syllable units.
In contrast to the C/D model, gestures are the phonological units in AP; each gesture involves sets of articulators working together to produce the desired place and degree of oral constriction for an acoustic segment (e.g., the LIP gesture, which involves LP (lip protrusion) and LA (lip aperture), works with the VEL gesture (velic aperture) and the GLO gesture (glottal aperture) to produce a bilabial nasal constriction, i.e., /m/). The AP model has no gesture for the mandible (jaw) per se; the kinematics of the other gestures, by default, incorporate the jaw position for each acoustic segment.
In the C/D model, instead of gestures there are ballistic movements which are referred to as impulse response functions (IRFs); these trigger the appropriate articulators to produce the onset and coda portions of the syllable. Instead of a vowel gesture, the vocalic portion is described in terms of tongue horizontal and vertical positions. The phonological unit in the C/D model is the syllable; the syllable articulator is the jaw which provides a skeleton framework describing the prominence patterns of the utterance.
Thus, another crucial difference between the C/D model and AP is the role of the syllable. In the C/D model, the syllable organizes the prosodic phonological structure of an utterance, in that the magnitude, i.e., prominence, of the syllable is commensurate with the magnitude of the “segmental” articulators and also the magnitude of the various boundary pulses. The C/D model is the only model that has prosody, and specifically the syllable as implemented by jaw lowering, as the underlying framework of the temporal organization of speech, e.g., stress patterns, articulatory strength, and boundary strength.
The organization of this paper is as follows: Section 2 describes the basic components of the C/D model; Section 3 presents a summary of how the C/D model observes and interprets articulatory events to account for the temporal organization of spoken language; Section 4 addresses applications of the C/D model’s approach to prosody: transference of first language prosody patterns to second languages; first language acquisition, mandible patterns, and neural nesting; clinical applications, specifically stuttering and Parkinson’s disease; and new insights into prosodic phonology; Section 5 describes a new tool for investigating jaw movement; and Section 6, entitled “Now what?”, summarizes strengths of the C/D model and brings up further yet-to-be-investigated aspects of the model.

2. Components of the C/D Model

The model is called the Converter/Distributor Model (C/D Model) because it takes the abstract prosodic and phonological information as its input, which is subsequently Converted to strings of syllables; then the prosodic and phonological information is Distributed to articulatory movements, which are implemented by control function/signal generators.
The C/D model with its many component levels is complicated. In the following sections, I try to present the basics of the model. The model starts with the phonological prosodic input to spoken utterances, in terms of metrical phonological information (Section 2.1). Section 2.2 describes the role the mandible plays in implementing this information. This section also details an articulatory experiment illustrating how the jaw is both the syllabic and prosodic articulator and how syllable prominence/jaw lowering determines the location and size of phrase boundaries. Section 2.2 and Section 2.3 show how the phonological prosodic information affects strength of articulation of the syllable, not only the syllable nucleus but also the onset and coda. Syllable strength in the C/D model is referred to as “syllable magnitude,” represented in the model by syllable pulses whose height represents the magnitude of the syllable nucleus, as well as the magnitude of the syllable edge articulations. Syllable edge articulations are referred to as Impulse Response Functions (IRFs) which describe the feature specifications for the syllable onsets and codas. The IRFs specify sets of place, manner and voicing features, i.e., place of constriction in the vocal tract, degree of constriction in the vocal tract, and voicing of the vocal folds during constriction (discussed in more detail in Section 2.2.3). These features are then implemented by specific articulators by means of neural commands to the appropriate muscles to move the articulators; the strengths of the neural commands/muscle movements are specified by the phonological prosodic input to the utterance.
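The place/manner/voicing feature sets that the IRFs specify can be sketched as a small data structure. This is only an illustrative sketch: the class and field names below are my own assumptions, not part of the model's published specification, and the feature values for the example syllable are chosen for illustration.

```python
from dataclasses import dataclass
from typing import Optional

@dataclass
class IRF:
    """Impulse Response Function for a syllable edge (onset or coda).

    Fields follow the place/manner/voicing specification described in
    the text; the field names themselves are illustrative assumptions.
    """
    place: str    # place of constriction in the vocal tract, e.g., "bilabial"
    manner: str   # degree/type of constriction, e.g., "stop", "nasal"
    voiced: bool  # vocal-fold voicing during the constriction

@dataclass
class Syllable:
    magnitude: float        # syllable pulse height (prominence)
    onset: Optional[IRF]    # edge feature set for the onset (None if absent)
    coda: Optional[IRF]     # edge feature set for the coda (None if absent)

# A syllable such as "bat": voiced bilabial stop onset /b/,
# voiceless alveolar stop coda /t/.
bat = Syllable(
    magnitude=1.0,
    onset=IRF(place="bilabial", manner="stop", voiced=True),
    coda=IRF(place="alveolar", manner="stop", voiced=False),
)
```

The point of the sketch is only the shape of the information flow: the Converter supplies the magnitude, and the edge feature sets are handed to the Distributor for articulatory implementation.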
The C/D model challenges us to view spoken utterances in a different light—as patterns of syllable prominences articulated by varying degrees of jaw lowering. The reader is urged to take on this challenge as they read through the next sections. Note: the following explication of the model is couched in terms of prosodic organization of English utterances; however, the model theoretically is applicable to all languages.

2.1. Prosodic Input of the C/D Model

As aforementioned, for the C/D model, prosodic organization is the driver of articulatory speech kinematics. The term “prosodic organization” here refers to stress/prominence patterns, as described by metrical trees (e.g., Liberman & Prince, 1977). A simplified approach to metrical trees is the observation that syllables in an utterance are chunked into phrasal units of increasing size (e.g., foot, phrase (also known as accent phrase or intermediate phrase), and utterance). Within each phrasal unit, one word has more prominence than the others in that unit. The amount of prominence scales with the size of the unit, such that the prominence on the most prominent word in the utterance is the largest of all the prominences; this word is often referred to as carrying broad focus or nuclear stress. This pattern of prominences can be described in terms of a metrical tree, showing branches of strong-weak (s-w) syllables, with syllables with the most s-assignments having the largest prominence and those with the least s-assignments having the weakest. In this way, the metrical tree generates numerical values of syllable prominence within an utterance.
In the C/D model, the metrical trees generate the utterance skeleton, a series of pulses, the height of which represents the strength/magnitude of the syllable, both the syllable nucleus and the syllable edges. Fundamental frequency (F0) patterns are part of the base function (see Section 2.2.3 for more discussion). Thus, the framework of the C/D model is such that (1) the syllable is the basic unit of speech, (2) the “syllable magnitude” (syllable prominence) is a product of the metrical organization of the utterance, (3) increased prominence is implemented by increased articulatory strength, and (4) increased articulatory strength also yields larger phrase breaks within utterances. This aspect of the C/D model, that increased articulatory strength affects phrase boundaries, is illustrated in Section 2.2.1.
A diagram of the input to the C/D model is shown in Figure 1 (from Fujimura et al., 1991). The prosodic phonological input for the utterance That’s wonderful is described in terms of a metrical tree plus utterance parameters with numeric controls, e.g., speed, formality, excitement, dialect, speaker age, specified by the small letters on the left of the figure. The strong-weak branches of a metrical tree, along the lines of Liberman and Prince (1977), describe the arrangement of syllable magnitudes. The beginning and end of the utterance is marked by $, and the phrase break after that’s is marked with %.

2.2. Converter Component of the C/D Model

The Converter takes all the information in the input and outputs a base function, which includes, among other things, the utterance skeleton. The skeleton describes the prominence and phrasing patterns (morpheme, word, and phrase boundaries) of the prosodic input of the C/D model.

2.2.1. The Skeleton

Converting Syllable Prominence Values to Syllable Pulse Heights
The Converter takes the prominence value of each syllable, together with the other utterance parameters as specified by the metrical tree, and converts these prominence values to syllable pulses. The height of each pulse represents the magnitude (prominence) of each syllable. An utterance is thus defined as having a “skeleton” consisting of a series of syllable pulses of varying heights representing various syllable magnitudes. In the C/D model, the syllable articulator is the mandible, also known as the jaw. The working hypothesis is that the syllable magnitude is, to a first approximation, commensurate with the amount of jaw lowering for each syllable as measured from the occlusal plane. This hypothesis is substantiated by the following experimental research findings.
Review of Experimental Findings Supporting the Converter Component
First, that jaw lowering increases when a word (syllable) is emphasized/focused has been well documented in English (e.g., Kent & Netsell, 1971; Stone, 1981; Macchi, 1995, 1998; Summers, 1987; J. Westbury & Fujimura, 1989; Beckman & Edwards, 1994; de Jong, 1995; Erickson, 1998a, 1998b, 2002, 2003; Harrington et al., 2000; Menezes, 2003, 2004) as well as in other languages, e.g., French (Loevenbruck, 1999, 2000; Tabain, 2003) and Japanese (Erickson et al., 2000). These studies examined emphasis on words containing the low vowel /ɑ/; increased jaw lowering is also reported for emphasized high and mid vowels, e.g., Erickson (2002, 2003), Harrington et al. (2000).
Second, that jaw lowering also correlates with syllable stress/prominence levels has been reported by, e.g., Erickson (2004), Erickson et al. (2012, 2015, in press), Erickson and Niebuhr (2023), Menezes (2003, 2004), Svensson Lundmark and Erickson (2024). These studies confirm a strong connection between a syllable’s stress level and the amount of jaw lowering (jaw displacement) relative to the occlusal plane for that syllable. An example of this in English is shown in Figure 2. The bottom panel of Figure 2 shows jaw tracings from electromagnetic articulographic (EMA) recordings of an American English speaker producing “five bright highlights in the sky tonight”, taken from the longer utterance, “Yes, I saw five bright highlights in the sky tonight.” Notice that the jaw opens and closes for each monosyllabic content word, each of which contains the same phonological vowel /ɑɪ/, yet the amount of jaw displacement varies for each syllable. The arrows point to the monosyllabic words with the most jaw displacement: the largest jaw displacement is on sky, the next on high(lights), and the next on five. As shown in the metrical grid in the top panel, this pattern of jaw displacement correlates with the prominence/stress values of each of the syllables, with sky having nuclear stress, high having phrasal stress, and five, foot stress. Regression analyses, as reported in Erickson et al. (2012), show a significant correlation between the amount of jaw displacement and the syllable stress patterns shown in the stress level row of the metrical grid. A note about metrical grids: like metrical trees, metrical grids show hierarchical prominence patterns, but they are easier to draw. In Figure 2, the prominence value for each word is calculated by assigning one prominence mark at the syllable level, then another at the word level, another at the foot level, another at the phrase level, and finally one at the utterance level.
Adding up the number of filled-in squares yields the numerical prominence value of a specific syllable in the utterance. Thus, the largest prominence value for this speaker for this utterance, i.e., the nuclear stress word (broad focus), was on sky. For more discussion of nuclear stress and jaw displacement patterns, see Erickson et al. (2012), Erickson and Niebuhr (2023). For more information about metrical grids, see Selkirk (1982) and Hayes (1995).
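The filled-square bookkeeping just described amounts to a simple count over grid levels. The sketch below is illustrative only: the level assignments loosely follow the stress levels reported for the Figure 2 utterance (sky, nuclear; high, phrasal; five, foot), while the entry for bright is an assumption made to complete the example.

```python
# Grid levels, from lowest to highest.
LEVELS = ["syllable", "word", "foot", "phrase", "utterance"]

# Each word is marked at the levels where it receives a grid mark.
# Assignments are illustrative, not measured values.
grid = {
    "five":   {"syllable", "word", "foot"},
    "bright": {"syllable", "word"},
    "high":   {"syllable", "word", "foot", "phrase"},
    "sky":    {"syllable", "word", "foot", "phrase", "utterance"},
}

def prominence(word: str) -> int:
    """Prominence value = number of filled-in grid squares for the word."""
    return len(grid[word])

# Ranking by prominence reproduces the jaw-displacement ordering
# described for this utterance: sky > high > five (> bright).
ranking = sorted(grid, key=prominence, reverse=True)
```

The sketch shows why the grid yields a numerical value per syllable that can then be compared directly with measured jaw displacement.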
The words in Figure 2 all contain the same phonological vowel in order to not introduce jaw height as a complicating factor, given that jaw displacement is greatest for low vowels and least for high vowels (Menezes & Erickson, 2013; Williams et al., 2013). A pilot study normalizing the amount of jaw displacement across vowel heights shows that jaw lowering correlates with syllable stress, thus supporting the C/D Model hypothesis that the amount of jaw displacement is commensurate with syllable magnitude.
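The paper does not specify the normalization procedure used in the pilot study, so the following is only one plausible approach: a z-score normalization of jaw displacement within each vowel-height category, which puts stress-related variation on a common scale across heights. All displacement values below are invented for illustration.

```python
from statistics import mean, stdev

# Hypothetical jaw-displacement measurements (mm below the occlusal plane),
# grouped by vowel height; the numbers are invented for illustration only.
displacements = {
    "low":  [12.1, 14.8, 11.5, 15.2],
    "mid":  [8.3, 10.1, 7.9, 10.6],
    "high": [5.2, 6.9, 4.8, 7.1],
}

def z_normalize(values):
    """Z-score within one vowel-height category: after normalization,
    displacements for low, mid, and high vowels are comparable, so any
    remaining variation can be attributed to stress/prominence."""
    m, s = mean(values), stdev(values)
    return [(v - m) / s for v in values]

normalized = {height: z_normalize(vals) for height, vals in displacements.items()}
```

Under such a normalization, correlating the normalized values with stress level tests the magnitude hypothesis independently of intrinsic vowel height.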
As concerns emphasis/focus, jaw displacement increases even more (see, e.g., Erickson, 2004; Erickson et al., 2015). As reported by Svensson Lundmark et al. (2023), narrow focus on a normally weak syllable (w) will increase mandible lowering on that syllable such that it becomes a strong syllable (s), while mandible lowering on the next syllable is reduced. The result is that the utterance prominence pattern in terms of weak (w) and strong (s) syllables is changed. They reported that the ws wS type sentence “The fat cat sat with Matt,” spoken with broad focus, where both cat and Matt are strong syllables (cat has phrasal stress and Matt, nuclear stress), has a prominence pattern of ws wS, where “s” indicates more jaw lowering and “S” indicates the most jaw lowering. When focus was put on fat, the jaw lowered more, and the prominence pattern changed in the first phrase from ws to sw.
Thus, experimental studies support the C/D model hypothesis that the amount of jaw lowering per syllable is commensurate with the prominence of that syllable. The converter component of the C/D model generates a series of syllable pulses, where the height of each represents the syllable magnitude.
Converting Location (Timing) of Syllable Pulses
The objective of this section is to show how the C/D model observes and interprets articulatory events to account for the temporal organization of spoken language, not in terms of durational timing between articulation and acoustics, but in terms of the rhythmic organization of syllable prominences and subsequent syllable boundaries. The rough details of how this is done are described in this section. First, however, in order to calculate the timing of the syllable pulse, it is necessary to assess the magnitude of each syllable, i.e., the prominence value of each syllable, as output by the metrical information and implemented by jaw displacement commensurate with the magnitude value of each syllable. Hence, in the above section jaw displacement was discussed as a function of prominence. Now, we turn toward the timing of syllables and boundaries.
According to the C/D model, the height of the syllable pulse is, to a first approximation, based on the amount the jaw lowers below the occlusal plane for that syllable. The timing of the syllable pulse, however, is NOT at the point where the jaw is maximally low. This is an important aspect of the model. The timing of the pulse within the syllable is determined by the velocity of the crucial articulators (CAs) of the onset and coda. (Note: Fujimura actually referred to “iceberg points”, which are discussed in more detail in Section 6).
What follows are reports of experimental applications of some of the principles of the C/D model. The experiment was reported in a number of earlier publications (e.g., Erickson et al., 2015; Kim et al., 2015; Erickson & Kawahara, 2015; Erickson, 2024a). The application to the C/D model was first outlined as an invited lecture entitled, “Converter/Distributor model: for describing spoken language rhythm,” presented at the ABRALIN conference, 31 October 2023 in Curitiba, Brazil. Later this was written up as a short dictionary entry in Speech Sciences (Erickson, 2024a). Permission has been obtained to include parts of this entry in this manuscript.
The articulatory experiment involved the sentence Pam said bat that fat cat at the mat, spoken by two speakers who varied the position of emphasis in the utterance, i.e., on bat, that, fat, cat, or mat. The sentences were presented on a PowerPoint display, and the speakers were asked to emphasize the word in bold letters. Figure 3 shows articulatory tracings of the segmental articulators (referred to in the C/D model as Crucial Articulators: TD, TT, LL) and the syllable articulator (mandible/jaw) for the utterance Pam said bat that fat cat at the mat, where bat is emphasized. The vowels in this utterance are all /ae/, except for /ɛ/ in said, yet each syllable shows a different amount of mandible lowering (i.e., jaw displacement). Based on the amount of mandible lowering (from the occlusal bite plane) for each syllable in the utterance, a string of syllable pulses is created. The articulatory data shown in Figure 3 are position data (vertical dimension) of the Crucial Articulators (CAs) for the syllable onset and coda of each of the monosyllabic words. For instance, the CAs for Pam, the initial word of the utterance, are the Lower Lip (LL) for both the syllable onset and coda; for the emphasized word BAT, the crucial articulator for the syllable onset is the LL and for the coda, the Tongue Tip (TT). For the syllable onset of that, the CA is the Tongue Dorsum (TD). The position data for the jaw are shown in the bottom panel. As described in the section above, the Converter creates a string of syllable pulses whose heights are commensurate with the stress pattern, as articulated by the syllable articulator, the jaw.
For the utterance shown in Figure 3, auditory impressions indicate that the speaker correctly produced emphasis on bat but also added prominence to fat. Acoustically, both bat and fat have increased duration and increased intensity compared to the other words in the utterance, with bat longer than fat by 0.032 s, while fat is louder than bat by 1.3 rms. Both bat and fat were produced with pitch accents, bat with an H*+L and fat with an L*+H, with fat having a higher maximum f0 than bat by 24.8 Hz. As for jaw displacement, both bat and fat have more jaw lowering than the other words in the utterance, but the jaw lowers more for bat than fat, by 2.71 mm. The C/D model in its current, not yet fully developed form focuses on the articulation of syllables and how syllables relate to adjoining syllables; it does not address acoustic characteristics per se. For a more in-depth discussion of acoustic and articulatory cues as they relate to perception of prominence, the reader is referred to Erickson et al. (in press).
As seen in Figure 3, the bottom panel shows the jaw tracings. According to the C/D model hypothesis, the amount of jaw opening per syllable represents the numerical amount of prominence per syllable. However, the location of the syllable pulse is NOT the point in time when the jaw is maximally low. In order to position the syllable pulses in the utterance, the Converter assigns each syllable a time location at the midpoint between the maximum velocities of the syllable onset and coda CAs. Note that the C/D model referred to “iceberg points” instead of maximum velocity times. With regard to the difference between the two, the “iceberg” threshold is an optimal point of relative invariance of velocity, which differs from the peak velocity (Bonaventura, 2003); for simplicity, however, we use maximum velocity points. A discussion of “iceberg points” can be found in Fujimura (1986, 2000) and Bonaventura and Fujimura (2007); for a comparison of iceberg points with maximum velocity points, see Kim et al. (2015).
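The pulse-location computation can be sketched numerically as follows. This is a sketch under stated assumptions: the trajectories and sampling rate are invented for illustration, and peak velocity stands in for Fujimura's iceberg points, as in the text.

```python
import numpy as np

FS = 200.0  # sampling rate in Hz (assumed)

def peak_velocity_time(position: np.ndarray, fs: float = FS) -> float:
    """Time (s) of maximum absolute velocity of an articulator trajectory."""
    velocity = np.gradient(position, 1.0 / fs)
    return float(np.argmax(np.abs(velocity))) / fs

def syllable_pulse_time(onset_ca: np.ndarray, coda_ca: np.ndarray) -> float:
    """Pulse location = midpoint between the onset and coda CA
    maximum-velocity times (peak velocity used in place of iceberg points)."""
    return 0.5 * (peak_velocity_time(onset_ca) + peak_velocity_time(coda_ca))

# Toy CA trajectories: raised-cosine movements, the onset CA moving
# 0.05-0.15 s and the coda CA moving 0.30-0.40 s (purely illustrative).
t = np.arange(0, 0.5, 1.0 / FS)
onset = -np.cos(np.clip((t - 0.05) / 0.1, 0, 1) * np.pi)
coda = -np.cos(np.clip((t - 0.30) / 0.1, 0, 1) * np.pi)

pulse_t = syllable_pulse_time(onset, coda)  # midway between the two velocity peaks
```

With these toy movements, the velocity peaks fall at the midpoints of the two transitions (0.10 s and 0.35 s), so the pulse is located at 0.225 s, which need not coincide with the moment of maximum jaw lowering.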
Figure 4 is like Figure 3, except that it also includes velocity information of the syllable onset and coda CAs, necessary to locate the syllable pulse within the syllable.
The yellow vertical lines on each side of the syllable mark the point in time of maximum velocities of the onset and coda CA, e.g., the red arrows marking the maximum velocity of the LL articulator for the onset of bat and fat and the blue arrows marking the maximum velocity of the TT articulator for the coda of bat and fat. The white lines in the center between the yellow maximum velocity lines mark the point in time where the syllable pulses occur. Notice that sometimes the syllable pulse coincides with the maximum jaw lowering, but when the syllable is emphasized (bat) or has more stress (fat), the syllable pulse occurs before the maximum jaw displacement. For a report on timing of maximum jaw displacement relative to onset of vowel as a function of emphasis, see Erickson et al. (2024a).
The indirect by-product of the syllable pulse positioned at the midpoint between syllable onset and coda CA velocities is boundary strength information. As can be observed from Figure 4, the yellow lines marking the CA max velocities do not overlap. The distances between the contiguous yellow lines are related to the distance between syllables, i.e., the syllable boundaries. How the Converter calculates abstract syllable durations is described in the next section.
Converting Syllable Boundaries via Syllable Triangles
Thus, the timing of the syllable pulses is at the midpoint between the maximum velocities of the CAs, while the heights are related to the amount of jaw lowering for each syllable. To calculate abstract syllable durations, the following process is used.
The apex of each syllable triangle is at the height of the syllable pulse. The angle of the isosceles triangles is determined by the hypothesis that no two syllable triangles may overlap, with exactly one pair of triangle sides touching. A MATLAB algorithm for constructing syllable triangles can be found in Erickson et al. (2015). Figure 5 shows the results of the algorithm for calculating syllable triangles and syllable boundaries for the utterance Pam said bat that fat cat at the mat. Based on the algorithm, the two syllable triangles that touch are cat and at. Since the other syllable triangles all have the same angle as these two, the result is that (a) each syllable has its own abstract syllable duration and (b) each syllable is accompanied by an abstract boundary duration. The magnitude of each syllable is represented by the height of its syllable pulse, and the magnitude of the boundaries by the distance between each pair of syllable triangles. In this utterance, emphasized bat has the largest syllable pulse, which is expected, since it was the emphasized word; it is followed by fat. The largest syllable boundary, according to the output of the algorithm, follows that; the next largest follows fat. Tentative confirmation of this approach can be found in the pilot study by Erickson et al. (2015); they reported that listeners’ perceptions of prominence and boundaries show a significant correlation with the syllable (jaw displacement) and boundary magnitudes generated by the Converter’s syllable triangle algorithm.
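The published MATLAB algorithm is in Erickson et al. (2015); the sketch below is an assumed reconstruction from the constraints stated in the text (isosceles triangles, one shared slope, no overlap, exactly one touching pair), with invented pulse times and heights.

```python
def syllable_triangles(times, heights):
    """Given syllable pulse times (s) and pulse heights (e.g., mm of jaw
    lowering), find the shared triangle slope such that no two triangles
    overlap and exactly one adjacent pair touches. Returns per-syllable
    (start, end) intervals (abstract syllable durations) and the gaps
    between adjacent triangles (abstract boundary durations)."""
    # A triangle with apex height h and side slope s has half-base h/s.
    # Non-overlap for adjacent syllables i, i+1 requires:
    #   t[i] + h[i]/s <= t[i+1] - h[i+1]/s
    #   => s >= (h[i] + h[i+1]) / (t[i+1] - t[i]).
    # Taking the maximum over adjacent pairs makes the tightest pair touch.
    slope = max(
        (heights[i] + heights[i + 1]) / (times[i + 1] - times[i])
        for i in range(len(times) - 1)
    )
    intervals = [(t - h / slope, t + h / slope) for t, h in zip(times, heights)]
    gaps = [intervals[i + 1][0] - intervals[i][1] for i in range(len(intervals) - 1)]
    return intervals, gaps

# Invented pulse times and heights for a three-syllable stretch.
times = [0.20, 0.55, 0.85]
heights = [10.0, 14.0, 8.0]
intervals, gaps = syllable_triangles(times, heights)
```

In this toy example the second pair of triangles touches (gap of zero), while the first pair is separated by a positive gap, the abstract boundary duration, mirroring how cat and at touch in Figure 5 while the other syllables are separated by boundaries of varying size.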
By applying “reverse engineering,” we can construct a possible metrical tree for this utterance, as shown in Figure 6. Note that although the speaker was instructed to place emphasis on bat, he also put more prominence on fat. In doing so, he seems to have separated the utterance into two major phrases. The tentative results presented here offer further support for the Converter’s approach to generating syllable and boundary magnitudes from articulatory kinematics.
The above discussion is offered as an example of how, theoretically, the C/D model calculates syllable boundaries. What is intriguing to me is that the syllable-triangle/iceberg method of the C/D model outputs a metrical arrangement of syllable relationships akin to that of the prosodic phonological input for the utterance. In effect, the C/D model provides a reverse-engineering approach to recovering the phonological prosodic input to a spoken utterance. To what extent it provides realistic information about the temporal organization of spoken utterances needs further testing.

2.2.2. The Base

The base function includes the skeleton, as well as the syllable features and the melody (F0 patterns). An overview of the Base function can be seen in Figure 7, which provides a detailed look at how the CONVERTER handles a single syllable, e.g., /kit/. The various components of the Converter are displayed in four levels/rows.
The top level is the syllable magnitude information, the second level is the syllable features specification level, the third level describes tongue movement for the vocalic information, and the last level, the voicing information. There is also a fifth level to account for F0 movements, i.e., the melody, which will be discussed later.
A brief review of the top level, as shown in Figure 7: The top level is the syllable magnitude information, which according to the model is commensurate with the amount the mandible lowers for making this syllable. The pulse also includes information about the vowel nucleus, which is implemented in the third level specifying tongue advancement. The black vertical line in the top panel indicates the height of the syllable pulse, based on the theoretical prominence value of the single syllable utterance, the dashed diagonal lines on either side are the syllable triangle lines. The two edges of the triangle indicate the start in time of the (abstract) syllable onset and coda. (Note that the isosceles triangle is a modification of the original thinking presented in Fujimura et al., 1991). On the left side of the syllable base, the blue line indicates the magnitude of the syllable onset pulse; the purple line to the right of the syllable, the magnitude of the syllable coda pulse. Notice that the onset and coda pulses are the same height (magnitude) as the syllable pulse. This is an important tenet of the C/D model; it implies that the kinematic strengths of the syllable pulse and onset/coda pulses are the same, i.e., a syllable with large prominence is produced with increased jaw displacement together with increased CA strength.

2.2.3. Syllable Features for Describing Syllable Onset, Nucleus, and Coda

The phonological information is not specified in terms of consonant and vowel segments; it is formatted in terms of feature sets: place, manner and voicing. ‘Place’ refers to where the constriction in the vocal tract occurs; ‘manner’ refers to the nature of the constriction, e.g., complete, partial; and ‘voicing’ refers to vocal fold adduction, which is handled in the fourth level of the Base function, as shown in Figure 7.
Impulse Response Functions (IRFs)
The second layer in Figure 7 shows the Impulse Response Functions (IRFs), which are triggered by the onset and coda pulses. The IRFs consist of feature sets. In this case, the initial IRF set is indicated by {K, τ} to specify a velar place (K) and stop manner (τ) syllable onset, while the final IRF feature set is indicated by {T, τ} to indicate an apical place (T) and stop manner (τ) as the syllable coda. (See Fujimura, 1994 for a description of features and their symbols). The IRFs generate a response curve, the dashed blue and purple curved lines; note that the peak of the slope does not align with the IRF pulse, and the curve starts before and ends after the pulse. The strengths of the IRFs are dictated by the magnitude of the onset and coda pulses, which are the same magnitude as that of the syllable pulse. The bold blue and purple horizontal lines for the syllable onset and coda indicate the duration of the closure period of the articulators for producing velar K and apical T, respectively. Notice the closure for the onset starts before the onset pulse and ends right at the coda pulse. Presumably, this is meant to describe the asymmetric patterns between the onset and coda response curves.
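As an illustration only: the C/D model literature does not give a closed-form IRF shape, so the sketch below stands in with an asymmetric Gaussian (the `rise`, `fall`, and `peak_delay` parameters are assumptions) to reproduce three properties visible in Figure 7: the response curve starts before and ends after the triggering pulse, its peak does not align with the pulse, and its strength scales with the pulse magnitude.

```python
import numpy as np

def margin_irf(t, pulse_time, magnitude, rise=0.05, fall=0.09, peak_delay=0.02):
    """Hypothetical syllable-margin impulse response (times in s).

    An asymmetric Gaussian: one width before the peak, another after,
    with the peak displaced from the triggering onset/coda pulse.
    The response amplitude scales linearly with the pulse magnitude.
    """
    peak = pulse_time + peak_delay            # peak offset from the pulse
    width = np.where(t < peak, rise, fall)    # asymmetric rise vs. decay
    return magnitude * np.exp(-0.5 * ((t - peak) / width) ** 2)
```

Doubling the pulse magnitude doubles the response curve everywhere, which is the sense in which the onset/coda pulse strength "dictates" the strength of the margin features.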
The third level is the vocalic level. The underlying base of the syllable is from the blue onset pulse to the purple coda pulse, marked by dashed red horizontal lines. Since the vocalic syllable pulse (red upward arrow) generates a tongue advancement which starts before, and ends after, the onset and coda pulses, the surface duration extends beyond the base duration.
The last level is the voicing level. This level specifies laryngeal adduction for the voicing feature (which is not marked in the feature specifications if the syllable margin IRF is voiceless). It is triggered by the magnitude of the onset and coda pulses, and the IRF pulses. The horizontal dashed blue line indicates the (abstract) duration of syllable voicing; the green curve indicates the surface laryngeal adduction curve, which, again, starts before, and ends after, the onset and coda pulses. The voiced portion of the syllable is indicated by the solid green horizontal bar, which starts at the green dashed vertical line marked “on” and ends with the green vertical dashed line marked “off”. As for the closure part of the stop, it starts with the green laryngeal adduction curve and ends at the blue vertical line. The aspiration period is the distance between the end of the stop and the beginning of the voicing for the vowel, that is, VOT is displayed as the discrepancy between articulatory release of the stop constriction and voice onset of the vowel. As the magnitude of the syllable pulse/onset pulse affects the strength of syllable margin features (e.g., voicing), it follows that syllable magnitude also affects VOT (see Matsui, 2017).
The voice quality component of the C/D model is not yet developed and therefore is not shown in Figure 7. F0 is described as part of voice quality, which along with other types of voice qualities, “may play crucial roles in prosodic control” (Fujimura, 2008, p. 316), including the intonation contours, i.e., melody. The concept of F0 as part of voice quality opens the door to thinking of F0 as more than just an F0 contour displayed in a spectrogram, but rather part of the complicated source-filter interactions involved in producing different voice qualities (see, e.g., Obert et al., 2023). However, this part of the model has not yet been developed. As for describing Japanese pitch-accents, a Fujisaki-type model was proposed in Fujimura (2008).

2.3. Distributor, Actuators and Signal Generator

The specifications in the CONVERTER are fed to the DISTRIBUTOR which selects “elemental gestures” to be enlisted to implement the feature sets. Then a multidimensional set of ACTUATORS assembles the stored feature sets of the Impulse Response Functions and sends these to CONTROL FUNCTION/SIGNAL GENERATORS. Although these parts of the model are yet to be implemented, the ultimate goal is for all the component parts of the model to work together to output acoustic signals of an utterance.

3. Summary of How the C/D Model Observes and Interprets Articulatory Events to Account for Temporal Organization of Spoken Language

The C/D model proposes a novel way for handling the effect of syllable prominence on articulation; this approach has been substantiated by pilot articulatory studies by Erickson and colleagues. Specifically, these studies support the C/D model’s conversion of prosodic patterns into a skeleton of syllable pulses representing the syllable magnitude patterns in an utterance; they also lend support to the positioning of the pulses in each syllable halfway between the maximum velocities of the onset and coda CAs; and they encourage future investigation into how abstract syllable durations and boundaries are calculated via isosceles syllable triangles.
The C/D model uniquely proposes that syllable boundaries are derivatives of syllabic articulation strengths, i.e., of the strengths of the jaw and of the onset/coda crucial articulators. More studies with more data are needed to explore this hypothesis. Is there indeed a correlation between syllable magnitude (amount of jaw displacement) and the magnitude of the crucial articulators? In the current version of the C/D model, syllable magnitude is measured in terms of the maximum amount of jaw displacement during the syllable. How is the magnitude of the onset and coda Crucial Articulators best measured? One way might be in terms of the duration of the consonant (see, e.g., McGuire et al. (2024), which reports that initial consonants of stressed syllables are longer than unstressed ones and that jaw displacement is also greater). Another suggestion, proposed by Svensson Lundmark (2024), would be to measure the magnitude of CAs in terms of the magnitude of acceleration.
As mentioned in the introduction, the concept of timing in the C/D model is not in terms of durational relationships, but rather in terms of magnitude relationships. In contrast to other articulatory models, such as AP, intrasyllabic timing is not discussed per se in the C/D model. Nevertheless, exploring the timing of articulatory events relative to acoustic events within the framework of the C/D model still needs to be done. Christopher Geissler, in this same issue, examines the timing between intrasyllabic kinematic units within the framework of a gestural coupling model and, along the lines of the C/D model, the timing between the syllable pulse and intrasyllabic kinematic units. His study includes 11 monosyllabic CVC words containing various vowel types (not vowels of the same height, as has previously been used when examining the C/D model). The results pinpoint some interesting kinematic timing relationships with the syllable pulse. However, since his study only included monosyllabic isolated words, he could not examine how prominence might affect intrasyllabic kinematic timing. This is a study that needs to be conducted. For such a study, however, it is important to separate vowel quality from prominence, or to have a method of normalizing jaw displacement across vowels, as discussed later in this section.
With regard to articulatory boundary strengths, the current approach combines (a) the mid-distance between the maximum velocity points of the CAs, which determines the abstract center of the syllable, and (b) the isosceles triangle algorithm. The (abstract) duration between the bases of two consecutive triangles indicates the strength of the syllable boundary. Would using acceleration peaks or jerk, as proposed by Svensson Lundmark (2023) and Svensson Lundmark and Erickson (2024), lead to a better estimate of articulatory boundary strengths? As for the syllable triangle algorithm, it is currently an ad hoc solution that seems to work. Is there an explanation for why the algorithm can generate articulatory boundaries perceived by listeners, as reported in Erickson et al. (2015)?
With regard to intonation patterns, as discussed above, the C/D model refers to this aspect of prosody as the “melody” of the utterance which is part of the base function. The C/D model views laryngeal articulation as a complex aspect of the model which encompasses intonation, tonal F0 patterns and various voice quality issues. The laryngeal component of the model, including how to account for intonational patterns, awaits development. The work by Esling and colleagues (Esling et al., 2019) about the larynx as a laryngeal articulator might dovetail nicely with the C/D model.
A final point to be discussed is that the amount of jaw displacement in a syllable is affected by both prominence values and vowel height. As reported by Williams et al. (2013) and Menezes and Erickson (2013), the jaw for a low vowel is 2 mm lower than for a mid vowel, and 4 mm lower than for a high vowel. To date, exploration of the C/D model has focused on examining jaw displacement values in utterances in which all the vowels have the same height. These results indicate a relation among (a) the magnitude of jaw displacement per syllable, (b) the magnitude of syllable prominence within an utterance and (c) the magnitude of boundary strengths between syllables. However, in order to pursue application of the C/D model for analyzing spoken utterances, a way to normalize across vowels is needed. An approach to normalizing vowel height was proposed by Williams et al. (2013). Using EMA, they recorded the vertical jaw position of an American English speaker producing six repetitions of three-word sentences in which the CVC target monosyllabic words (shown in Table 1) occurred in initial, middle and final positions. The consonants (Cs) were voiceless /p/, /t/ or /k/, and the vowels (Vs) were /ɪ/, /ɛ/ and /æ/. The sentences were “X type first,” “Type ‘X’ first,” and “First type X,” where “X” was the target CVC. Nuclear stress was placed on the first word of each sentence, i.e., X, Type, and First.
In order to determine the effect that prosodic structure has on jaw displacement independent from vowel height, they proposed a simple equation (see Williams et al., 2013, pp. 3–4).
A two-step vowel normalization algorithm is shown in Figure 8. By factoring out the vowel height (V) effect, as well as other potential effects such as consonants and speech style, the effect of metrical prominence on the amount of jaw displacement could be seen. Figure 9 is a graphic display of the vowel neutralization procedure for the sentences, ‘Kip met Pat’ and ‘Pat met Kip’, where nuclear stress is on the final syllable in each of the utterances. Raw jaw displacement measurements are shown in the left-hand panel for these two sentences. Notice that the amount of jaw displacement for “Pat” (which contains the low front vowel /æ/) is larger than that for “Kip” (which contains the high front vowel /ɪ/). The right-hand panel shows the neutralized jaw displacement values for these sentences. Notice that after neutralization, the jaw displacement for “Pat” has decreased and that for “Kip” has increased. The bottom panel displays hypothesized metrical grids based on the neutralized values of jaw displacement; regardless of vowel height, both utterances have the same metrical pattern, i.e., the most jaw displacement is on the final word, the word with nuclear stress.
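The neutralization idea can be sketched as follows. This is not the Williams et al. (2013) equation itself, which is not reproduced here; it is a stand-in that removes each vowel category’s mean displacement and restores the grand mean, so that residual differences reflect prominence rather than vowel height. The function name and sample values are illustrative only.

```python
def neutralize_vowel_height(displacements, vowels):
    """Remove the vowel-height contribution from jaw displacement.

    displacements : maximum jaw displacement per syllable (mm)
    vowels        : vowel category label for each syllable

    Each value is re-expressed relative to its vowel category's mean,
    then shifted back to the grand mean so values stay comparable.
    """
    grand_mean = sum(displacements) / len(displacements)
    by_vowel = {}
    for d, v in zip(displacements, vowels):
        by_vowel.setdefault(v, []).append(d)
    vowel_mean = {v: sum(ds) / len(ds) for v, ds in by_vowel.items()}
    return [d - vowel_mean[v] + grand_mean
            for d, v in zip(displacements, vowels)]
```

On made-up values for two /ɪ/ and two /æ/ tokens, the larger /æ/ displacements shrink and the /ɪ/ ones grow, as in the “Pat”/“Kip” example above.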
Finally, to date, most of the experimental explorations of the C/D model have focused on the prominence patterns of language. The details of implementing syllable onsets and codas using IRFs have yet to be investigated experimentally; also needed is an examination of vowel articulation.
To conclude this section, as stated in Section 2.2.1, the objective of this manuscript is to show how the C/D model observes and interprets articulatory events to account for the temporal organization of spoken language, not in terms of durational timing between articulation and acoustics, but in terms of the rhythmic organization of syllable prominences and the resulting syllable boundaries. This section ends with a partial list of “unfinished business”: things that still need to be examined and fleshed out in the model. The hope is that if indeed the C/D model accounts for prominence and boundaries, then following through with the “unfinished business” of the model might yield important insights into understanding the articulation of speech. Along these lines, the next section summarizes some of the applications of the C/D model’s approach to prosody.

4. Applications of the C/D Model’s Approach to Prosody

4.1. Second Language Learning

4.1.1. Language Specific Prominence Patterns

The focus in this section is on patterns of jaw displacement in various languages which reflect phrasal prominence patterns in an utterance. Specifically, this section reviews cross-linguistic differences in prosodic control and transfer of first language (L1) patterns to second language (L2) learning. The studies reported here reinforce the C/D model with regard to the role of the jaw as an articulator and the prosodic organizer of spoken utterances.
Location and strength of boundaries, aspects of the model discussed in Section 2.2.1 which address the timing of syllables in an utterance, are not discussed here, as no experimental studies have been undertaken with regard to this aspect of the model. This is one of the many areas that need to be explored in order to better understand the model’s approach to phrase-level effects in spoken utterances.
According to work by Erickson and colleagues, syllable prominence patterns are language-specific, as are the accompanying jaw displacement patterns. Articulatory studies of jaw lowering patterns in a number of languages, e.g., English, French, Mandarin, Japanese, and Brazilian Portuguese (e.g., Erickson & Niebuhr, 2023; Menezes, 2003, 2004; Smith et al., 2019; Erickson et al., 2020, 2015, 2012, 2024b; Erickson & Kawahara, 2016), confirm that jaw patterns are indeed language-specific.
As discussed, English jaw displacement patterns mirror the metrical organization of the utterance, with the most prominence on nuclear stress, then phrasal stress and then foot stress. For languages like French, Mandarin, Japanese, and Brazilian Portuguese (BP), which are said to be variations of edge strengthening languages (e.g., Jun, 2014; also see Kawahara et al., 2015), increased mandible lowering occurs on the phrase’s final stressed/full syllable (Erickson & Niebuhr, 2023; Erickson et al., 2024b), with optional increased mandible lowering on the phrase/utterance-initial syllable. To exemplify this, Figure 10 shows mandible lowering patterns collected with EMA for French, Japanese and Mandarin, as reported in Erickson and Niebuhr (2023); patterns for BP, collected with the MARRYS helmet, are reported in Erickson et al. (2024b). For these languages, we see the most jaw lowering on the final stressed syllable at the end of each phonological phrase. For French, Japanese and Mandarin, the largest jaw lowering occurs utterance finally, independent of what the pitch accent (Japanese) or tone (Mandarin) is (e.g., Kawahara et al., 2014; Erickson & Niebuhr, 2023; Erickson et al., 2016). The prominence patterns of these languages, where jaw lowering always increases phrase finally, contrast with those of English. For English, as shown in Figure 2 for the sentence, (I saw) five bright highlights in the sky tonight, increased jaw lowering (broad focus) does not always occur at the end of the phrase/utterance.

4.1.2. Carry-Over of Native Prominence Patterns to Second Language

Speakers of languages like French, Japanese and Mandarin show increased jaw lowering at the ends of phrases/utterances. This is in contrast with English, where the utterance/phrasal prominence does not occur systematically at the utterance/phrasal edges. For example, in the English utterance (Figure 2), (I saw) five bright highlights in the sky tonight, nuclear stress is on sky, not the final word night; phrasal stress is on high, not the final syllable lights, and foot stress is on five, not on the final member of the foot, bright. Moreover, English speakers tend to have a choice of where to place nuclear stress. As discussed in Erickson and Niebuhr (2023), also Erickson et al. (2012), English speakers of this utterance can also place nuclear stress on high(lights) or on five, resulting in a different metrical arrangement of prominence patterns. Recent work by Erickson et al. (2025) suggests that pragmatics plays a role in prominence/jaw lowering patterns in English, e.g., the final word in English topic phrases is not produced with increased jaw lowering unless the final word is emphasized/focused.
These language-specific prominence patterns, reflected in jaw movement patterns, are learned at an early age by children (see, e.g., Svensson Lundmark & Erickson, 2023; Erickson & Niebuhr, 2023). Later, in learning a second language, speakers often carry over these first-learned patterns (see, e.g., Erickson & Niebuhr, 2023; Erickson, 2025). An example of this can be seen in Figure 11, which compares a beginning American English learner of French producing the utterance, Natacha n’attacha pas son chat Pacha qui s’échappa, with a first language French speaker. The top panel of the figure shows the mandible lowering patterns of a typical French speaker (the same as shown in Figure 10); the bottom panel, those of the American English beginner. The blue arrows in the top panel point to the increased mandible lowering at the ends of the Accentual Phrases (APs) for the French speaker; the blue arrows in the bottom panel point to the increased mandible lowering of the AE speaker of French. Note that the jaw lowers more on the English lexically stressed syllables of the English cognate words, i.e., NaTAcha, n’atTAcha, PAcha, s’éCHAPpa (capital letters indicate ‘lexically’ stressed syllables). The AE speaker knew some French, as is evidenced by the lowered mandible position at the end of AP2, pas, and also AP3, chat. Figure 11 suggests that the AE speaker carries over the jaw lowering patterns of English while at the same time implementing some of the patterns of the second language.
A similar example of transfer of first language jaw patterns is shown in Figure 12. Here we look at an AE speaker producing the Japanese utterance, Akapajama da (‘They are red pajamas’). The top panel shows the mandible patterns for the AE speaker, who is fairly fluent in Japanese, and the bottom, those for a Japanese speaker (same as shown in Figure 10). For both speakers, increased mandible lowering occurs at the beginning and ending of the utterance; but the AE speaker lowers his jaw more on the lexically stressed middle syllable in the English word, paJAma, indicated by the blue arrow. These illustrations of AE speakers’ productions of a second language which has a prosody different from AE, e.g., French or Japanese, lend support to the hypothesis that second language learners carry over their prominence/jaw lowering patterns of their first language. Moreover, for AE speakers, it is especially difficult to re-program their jaw lowering patterns for loan words or pseudo loan words with English lexical stress.
Speakers of a language with a prosodic structure different from English, e.g., French and Japanese, carry over their jaw displacement patterns when speaking English. This is illustrated in Figure 13, which compares mandible movement patterns for the English utterance, I saw five bright highlights in the sky tonight, as produced by an AE speaker (bottom panel), a French speaker (middle panel), and a Japanese speaker (top panel). For the AE speaker, nuclear stress is on sky, phrasal stress is on highlights, and foot stress is on five, with corresponding increased mandible lowering on these words/syllables. The French and Japanese speakers, however, show more mandible lowering on night, the final word in the utterance. English speakers never put nuclear stress on night, as it violates an English rule that sentence-final adverbs do not receive nuclear stress (p.c., Haruo Kubozono; see also Erickson & Niebuhr, 2023). Notice, also, that compared to the AE speaker, the French speaker lowers the mandible more on bright, presumably because the French speaker feels bright marks the end of an AP. For the AE speaker, not only does five have more jaw lowering than bright, but the amount of jaw lowering for bright is reduced. The Japanese speaker shows a slight reduction in mandible lowering for bright, but not as much as the English speaker shows. Thus, another English rule that these learners were not aware of is that within a foot unit, one item is stressed while the other is relatively reduced. As for highlights, again the Japanese speaker reduces the amount of mandible lowering on lights, but not as much as the English speaker does. The French speaker seems to produce the two-syllable word highlights with one sustained mandible lowering. These examples illustrate how speakers of a second language carry over the prominence patterns of their first language.

4.1.3. Applications for Second Language Learning

Given that language learners carry over the jaw lowering patterns of their first language, a question arises as to whether it is possible to re-train the jaw when learning a new language. Preliminary work (Wilson et al., 2019, 2020) indicates that by focusing on jaw lowering patterns (“jaw dancing training”), Japanese learners of English produce more L2-type prominence patterns. Not only does prominence increase on the stressed word, as evidenced by F1 raising/F2 lowering, but the following word is reduced, as evidenced by a decrease in F1/increase in F2. See also Erickson (2025).
Two other research studies that support the importance of learning second language prominence patterns are mentioned here. One is an acoustic study showing that language-appropriate prominence patterns improve ease of listening (Coulange et al., 2024; Isaacs & Trofimovich, 2012); another is an ongoing study by Takayuki Ito (p.c.), showing that a robot-tap on the cheek just before producing the stressed word causes the jaw to lower more on the stressed word and less on the following word. These studies about jaw lowering and prominence encourage further exploration of how the jaw is programmed. For English speakers, is the foot a basic unit of jaw coordination, such that one member is strong, i.e., has more jaw lowering, while the other is weak, i.e., has less jaw lowering? Moreover, if more than one syllable has a large amount of jaw lowering, is the jaw programmed to make a phrase break between the two syllables, resulting in a new phrase? For languages like French, Japanese and Mandarin, perhaps the jaw is programmed over an AP unit, not a foot. If there are two large jaw lowerings, then there are two APs. These are things to be explored further.
To summarize, these studies suggest jaw displacement patterns are language specific. Moreover, they suggest that L2 learners can acquire better prominence patterns (better prosody) after a relatively short period of “jaw training”.
A comment about language-specific jaw displacement patterns: Fujimura in his description of the C/D model did not discuss jaw differences in languages per se; however, since jaw displacement patterns are an integral part of the skeleton component of spoken utterances, it follows that in order to calculate syllable and boundary locations and strengths, it is necessary to first assess the jaw displacement patterns in a language. The next step, involving experiments to assess syllable boundary locations and strengths across different languages, has yet to be performed.

4.2. First Language Acquisition

4.2.1. Acquisition of Syllables

As mentioned repeatedly in this article, the underlying premise, inspired by the prosodic/phonological input to the Fujimura C/D model (Fujimura, 2002), is that syllable prominence patterns are language-specific, based on the underlying metrical structure/organization of syllables of that language. Infants learn these patterns as they watch their caretakers’ mouths/mandibles open and shut while listening to amplitude/duration/formant modulations of a string of syllables. Corroborating work for this hypothesis was reported by Ménard et al. (2009): French Canadian speakers blind from birth show less F1-raising for emphasized/focused words than do sighted speakers, suggesting that without the visual input, blind speakers do not lower their jaws as much as sighted speakers. Further corroboration of the early acquisition of jaw movement in producing syllables comes from fetal studies showing that mandibular movement occurs even before the lip muscles are developed (e.g., Gasser, 1967; Humphrey, 1964). Green et al. (2000, 2002) report that jaw articulation precedes lip articulation. According to MacNeilage’s Frame-Content theory (MacNeilage & Davis, 1990), babies start producing syllable-type jaw movements at the age of about 6 months, about the time they start chewing. The hypothesis suggested here is that language-specific jaw movements are learned at an early age and are therefore part of a speaker’s unconscious knowledge of his/her language; thus, they are hard-wired in the speaker’s language system and are not easy to change.

4.2.2. Mandible Patterns and Neural Nesting

Brain wave studies with prenatal and post-natal infants suggest a link between nested hierarchical neural oscillations and nested hierarchical metrical structures in spoken languages (e.g., Cabrera & Gervain, 2020; Leong et al., 2014). According to, e.g., Goswami (2019), brain waves can be viewed as nested hierarchies of amplitude modulations, with the slower rates governing the faster rates. Perception studies show that syllables entrain with the 5 Hz oscillations. When there is a peak in both the 2 Hz and 5 Hz bands, a stressed syllable is heard; when there is a peak in the 2 Hz band but a trough in the 5 Hz band, an unstressed syllable is heard (Goswami, 2022). A comparison of the nesting of the 2 Hz and 5 Hz oscillations involved in speech perception with the prosodic metrical hierarchy is shown schematically in Figure 14 (from Goswami, 2019). The left vertical axis represents the electroencephalogram (EEG) band; the right vertical axis, the center frequencies of the amplitude modulations as extracted by S-AMPH modeling (Leong et al., 2014). The figure shows that the neural oscillation and amplitude modulation rates match, thus suggesting a match between nested neural oscillations and linguistic metrical structure.
A hypothesis to be explored is the connection between neural oscillation patterns and jaw excursions. As discussed in Erickson and Niebuhr (2023), perhaps the jaw is a “translator”, which translates metrical hierarchies that are neuronally rooted in movement patterns of speech into levels and sequences of perceptually salient elements. Languages have found different solutions as to how this translation is implemented and functionally exploited within the given neuronal framework. Or, to use a different metaphor, the jaw is the “drummer”, each beat is a syllable pulse arranged temporally to reflect the nesting hierarchical neural oscillations that have been acquired during first language acquisition.

4.3. Clinical Applications

Pilot work with jaw opening related to prominence and boundary patterns suggests that if the jaw opens a lot on one syllable within a single phrase, it opens less on the surrounding syllables (see also Erickson, 1998a). This suggests that biomechanical constraints disallow two consecutive large jaw openings without a phrase break between them. Thus, jaw mechanics have a global effect on the temporal organization of utterances. Fujimura (2002) further substantiates this by showing that when a multisyllabic word, such as America, is emphasized, the lexically stressed middle syllable is produced with more jaw lowering, but we also see increased boundary strength between the word-initial schwa and the emphasized syllable, as well as increased overall articulatory strength (e.g., jaw lowering, tongue movement and duration) of the initial word.
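The biomechanical constraint just described can be phrased as a toy heuristic. This is not a published algorithm; the threshold value and function name are assumptions, used only to make the hypothesis concrete: whenever two consecutive syllables both show large jaw opening, posit a phrase break between them.

```python
def hypothesized_phrase_breaks(jaw_openings, threshold=10.0):
    """Toy detector for the 'no two consecutive large openings within
    one phrase' hypothesis. jaw_openings is per-syllable jaw
    displacement (mm); threshold (an assumed value) defines a 'large'
    opening. Returns indices of syllables hypothesized to begin a new
    phrase."""
    return [i + 1
            for i in range(len(jaw_openings) - 1)
            if jaw_openings[i] >= threshold
            and jaw_openings[i + 1] >= threshold]
```

On a made-up sequence with two adjacent large openings, the heuristic places exactly one break between them and none elsewhere.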
A suggestion here is that biomechanical studies of jaw movement relative to speech prominence patterns can improve our understanding of temporal organization of spoken language. Furthermore, investigating spoken language via the C/D Model’s approach to the jaw as the temporal organizer of speech might lead to ways to address articulatory problems of various clinical populations, such as Parkinson’s disease (PD) or stuttering. A recent articulatory study of Parkinson’s patients by Herbig et al. (2025) reported that Parkinson’s patients produced the second consonant of a CCV syllable slower than the control group. If the PD group were instructed to lower their jaws more, would this ameliorate their productions to be more like the control group?

4.4. Insights into Prosodic Component of Phonology

Linguists traditionally discussed phonology predominantly in terms of linguistically meaningful vowel and consonant segments. Phonology now also includes prosodic “suprasegmental” information, e.g., rhythm, stress, and intonation, and how these contribute to meaning. The C/D model, however, views prosody not as a suprasegmental phenomenon but as the underlying articulatory organizational framework of speech; that is, the phonological prominence patterns dictate syllable articulation, manifested in the patterns of jaw lowering. This may be too radical a claim for many phonologists; however, in the English utterance I saw five bright highlights in the sky tonight, the sentence-final adverb tonight was never produced by first language speakers with a large jaw opening. This suggests that English speakers have a rule against putting nuclear stress on a sentence-final adverb. This finding, together with reports showing a correlation between prominence patterns and jaw articulation, indicates that by examining the jaw, we can better understand prosodic phonology.

5. A New Tool for Studying Jaw Movement

Tools often used for articulatory investigation of tongue, lip and jaw movement include the X-Ray Microbeam (XRMB) (Fujimura et al., 1973; Kiritani et al., 1975; Nadler et al., 1987; J. R. Westbury et al., 1994, https://github.com/rsprouse/xray_microbeam_database (5 July 2025)) and the Electromagnetic Articulograph (EMA), both the Carstens EMA (https://www.articulograph.de/) and the NDI Waves (https://www.ndigital.com/). These methods involve gluing small sensors to the various articulators, e.g., the tongue, lips and lower incisors, and recording their movements. (For a review of the various approaches to analyzing articulatory data, see Hardcastle and Hewlett (2006), Stone (1997), Gick et al. (2013) and Erickson and Niebuhr (2023).)
However, in order to pursue the role of the jaw in producing prominence and boundary patterns across a large number of speakers, languages, and vowel environments, an easier, less expensive approach is needed. To this end, a modified bicycle helmet with bending sensors on the chin strap has recently been developed, called MARRYS (Mandibular-Action Related Rhythm Signals) (Niebuhr & Gutnyk, 2021; Gudmundsson et al., 2024; Svensson Lundmark et al., 2023; Svensson Lundmark & Niebuhr, 2025; Erickson et al., 2024b; Wang et al., 2025). The bending sensors record how much the speaker lowers their jaw during each syllable while speaking. Since there is a bending sensor on both sides of the chin strap, the helmet also measures the degree of jaw symmetry while speaking. A microphone at the top of the helmet records the acoustic data, which is saved together with the articulatory data on an SD card inserted in the helmet. The advantage of the MARRYS helmet, compared to the more expensive and more time-consuming methods, is that it is handy, mobile, and affordable, and can therefore be used to collect a large amount of jaw data relatively quickly (e.g., 15 min total set-up and recording time per speaker) from a large number of speakers in a variety of situations, e.g., classrooms, clinical settings, and field work.

6. Now What?

The C/D model presents an innovative approach to understanding the temporal organization of spoken utterances. The starting point is the phonological/prosodic syllabic input, presumably shaped by the nested neural patterns of syllable amplitudes acquired during first language acquisition. This hypothesis is not explicit in the C/D model but is a logical extension that current technology allows us to explore.
The hypothesis that syllable articulation is the framework of speech organization contrasts with the AP model, which holds that speech is organized by how different types and degrees of vocal tract constrictions are timed with respect to each other. In AP, the jaw affects the various speech articulators, but the jaw per se is not a speech articulator. In the C/D model, the jaw is the syllable articulator, providing the skeleton/framework of temporal speech organization.
The most innovative aspect of the C/D model is the hypothesis that syllable magnitudes, derived from the phonological/prosodic input, account for speech organization. This contrasts with the commonly held linguistic notion that prominence is suprasegmental, i.e., that it is “added above” consonant and vowel segments. In the C/D model, the syllable magnitude is articulatorily realized by how much the jaw lowers for each syllable (below the occlusal plane). The amount of jaw lowering is represented in the model by the height of the “syllable pulse”, and the height of the syllable pulse, i.e., its magnitude, is transferred to the height/magnitude of the ballistic pulses that specify the onset and coda features of the syllable. The hypothesis that prominence affects onset articulation is indirectly substantiated by research showing less consonant-vowel coarticulation in stressed syllables, e.g., McGuire et al. (2024).
That jaw lowering and syllable prominence are correlated is substantiated by numerous studies about articulation of emphasis and nuclear stress (broad focus), described in the sections above. Generally reported in the above studies is how prominence increases the amount of jaw lowering on a specific syllable.
As concerns the timing of the syllable pulse, Fujimura chose the midpoint between the two “iceberg” points of the onset and coda crucial articulators to represent the location of the syllable pulse. The iceberg points were determined by overlaying the articulatory movement tracings of a single crucial articulator across a number of repetitions of the same utterance spoken by the same speaker. The point where most of the tracings overlapped was taken as the iceberg point. The hypothesis was that the iceberg region represents a region of invariance in articulatory movement. The duration between the two iceberg points represented the “pure, ideal” syllable, i.e., the syllable preserved from the coarticulatory effects of the enveloping consonants, the syllable onset and coda (p.c. Caroline Menezes). This is an interesting hypothesis that has yet to be tested.
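The overlap criterion described above can be operationalized in a few lines of Python. The sketch below is one illustrative reading, not Fujimura’s published procedure: it takes “where most tracings overlap” to mean the time sample at which across-repetition variance of the articulator position is smallest. The function name and this variance criterion are my own assumptions.

```python
import numpy as np

def iceberg_point(tracings, t):
    """Illustrative sketch of locating an 'iceberg' point.

    tracings: 2-D array (repetitions x samples) of one crucial
    articulator's position for repeated productions of the same
    utterance, time-aligned on a common time axis t.

    The iceberg point is taken here (an assumption) as the time at
    which the repetitions overlap most, i.e., where the
    across-repetition variance is smallest.
    """
    variance = np.var(np.asarray(tracings, dtype=float), axis=0)
    return t[np.argmin(variance)]
```

On this reading, a speaker-idiosyncratic iceberg (Section above) would surface as a stable minimum-variance location within a speaker but a different location across speakers.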
The hypothesis was also that each speaker has “unique” icebergs, i.e., that the relation between jaw lowering and onset/coda articulation is speaker idiosyncratic (p.c. Caroline Menezes). This hypothesis also needs to be tested. A clinical application of this hypothesis is relevant to speech pathologies such as dysarthria: people with spastic dysarthria may show “pathological” icebergs, in that the space for the tongue mass to move is limited by the restricted jaw movement of their pathology. The assumption, thus, is that syllable magnitude affects crucial articulator velocities. This, too, is a hypothesis that needs exploration.
Another innovative aspect of the C/D model is that articulatory boundaries between linguistic prosodic units are the result of syllable magnitudes. The prominence patterns, as articulated by how much the jaw lowers/opens for each syllable, account for utterance rhythmic patterns of prominence and phrasing. Thus, to reiterate, prosodic prominence patterns are not suprasegmental, but are the foundation of utterance organization.
Fujimura proposed a method for calculating an articulatory syllable duration based on syllable magnitudes; the amount of space (duration) between articulatory syllables represents the size of the boundary unit, be it a morpheme boundary, a word boundary, a foot boundary, or a phrase boundary. The approach for calculating articulatory syllable duration (which Fujimura referred to as “abstract syllable duration”) was to construct isosceles syllable triangles around each syllable pulse. The angle of the triangles was determined by locating in the utterance the two syllable triangles whose adjacent vertices touched; the rule was that only one pair of syllables could have touching vertices. This resulted in a series of syllable triangles of varying heights according to the prominence of each syllable, with the spaces between the triangles representing the various durations of boundaries. This is the approach explained in Section 2.2.1.
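The triangle construction just described can be sketched computationally. In the sketch below (my own illustrative formulation, not part of the model’s published implementation), each triangle has its apex at the syllable pulse with height equal to the pulse magnitude, and all triangles share a common base slope. Choosing the steepest slope at which some adjacent pair would touch guarantees exactly one touching pair and no overlaps; the remaining gaps between triangle bases are then the boundary durations.

```python
def syllable_triangles(times, mags):
    """Sketch of C/D-model isosceles syllable triangles.

    times: syllable pulse locations (s), in order; mags: pulse
    magnitudes (e.g., degree of jaw lowering). Returns the common
    base slope and the gap (boundary duration) between each pair of
    adjacent triangles.
    """
    # Slope at which adjacent pair i would just touch:
    # (m_i + m_{i+1}) / (t_{i+1} - t_i)
    slopes = [(mags[i] + mags[i + 1]) / (times[i + 1] - times[i])
              for i in range(len(times) - 1)]
    s = max(slopes)  # steepest requirement: one touching pair, no overlap
    # Gap between the right base edge of triangle i (t_i + m_i/s)
    # and the left base edge of triangle i+1 (t_{i+1} - m_{i+1}/s)
    gaps = [(times[i + 1] - mags[i + 1] / s) - (times[i] + mags[i] / s)
            for i in range(len(times) - 1)]
    return s, gaps
```

On this formulation, a more prominent syllable (larger magnitude) has a wider base, pushing its neighbors apart, which is how syllable magnitude comes to govern boundary size.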
A question remains about the validity of the isosceles triangle approach. Fujimura stated that the isosceles syllable triangles were an ad hoc solution (p.c. Erickson, Menezes). Pilot experimental results presented in Section 2.2.1 suggest that the solution works to show that syllable magnitude affects the size of boundaries. Yet it is not clear why, or whether there might be a better approach to assessing the effect of syllable magnitudes on boundaries. This needs to be investigated further.
Another question along these lines is how to determine the syllable pulse location in a CV or V syllable, i.e., a syllable with no coda or onset. One approach suggested by Fujimura was that all syllables begin and end with some type of constriction; in the case of CV or V syllables, there is glottal constriction. The topic of edge constrictions, treated as IRFs in the C/D model, awaits further implementation. As for geminate consonants, a special syllable concatenator was proposed by Fujimura and Williams (2008).
Experimental explorations of the C/D model are sparse; most of the work, by Erickson and colleagues, has focused on the relationship between prominence and jaw displacement. However, these studies are limited to monosyllabic words all containing the same phonological vowel. A pilot study introduced a method for normalizing vowel quality to illustrate how prominence patterns for utterances with different vowel heights are produced with similar jaw lowering patterns. As reported in Section 3, high, mid and low vowels spoken with nuclear stress on the final word, e.g., Kip met Pat and Pat met Kip, show that the largest jaw lowering occurs on the final word even though the vowels vary in height. More work on normalizing vowel quality is necessary in order to assess the merits of the C/D model. Also necessary are experiments with polysyllabic words, not just monosyllabic words. Once a robust vowel normalization procedure is available, it would become possible to examine jaw patterns of any spoken utterance. A challenge lies with reduced syllables, which often show no distinct jaw lowering; a solution might be to measure the jaw value at the midpoint of the vowel. It might be that in rapid spontaneous speech there is a regrouping of syllable-level jaw lowering, resulting in more of a foot- or phrase-level pattern of jaw lowering. These are questions to be pursued experimentally.
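One simple normalization along these lines, offered here only as an illustrative sketch (the pilot study’s actual procedure is not specified above, and the function name is my own), is to z-score jaw lowering within each phonological vowel category, so that intrinsic vowel-height differences are removed before comparing prominence patterns.

```python
from collections import defaultdict
from statistics import mean, stdev

def normalize_by_vowel(jaw_mm, vowels):
    """Sketch: z-score jaw lowering within each vowel category.

    jaw_mm: per-syllable jaw-lowering values (e.g., mm below the
    occlusal plane); vowels: the vowel label of each syllable.
    Removing each vowel's intrinsic mean and spread lets
    prominence-driven jaw lowering be compared across high, mid
    and low vowels.
    """
    groups = defaultdict(list)
    for v, x in zip(vowels, jaw_mm):
        groups[v].append(x)
    stats = {v: (mean(xs), stdev(xs) if len(xs) > 1 else 1.0)
             for v, xs in groups.items()}
    return [(x - stats[v][0]) / (stats[v][1] or 1.0)
            for v, x in zip(vowels, jaw_mm)]
```

After such a normalization, a stressed high vowel and a stressed low vowel could in principle receive comparable prominence scores despite very different raw jaw openings.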
An important finding of the research conducted to date within the framework of the C/D model is that each language has its unique pattern of jaw lowerings, that these are carried over into second language productions, and that jaw training can change second language jaw patterns to be more “native-like”. More data are needed about jaw lowering patterns across languages, as well as across various clinical populations. With the advent of easier ways to measure jaw lowering, such as the MARRYS helmet, this is an area of research with great potential.
The goal of this paper was to illustrate the importance of prominence in orchestrating articulation, to open a window for viewing syllable prominence as the impetus for all aspects of speech articulation: syllable edges, nucleus and boundaries. However, the research conducted so far has been primarily on the jaw lowering aspects of the model, and how this relates to prominence and boundary patterns. What is needed now is an exploration of the other parts of the C/D model, i.e., the Base function for describing F0 patterns, voice quality, syllable edge and nucleus articulation. Much still needs to be investigated.
Finally, experiments are needed to assess the connection between the C/D model and the acoustic signal in order to shed light on various applications of the model.
In summary, the C/D model offers a novel way to view the effect of prosody (syllable prominence patterns) on articulation and the temporal organization of speech. Fujimura’s Converter/Distributor (C/D) model is unique in using the syllable, rather than the phoneme, as the basic concatenative unit, with prosodic control initiated by linearly ordered syllable and boundary pulses of specified magnitudes. More experiments are needed to substantiate the details of how the model can be implemented. The hope is that this tutorial on the C/D model and its application to research on prosodic control of speech will benefit both expert and new researchers in the field.

Funding

NSF (SBR-951198) (1995–1998); Japan Society for the Promotion of Science, Grants-in-Aid for Scientific Research (C)#19520371 (2007–2010) and (C)#22520412 (2010–2013).

Institutional Review Board Statement

Available from Japan Advanced Institute of Science and Technology Life Science Committee Approval #22-006 (2010–2013), #24-103 (2010–2013), #25-011 (2010–2016).

Informed Consent Statement

Available from Japan Advanced Institute of Science and Technology Life Science Committee Approval #22-006 (2010–2013), #24-103 (2010–2013), #25-011 (2010–2016).

Data Availability Statement

The data presented in this study are available on request from the authors pending institutional review approval due to the use of HIPAA confidential datasets.

Acknowledgments

This manuscript comes from knowing and working with Osamu Fujimura while at The Ohio State University. I am indebted to J. C. Williams for her input to the model, specifically, the prosodic component of the model. For the final touches to the manuscript, I thank Reiner Wilhelms-Tricarico and Caroline Menezes for their insights into the C/D model and comments based on their interactions with Osamu Fujimura. I also thank Malin Svensson Lundmark for lively interactions about syllable pulse and syllable onset magnitudes. I thank the various research laboratories where the data were collected and for the technical help I received, e.g., the Waisman XRMB center, Jianwu Dang’s research laboratory at Japan Advanced Institute of Science and Technology, the GIPSA lab, Haskins Laboratories, Atsuo Suemitsu, Christopher Savariaux, Mark Tiede, Ian Wilson, Shigeto Kawahara, Jeff Moore and many other colleagues who have worked with me collecting and analyzing articulatory data.

Conflicts of Interest

The author declares no conflicts of interest.

References

  1. Beckman, M. E., & Edwards, J. (1994). Articulatory evidence for differentiating stress categories. In P. Keating (Ed.), Papers in laboratory phonology, vol. III (pp. 7–33). Cambridge University Press. [Google Scholar]
  2. Bonaventura, P. (2003). Invariant patterns in articulatory movements [Doctoral dissertation, Ohio State University]. OhioLINK Electronic Theses and Dissertations Center. Available online: http://rave.ohiolink.edu/etdc/view?acc_num=osu1070119339 (accessed on 10 October 2025).
  3. Bonaventura, P., & Fujimura, O. (2007). Articulatory movements and phrase boundaries. In P. Beddor, J. J. Ohala, & J. M. Solé (Eds.), Experimental approaches to phonology, in honor of John Ohala (pp. 209–227). Oxford University Press. [Google Scholar]
  4. Browman, C. P., & Goldstein, L. (1992). Articulatory phonology: An overview. Phonetica, 49, 155–180. [Google Scholar] [CrossRef] [PubMed]
  5. Byrd, D., & Krivokapić, J. (2021). Cracking prosody in articulatory phonology. Annual Review of Linguistics, 7(1), 31–53. [Google Scholar] [CrossRef]
  6. Cabrera, L., & Gervain, J. (2020). Speech perception at birth: The brain encodes fast and slow temporal information. Science Advances, 6(30), eaba7830. [Google Scholar] [CrossRef]
  7. Coulange, S., Kato, T., Rossato, S., & Masperi, M. (2024, September 1–5). Exploring impact of pausing and lexical stress patterns on L2 English comprehensibility in real time. Interspeech 2024, Kos, Greece. [Google Scholar]
  8. de Jong, K. (1995). The supraglottal articulation of prominence in English: Linguistic stress as localized hyperarticulation. Journal of the Acoustical Society of America, 97, 491–504. [Google Scholar] [CrossRef]
  9. Erickson, D. (1998a). Effects of contrastive emphasis on jaw opening. Phonetica, 55, 147–169. [Google Scholar] [CrossRef]
  10. Erickson, D. (1998b, July). Jaw movement and rhythm in English dialogues (pp. 49–56). Technical Report. Institute of Electronics, Information and Communication Engineers. [Google Scholar]
  11. Erickson, D. (2002). Articulation of extreme formant patterns for emphasized vowels. Phonetica, 59, 134–149. [Google Scholar] [CrossRef]
  12. Erickson, D. (2003, September 17–19). The jaw as a prominence articulator in American English. Acoustical Society of Japan, Fall Meeting (pp. 311–312), Tokyo, Japan. [Google Scholar]
  13. Erickson, D. (2004, June 11–13). On phrasal organization and jaw opening. CDRom publication. From Sound to Sense (p. 24), Cambridge, MA, USA. [Google Scholar]
  14. Erickson, D. (2024a). Articulatory models: The Fujimura C/D model. In Speech sciences entries. Speech Prosody Studies Group. Available online: https://gepf.falar.org/entries/61 (accessed on 10 October 2025).
  15. Erickson, D. (2024b). Articulatory prosody. In Speech sciences entries. Speech Prosody Studies Group. Available online: https://gepf.falar.org/entries/60 (accessed on 10 October 2025).
  16. Erickson, D. (2025). Mandible dancing, a way to teach language rhythm. Japanese Speech Communication, 13(3), 1–19. [Google Scholar]
  17. Erickson, D., Barbosa, P., & Silveira, G. (2024a, May 13–17). The interplay between acoustics and syllable articulation organized by mandible movement. International Seminar on Speech Production, Autrans, France. [Google Scholar]
  18. Erickson, D., Hashi, M., & Maekawa, K. (2000). Articulatory and acoustic correlates of prosodic contrasts: A comparative study of vowels in Japanese and English. Journal of the Acoustical Society of Japan, 56, 265–266. [Google Scholar]
  19. Erickson, D., Huang, T., & Menezes, C. (2020, May 25–28). Temporal organization of spoken utterances from an articulatory point of view. 10th International Conference of Speech Prosody (pp. 1–5), Tokyo, Japan. [Google Scholar]
  20. Erickson, D., & Imaizumi, S. (2015). Feature articles: Adventures in speech science: Focus on the C/D Model and its impact on phonetics and phonology. Special Issue on the C/D Model, Journal of the Phonetic Society of Japan, 19(2), 1–124. [Google Scholar]
  21. Erickson, D., Iwata, R., & Suemitsu, A. (2016, May 24–27). Jaw displacement and phrasal stress in Mandarin Chinese. TAL 2016 (pp. 1–5), Buffalo, NY, USA. [Google Scholar]
  22. Erickson, D., & Kawahara, S. (2015). A practical guide to calculating syllable prominence, timing and boundaries in the C/D model. Special Issue on the C/D Model. Journal of Phonetic Society of Japan, 19(2), 16–21. [Google Scholar]
  23. Erickson, D., & Kawahara, S. (2016). Articulatory correlates of metrical structure: Studying jaw displacement patterns. Linguistic Vanguard, 2, 102–110. [Google Scholar] [CrossRef]
  24. Erickson, D., Kawahara, S., Wilson, I., Menezes, C., Suemitsu, A., Shibuya, Y., & Moore, J. (2014, September 18–21). Jaw displacement patterns as articulatory correlates of metrical structure. Phonetic Building Blocks of Speech, in Honor of John Esling, Victoria, BC, Canada. [Google Scholar]
  25. Erickson, D., Kim, J., Kawahara, S., Wilson, I., Menezes, C., Suemitsu, A., & Moore, J. (2015, August 10–14). Bridging articulation and perception: The C/D model and contrastive emphasis. International Congress of Phonetic Sciences 2015, Glasgow, UK. [Google Scholar]
  26. Erickson, D., & Niebuhr, O. (2023). Articulation of prosody and rhythm: Some possible applications to language teaching. In Proceedings of the 13th International Conference of Nordic Prosody (pp. 1–45). Language Science Press. [Google Scholar] [CrossRef]
  27. Erickson, D., Raso, T., Svensson Lundmark, M., Frid, J., & Coulange, S. (2025). The many colors of prominence: A pilot study of topic prosodic units. Journal of Speech Sciences, 14, e025008. [Google Scholar] [CrossRef]
  28. Erickson, D., Rilliard, A., Svensson Lundmark, M., Rebollo Couto, L., Silva, A., de Moraes, J., & Niebuhr, O. (2024b, September 1–5). Collecting mandible movement in Brazilian Portuguese. Interspeech 2024, Kos, Greece. [Google Scholar]
  29. Erickson, D., Suemitsu, A., Shibuya, Y., & Tiede, M. (2012). Metrical structure and production of English rhythm. Phonetica, 69, 180–190. [Google Scholar] [CrossRef]
  30. Erickson, D., Svensson Lundmark, M., & Huang, T. (in press). Jaw opening patterns and their correspondence with syllable stress patterns. In L. Meyer, & A. Strauss (Eds.), Rhythms of speech and language (Chapter 2.3). Cambridge University Press.
  31. Esling, J. H., Moisik, S. R., Benner, A., & Crevier-Buchman, L. (2019). Voice quality: The laryngeal articulator model. Cambridge University Press. [Google Scholar] [CrossRef]
  32. Fujimura, O. (1986). Relative invariance of articulatory movements. In J. S. Perkell, & D. H. Klatt (Eds.), Invariance and variability in speech processes. Lawrence Erlbaum. [Google Scholar]
  33. Fujimura, O. (1994). C/D model: A computational model of phonetic implementation. In E. Ristad (Ed.), Language computations (Vol. 17, pp. 1–20). DIMACS Series in Discrete Mathematics and Theoretical Computer Science. American Mathematical Society. [Google Scholar]
  34. Fujimura, O. (2000). The C/D model and prosodic control of articulatory behavior. Phonetica, 57(2–4), 128–138. [Google Scholar] [CrossRef]
  35. Fujimura, O. (2002). Temporal organization of speech utterance: A C/D model perspective. Cadernos de Estudos Linguísticos, Campinas, 43, 9–36. [Google Scholar] [CrossRef]
  36. Fujimura, O. (2008, May 6–9). Pitch accent in Japanese: Implementation by the C/D model. SP2008 (pp. 313–316), Campinas, Brazil. [Google Scholar]
  37. Fujimura, O., Erickson, D., & Wilhelms, R. (1991, August 19–24). Prosodic effects on articulatory gestures—A model of temporal organization. XIIth International Congress on Phonetic Sciences (Vol. 2, pp. 26–29), Aix-en-Provence, France. [Google Scholar]
  38. Fujimura, O., Ishida, H., & Kiritani, S. (1973). Computer controlled radiography for observation of movements of articulatory and other human organs. Computers in Biology and Medicine, 3, 371–384. [Google Scholar] [CrossRef]
  39. Fujimura, O., & Williams, C. J. (2008). Prosody and syllables. Phonological Studies, 11, 65–74. [Google Scholar]
  40. Fujimura, O., & Williams, J. C. (2015). Remarks on the C/D model. Journal of the Phonetic Society of Japan, 19(2), 2–8. [Google Scholar]
  41. Gasser, R. F. (1967). The development of the facial muscles in man. American Journal of Anatomy, 120, 357–376. [Google Scholar] [CrossRef]
  42. Gick, B., Wilson, I., & Derrick, D. (2013). Articulatory phonetics. Wiley-Blackwell. [Google Scholar]
  43. Goswami, U. (2019). Speech rhythm and language acquisition: An amplitude modulation phase hierarchy perspective. New York Academy of Sciences, 1453, 67–78. [Google Scholar] [CrossRef]
  44. Goswami, U. (2022, May 23–26). Acoustic structure in the amplitude envelope and speech prosody: A psycholinguistic and developmental perspective. 11th International Conference of Speech Prosody (pp. 1–5), Lisbon, Portugal. [Google Scholar]
  45. Green, J. R., Moore, C. A., Higashikawa, M., & Steeve, R. W. (2000). The physiologic development of speech motor control: Lip and jaw coordination. Journal of Speech, Language, and Hearing Research, 43, 239–255. [Google Scholar] [CrossRef]
  46. Green, J. R., Moore, C. A., & Reilly, K. J. (2002). The sequential development of jaw and lip control for speech. Journal of Speech, Language, and Hearing Research, 45(1), 66–79. [Google Scholar] [CrossRef] [PubMed]
  47. Gudmundsson, V. F., Gönczi, K. M., Svensson Lundmark, M., Erickson, D., & Niebuhr, O. (2024, September 1–5). The MARRYS helmet: A new device for researching and training “jaw dancing”. 25th Interspeech Conference (pp. 1–5), Kos, Greece. [Google Scholar]
  48. Guenther, F. H. (1994). A neural network model of speech acquisition and motor equivalent speech production. Biological Cybernetics, 72(1), 43–53. [Google Scholar] [CrossRef]
  49. Hardcastle, W. J., & Hewlett, N. (2006). Coarticulation: Theory, data and techniques. Cambridge University Press. [Google Scholar]
  50. Harrington, J., Fletcher, J., & Beckman, M. E. (2000). Manner and place conflicts in the articulation of Australian English. In J. Broe, & J. B. Pierrehumbert (Eds.), Papers in laboratory phonology, vol. 5 (pp. 40–51). Cambridge University Press. [Google Scholar]
  51. Hayes, B. (1995). Metrical stress theory: Principles and case studies. University of Chicago Press. [Google Scholar]
  52. Herbig, E., Mücke, D., Michael, T., Barbe, M. T., & Thies, T. (2025). Executive dysfunctions impair and levodopa improves articulatory timing in Parkinson’s disease. Frontiers in Human Neuroscience, 19, 1580376. [Google Scholar] [CrossRef]
  53. Humphrey, T. (1964). Some correlations between the appearance of human fetal reflexes and the development of the nervous system. Progress in Brain Research, 4, 93–135. [Google Scholar]
  54. Isaacs, T., & Trofimovich, P. (2012). Deconstructing comprehensibility. Identifying the linguistic influences on listeners’ L2 comprehensibility ratings. Cambridge University Press. [Google Scholar]
  55. Jun, S.-A. (Ed.). (2014). Prosodic typology: By prominence type, word prosody, and macrorhythm. In Prosodic typology II. The phonology of intonation and phrasing (pp. 520–539). Oxford University Press. [Google Scholar]
  56. Kawahara, S., Erickson, D., Moore, J., Suemitsu, A., & Shibuya, Y. (2014). Jaw displacement and metrical structure in Japanese: The effect of pitch accent, foot structure, and phrasal stress. Journal of Phonetic Society of Japan, 18(2), 77–87. [Google Scholar]
  57. Kawahara, S., Erickson, D., & Suemitsu, A. (2015). Edge prominence and declination in Japanese jaw displacement patterns: A view from the C/D model. Special Issue on the C/D Model, Journal of the Phonetic Society of Japan, 19(2), 33–43. [Google Scholar]
  58. Kent, R. D., & Netsell, R. (1971). Effects of stress contrasts on certain articulatory parameters. Phonetica, 24, 23–44. [Google Scholar] [CrossRef]
  59. Kim, J., Erickson, D., & Lee, S. (2015). More about contrastive emphasis and the C/D model. Special Issue on the C/D Model. Journal of Phonetic Society of Japan, 19(2), 44–54. [Google Scholar]
  60. Kiritani, S., Itoh, K., & Fujimura, O. (1975). Tongue-pellet tracking by a computer-controlled X-ray microbeam system. Journal of the Acoustical Society of America, 57, 1516–1520. [Google Scholar] [CrossRef]
  61. Lehiste, I. (1970). Suprasegmentals. MIT Press. [Google Scholar]
  62. Leong, V., Stone, M. A., Turner, R. E., & Goswami, U. (2014). A role for amplitude modulation phase relationships in speech rhythm perception. Journal of the Acoustical Society of America, 136(1), 366–381. [Google Scholar] [CrossRef]
  63. Liberman, M., & Prince, A. (1977). On stress & linguistic rhythm. Linguistic Inquiry, 8, 249–336. [Google Scholar]
  64. Loevenbruck, H. (1999, August 1–7). An investigation of articulatory correlates of the accentual phrase in French. 14th International Congress of Phonetic Sciences (pp. 667–670), San Francisco, CA, USA. [Google Scholar]
  65. Loevenbruck, H. (2000, June 19–23). Effets articulatoires de l’emphase contrastive sur la Phrase Accentuelle en français. 23ème Journées d’Etude sur la Parole (pp. 165–169), Aussois, France. [Google Scholar]
  66. Macchi, M. (1995). Segmental and suprasegmental features and lip and jaw articulations [Ph.D. thesis, New York University]. [Google Scholar]
  67. Macchi, M. (1998). Labial articulation patterns associated with segmental features and syllable structure in English. Phonetica, 45, 109–121. [Google Scholar] [CrossRef]
  68. MacNeilage, P. F., & Davis, B. L. (1990). Acquisition of speech production: The achievement of segmental independence. In W. J. Hardcastle, & A. Marchal (Eds.), Speech production and modeling (pp. 55–68). Kluwer Academic. [Google Scholar]
  69. Matsui, F. M. (2017). On the input information of the C/D model for vowel devoicing in Japanese. Journal of the Phonetic Society of Japan, 21(1), 127–140. [Google Scholar]
  70. McGuire, P., Hsieh, F.-F., & Chan, Y.-C. (2024, May 13–17). Articulatory dynamics of lexical stress in L2 English: A case study of Taiwanese Mandarin speakers. ISSP 2024—13th International Seminar on Speech Production 2024, Autrans, France. [Google Scholar]
  71. Menezes, C. (2003). Rhythmic pattern of American English: An articulatory and acoustic study [Ph.D. dissertation, Ohio State University]. [Google Scholar]
  72. Menezes, C. (2004). Changes in phrasing in semispontaneous emotional speech: Articulatory evidences. Journal of the Phonetic Society of Japan, 8, 45–59. [Google Scholar]
  73. Menezes, C., & Erickson, D. (2013). Intrinsic variations in jaw deviations in English vowels. Proceedings of Meetings on Acoustics, 19, 060253. [Google Scholar] [CrossRef]
  74. Ménard, L., Dupont, S., Baum, S. R., & Aubin, J. (2009). Production and perception of French vowels by congenitally blind adults and sighted adults. Journal of the Acoustical Society of America, 126(3), 1406–1414. [Google Scholar] [CrossRef]
  75. Nadler, R., Abbs, J. H., & Fujimura, O. (1987, August 1–7). Speech movement research using the new X-ray microbeam system. 11th International Congress of Phonetic Sciences (pp. 221–224), Tallinn, Estonia. [Google Scholar]
  76. Niebuhr, O., & Gutnyk, A. (2021, June 12–13). Pronunciation engineering: Investigating the link between jaw-movement patterns and perceived speaker charisma using the MARRYS cap. 3rd IEEE International Conference on Electrical, Communication and Computer Engineering (ICECCE) (pp. 1–6), Kuala Lumpur, Malaysia. [Google Scholar]
  77. Obert, K., Yun, J., Erickson, D., Reeve, M., Rowson, H., & Møller, K. (2023). Voice quality: Interactions among F0, vowel quality, phonation mode and pharyngeal narrowing. In Proceedings of the 13th International Conference of Nordic Prosody (pp. 190–199). Language Science Press. [Google Scholar] [CrossRef]
  78. Perrier, P. (2014, July 22–23). “GEPPETO”: A target-based model of speech production including optimal planning and physical modeling. Adventures in Speech Science, Tokyo, Japan. hal-01057251. Available online: https://hal.science/hal-01057251v1 (accessed on 5 May 2025).
  79. Saltzman, E., & Kelso, J. A. (1987). Skilled actions: A task-dynamic approach. Psychological Review, 94(1), 84–106. [Google Scholar] [CrossRef]
  80. Saltzman, E., & Munhall, K. (1989). A dynamical approach to gestural patterning in speech production. Ecological Psychology, 1(4), 333–382. [Google Scholar] [CrossRef]
  81. Saltzman, E., Nam, H., Krivokapic, J., & Goldstein, L. (2008, May 6–9). A task-dynamic toolkit for modeling the effects of prosodic structure on articulation. 4th International Conference on Speech Prosody (pp. 175–184), Campinas, Brazil. Available online: https://www.isca-archive.org/speechprosody_2008/saltzman08_speechprosody.pdf (accessed on 17 August 2025).
  82. Selkirk, E. O. (1982). The syllable. Foris. [Google Scholar]
  83. Smith, C., Erickson, D., & Savariaux, C. (2019). Articulatory and acoustic correlates of prominence in French: Comparing L1 and L2 speakers. Journal of Phonetics, 77, 100938. [Google Scholar] [CrossRef]
  84. Stone, M. (1981). Evidence for a rhythm pattern in speech production: Observations of jaw movement. Journal of Phonetics, 9, 109–120. [Google Scholar] [CrossRef]
  85. Stone, M. (1997). Laboratory techniques for investigating speech articulation. In J. Hardcastle, & J. Laver (Eds.), The handbook of phonetic sciences (pp. 1–32). Blackwell. [Google Scholar]
  86. Summers, W. V. (1987). Effects of stress and final consonant voicing on vowel production: Articulatory and acoustic analyses. Journal of the Acoustical Society of America, 82, 847–863. [Google Scholar] [CrossRef]
  87. Svensson Lundmark, M. (2023). Rapid movements at segment boundaries. Journal of the Acoustical Society of America, 153(3), 1452–1467. [Google Scholar] [CrossRef]
  88. Svensson Lundmark, M. (2024, September 1–5). Magnitude and timing of acceleration peaks in stressed and unstressed syllables. Interspeech 2024 (pp. 2630–2634), Kos, Greece. [Google Scholar]
  89. Svensson Lundmark, M. (2025). Segmental articulatory phonetics. Speech Prosody Studies Group. Available online: https://gepf.falar.org/entries/62 (accessed on 2 August 2025).
  90. Svensson Lundmark, M., & Erickson, D. (2023, August 7–11). Comparing apples to oranges—Asynchrony in jaw & lip articulation of syllables. ICPhS 2023 (pp. 1097–1101), Prague, Czech Republic. [Google Scholar]
  91. Svensson Lundmark, M., & Erickson, D. (2024). Segmental and syllabic articulations: A descriptive approach. Journal of Speech, Language, and Hearing Research, 67, 3974–4001. [Google Scholar] [CrossRef] [PubMed]
  92. Svensson Lundmark, M., Erickson, D., Niebuhr, O., Tiede, M., & Chen, W.-R. (2023, June 1–4). A new articulatory tool: Comparison of EMA and MARRYS. PaPE 2023 (pp. 33–34), Nijmegen, The Netherlands. [Google Scholar]
  93. Svensson Lundmark, M., & Niebuhr, O. (2025). A practical guide to recording and analyzing jaw movements in speech research using MARRYS. In Proceedings from FONETIK 2025 (pp. 51–56). Linnaeus University Press. Available online: https://portal.research.lu.se/en/publications/a-practical-guide-to-recording-and-analyzing-jaw-movements-in-spe (accessed on 10 July 2025).
  94. Tabain, M. (2003). Effects of prosodic boundary on /aC/ sequences: Articulatory results. The Journal of the Acoustical Society of America, 113, 2834–2849. [Google Scholar] [CrossRef]
  95. Turk, A., & Shattuck-Hufnagel, S. (2020). Speech timing: Implications for theories of phonology, phonetics, and speech motor control. Oxford University Press. [Google Scholar]
  96. Turk, A., Shattuck-Hufnagel, S., Elie, B., & Šimko, J. (2025). From phonological symbols to articulation: A new 3-component model of speech production. In Speech sciences entries. Speech Prosody Studies Group. [Google Scholar]
  97. Wang, K., Chen, K., Sun, J., Frid, J., Hayashi, R., Erickson, D., & Niebuhr, O. (2025). Mandible movements in Japanese: A comparative study with Mandarin speakers. Journal of the Acoustical Society of America, 157, A75. [Google Scholar] [CrossRef]
  98. Westbury, J., & Fujimura, O. (1989). An articulatory characterization of contrastive emphasis. Journal of the Acoustical Society of America, 85, S98. [Google Scholar] [CrossRef]
  99. Westbury, J. R., Turner, G., & Dembowski, J. (1994). X-ray microbeam speech production database user’s handbook. Manuscript. University of Wisconsin. Available online: https://ubeam.engr.wisc.edu/pdf/ubdbman.pdf (accessed on 5 October 2023).
  100. Williams, J. C., Erickson, D., Ozaki, Y., Suemitsu, A., Minematsu, N., & Fujimura, O. (2013). Neutralizing differences in jaw displacement for English vowels. Journal of the Acoustical Society of America, 133, 3607. [Google Scholar] [CrossRef]
  101. Wilson, I., Erickson, D., Kawahara, S., & Monou, T. (2019, September 28–29). Acquiring jaw movement patterns in a second language: Some lexical factors. Phonetic Society of Japan, Fall Meeting, Tokyo, Japan. [Google Scholar]
  102. Wilson, I., Erickson, D., Vance, T., & Moore, J. (2020, May 25–28). Jaw dancing American style: A way to teach English rhythm. Speech Prosody, 2020 (pp. 1406–1414), Tokyo, Japan. [Google Scholar]
Figure 1. Prosodic phonological input to the C/D Model in terms of a metrical tree plus utterance parameters with numeric controls, e.g., speed, formality, excitement, dialect, speaker age. (Adapted from Fujimura, 1994).
Figure 2. Top panel shows a metrical grid for the utterance, “(I saw) five bright highlights in the sky tonight”; bottom panel shows mandible tracings (y-axis indicates amount of mandible lowering in mm). The arrows in the jaw tracings indicate foot, phrase and utterance (nuclear) stress, respectively. The filled boxes with x’s indicate the stress level of each of the words in the utterance. Adapted from Erickson and Niebuhr (2023).
Figure 3. Acoustic signal and spectrogram for the utterance Pam said bat that fat cat at the mat are shown in the top two panels. The next four panels show, respectively, the vertical position tracings of crucial articulators (TD, TT, LL) and mandible (jaw). The y-axis values (mm, not explicitly labeled) show the general pattern of tongue, lip and jaw positions for this utterance. The x-axis shows time in ms. BAT and then fat have the greatest amount of jaw lowering for this utterance. (Speaker A00, ut.11). (Adapted from Erickson, 2024a).
Figure 4. Same as Figure 3, but with velocity tracings (v) as well as vertical position tracings (z) of Crucial Articulators (TD, TT, LL) and mandible (jaw) for the utterance Pam said bat that fat cat at the mat. (from Erickson, 2024a). The explanation of the red and blue arrows is in the text.
Figure 5. String of syllable pulses and syllable boundaries for Pam said bat that fat cat at the mat. The height of the pulses indicates syllable magnitude (prominence) and downward pointing arrows indicate morphological and syntactic breaks. The large arrow indicates the largest phrase break (from Erickson, 2024a).
Figure 6. Possible input metrical organization for Pam said bat that fat cat at the mat, derived from observed articulatory kinematics, and converted to syllable pulse triangles (top panel). (from Erickson, 2024a).
Figure 7. C/D diagram of ‘kit’ (Fujimura, 2002). See text for explanation of figure.
Figure 8. Two-step vowel normalization algorithm (from Williams et al., 2013).
Figure 9. Graphic display of vowel neutralization procedure for the sentences, ‘Kip met Pat’ and ‘Pat met Kip’, where nuclear stress is on the final syllable in each of the utterances. Raw jaw displacement measurements are shown in the left panel; neutralized values are shown in the right panel. Notice that the final syllables (marked with arrows) in the right panel have greater jaw lowering than those in the left panel, indicating that once vowel height is neutralized, the jaw lowers more for nuclear stress regardless of vowel height. The bottom panel displays the hypothesized metrical grids for each of these utterances (adapted from Erickson et al., 2014; Erickson & Kawahara, 2016, Figure 2.31; Erickson & Niebuhr, 2023, Figure 2.31).
Figure 10. Acoustic spectrograms and mandible tracings (y-axis: amount of mandible lowering in mm) for French (top), Japanese (middle) and Mandarin (bottom). The English glosses, from top to bottom, are ‘Natasha didn’t tie her cat, Pasha, who escaped from her’, ‘That’s why Mana’s hair is silky smooth’, and ‘Mother curses the horse’. Adapted from Erickson and Niebuhr (2023).
Figure 11. Mandible tracings (y-axis shows jaw position in mm) for a French speaker (top) and an English speaker (bottom) of the French utterance Natacha n’attacha pas son chat Pacha qui s’échappa (‘Natasha didn’t tie her cat, Pasha, who escaped from her’). (Adapted from Erickson & Niebuhr, 2023).
Figure 12. Mandible tracings for the Japanese utterance, Aka pajama da (‘They are red pajamas’) uttered by a Japanese speaker (bottom panel) and American English speaker (top panel). The white arrows indicate increased jaw displacement at the beginning and end of the utterance, typical for an edge strengthening language like Japanese. The blue arrow, pointing to increased jaw displacement on the middle syllable of pajama indicates the American speaker transferred the lexical stress of the English word pajama to that of the Japanese cognate word (from Erickson & Niebuhr, 2023).
Figure 13. Mandible tracings of the English utterance I saw five bright highlights in the sky tonight, shown for English speakers in the bottom panel, for French speakers of English in the middle panel, and for Japanese speakers of English in the top panel. The blue arrows in the bottom panel (English speakers) point to increased jaw displacements for the first member of each foot, five bright, highlights, and sky tonight, with phrasal stress on high and nuclear stress on sky. The blue arrows for the Japanese and French speakers point to increased jaw displacements in the ‘wrong’ places (from Erickson & Niebuhr, 2023).
Figure 14. Schematic depiction of linguistic hierarchy (based on Figure 3 in Goswami, 2019).
Table 1. Target vowels occurring in CVC words with systematically varied onset and coda voiceless stops. (Adapted from Williams et al., 2013). The * indicates non-words.
| Vowel | Coda k | Coda p | Coda t |
|---|---|---|---|
| i | keek, peak, teak | keep, peep, *teap | keet, peat, teat |
| ɪ | kick, pick, tick | kip, pip, tip | kit, pit, tit |
| u | kook, Pook, Tuke | coop, poop, *toop | coot, poot, toot |
| ʊ | cook, ____, took | ____, ____, ____ | ____, put, ____ |
| e | cake, Pake, take | cape, pape, tape | Kate, pate, Tate |
| ɛ | keck, peck, tech | kep, pep, *tep | *ket, pet, Tet |
| o | coke, poke, toke | cope, pope, taupe | coat, pote, tote |
| ɔ | caulk, ____, talk | ____, ____, ____ | caught, ____, taught |
| ʌ | *kuck, puck, tuck | cup, pup, tup | cut, putt, tut |
| æ | cack, pack, tack | cap, pap, tap | cat, pat, tat |
| ɑ | cock, pock, tock | cop, pop, top | cot, pot, tot |
Erickson, D.M. The C/D Model and the Effect of Prosodic Structure on Articulation. Languages 2025, 10, 298. https://doi.org/10.3390/languages10120298