1. Introduction
Prosody plays an essential role in conveying the meaning of an utterance. Speakers use multiple prosodic cues to highlight important information in the phonetic substance (
Ladd 2012;
Roessig and Mücke 2019). Prosodic highlighting involves categorical and gradient changes in intonation and articulation. On the intonational level, pitch accent placement and pitch modulation are important factors (
Sluijter and van Heuven 1995;
Grice et al. 2005). On the segmental level, temporal and spatial changes in the oral vocal tract during consonant and vowel production can increase sonority features and paradigmatic contrasts in prominent syllables (
De Jong 1995;
Beckman et al. 1992). Prosodic prominence is a complex process that requires fine speech motor control. The ability to mark prominence in terms of fine modulations of the laryngeal and supra-laryngeal system may decrease due to the impact of aging or diseases as physiological and/or mental changes can be involved (
Thies et al. 2020;
Karlsson and Hartelius 2021).
In an aging society, it is not only the process of aging and its effects that play an important role, but also the prevalence of associated diseases; studying both potentially makes it possible to disentangle the effects of aging from the effects of disease. Parkinson’s disease (PD) in particular, the second most common neurodegenerative disease (
De Lau and Breteler 2006), is a disease that continues to worsen with age (
Kalia and Lang 2015). Both aging and PD are known to affect general motor performance, speech production, and cognitive-linguistic skills (
Levelt and Meyer 2000;
Ferreira and Swets 2002;
Ketcham and Stelmach 2004;
Smith and Caplan 2018). As the process of prosodic prominence marking requires fine speech motor control as well as sufficient cognitive skills, this study explores the role of prosodic modifications related to aging on the one hand and PD on the other.
The present study focuses on the relations of temporal and spatial marking of prominence in different speaker groups: younger healthy speakers, older healthy speakers, and speakers with PD. The two latter groups are associated with certain limitations due to the impact of aging and/or disease. Acoustic and articulatory data were collected by using electromagnetic articulography to analyze temporal and spatial strategies linked to prominence marking.
1.1. Prominence Marking and Information Structure
Prominence marking is a strategy to let certain parts of an utterance stand out compared to others. This highlighting process requires changes in intonation and articulation and can take place either between unaccented and accented words (across accentuation: non-prominent vs. prominent, e.g., background vs. broad focus) or between accented words with different degrees of prominence (within accentuation: e.g., broad focus vs. contrastive focus).
Marking prominence on the intonational level involves the modulation of pitch (
Sluijter and van Heuven 1995). According to the autosegmental-metrical model of intonation, changes in pitch movement generate different tones on the perceptual level which are classified into pitch accents. Pitch accent placement (i.e., the choice of a pitch accent type) and pitch modulation influence the degree of prominence (
Sluijter and van Heuven 1995;
Grice et al. 2005). Rising pitch accents are perceived as more prominent than falling pitch accents (categorical changes;
Baumann and Röhr 2015). Furthermore, later and higher F0 peaks are also perceived as more prominent within the same rising pitch accent category (
gradient changes; Roessig 2021).
Articulatory modifications are related to two principles: sonority expansion and hyper-articulation. For enhancing sonority, the degree of opening of the oral cavity increases to allow for more radiation of acoustic energy from the mouth in accented syllables (
Harrington et al. 2000;
Cho 2005,
2006). In addition, longer vowel durations can increase sonority on the perceptual level as well (
Beckman et al. 1992). The strategy of hyper-articulation increases the paradigmatic contrasts between syllables by changing vocal tract configurations towards a more distinct articulation (
De Jong 1995;
Lindblom 1990). Therefore, place features of vowels are enhanced, leading to more peripheral formant frequencies. For example, the front vowel /i/ is articulated with a more fronted tongue position, while the low vowel /a/ is articulated with a more lowered tongue position.
Prominence marking can be captured on the acoustic level (e.g., acoustic durations, vowel formants;
Kügler 2008;
Baumann et al. 2007) and the articulatory level (e.g., duration and displacement of articulatory movements). Previous studies investigated tongue body movements of healthy German speakers and reported that tongue positions and velocities change systematically with prominence, both across and within accentuation (
Pagel et al. 2021;
Roessig and Mücke 2019). Vocalic movements are adjusted not only in the vertical dimension, e.g., greater jaw opening for /a/, but also in the horizontal domain, e.g., retraction of the tongue, to achieve a vocalic target on the periphery. The results are in line with the lip aperture measured during the production of vowels for different focus conditions in German and English, revealing a greater degree of lip opening from unaccented to accented syllables (across accentuation) and from broad to narrow focus and from narrow to contrastive focus (within accentuation;
Mücke and Grice 2014;
Krivokapić et al. 2017). All these types of adjustments regarding the lip and tongue movements were gradient in nature but systematically encoded different degrees of phrasal prominence during the communication process.
Prosodic prominence indicates information structure, such as focus marking (
Lambrecht 1994). In those cases, the modulation of speech parameters changes on a continuum from not prominent to most prominent. For this study, three focus types are relevant: (i) background, (ii) broad focus and (iii) contrastive focus. The realization of focus types is dependent on the communicative context. While given information is classified as less important and thus not highlighted on the surface production (not prominent), new or less accessible information is made prominent. Examples of the three focus types investigated in this study are the following:
- (1)
Background (girl’s name is already given, not accented):
Q: Hat die Schwester der Mila gewunken? (Has the sister waved to Mila?)
A: Die OMA hat der Mila gewunken. (The grandmother waved to Mila.)
- (2)
Broad focus (whole answer is of interest, girl’s name is new, accented):
Q: Was ist passiert? (What happened?)
A: Die Oma hat der Mila gewunken. (The grandmother waved to Mila.)
- (3)
Contrastive focus (girl’s name is new and the name is corrected, accented):
Q: Hat der Opa der Luna gewunken? (Has the grandfather waved to Luna?)
A: Der Opa hat der MILA gewunken. (The grandfather waved to Mila.)
In the given examples, the constituent of interest is always the girl’s name ‘Mila’ (underlined). Words which are expected to be accented (prominent) are highlighted in bold (examples 1–3). Typically, words in background position are realized as unaccented without prosodic adjustments. In the present study, broad focus refers to a unit larger than just a word in which two constituents receive prominence (cf. ‘grandmother’ and ‘Mila’), as the whole answer provides new information to the question raised. Contrastive focus is restricted to prominence on a single word or syllable. As contrastive focus is also known as corrective focus, it is used to correct the previously introduced constituent (cf. ‘Luna’ vs. ‘Mila’).
The distinction between unaccented (background) and accented (broad focus) words is what is considered as
across accentuation. In cases where speakers encode prosodic prominence to differentiate between broad focus and contrastive focus, this is considered as marking different degrees of contrast, i.e.,
within accentuation. In general, prosodic adjustments, and therefore the degree of prominence, increase from background to broad focus and further to contrastive focus (
Baumann et al. 2007;
Hermes et al. 2008;
Roessig and Mücke 2019).
1.2. Aging and Speech
The aging process can involve changes in mental and physical skills. At the physiological level, age-induced changes affect the central nervous system, the (musculo)-skeletal system, and the cardiovascular system, leading to deficits in movement and posture. This is further accompanied by a loss of flexibility and muscular strength and can result in smaller and slowed down movements as well as affected initiation and execution. Therefore, deficits of gross motor control can arise. Previous studies report on prolonged limb movements and a reduction of maximum velocities (
Seidler et al. 2002;
Ketcham and Stelmach 2004). Furthermore, it has been shown that movement profiles (symmetry of acceleration and deceleration phases) are also affected by age, leading to asymmetrical movement patterns (prolongation of deceleration in movements). While younger individuals perform gross motor control movements with a rather symmetrical pattern of acceleration and deceleration phases to achieve the target of movements, prolonged deceleration phases have been reported for older individuals revealing an asymmetrical pattern (
Brown 1996). A recent paper confirms that older adults perform with slower movements (
Kornatz et al. 2021); it states further that older adults more often use submovements for goal-directed pointing movements indicating less accurate movement patterns.
However, age-related effects are not restricted to gross motor control, as they also appear on the level of speech motor control. A reduced speech tempo was reported in the literature accompanied by reduced coarticulation (
Hermes et al. 2018;
Amerman and Parnell 1992;
Bourbon and Hermes 2021;
D’Alessandro et al. 2020;
Mücke et al. 2021). Two articulatory studies with a small sample size reported slower movements of the tongue body in older speakers compared to younger ones, especially during vowel production (
Hermes et al. 2018;
Mücke et al. 2021). Furthermore, they were able to show a similar pattern in the movement profiles as it has been shown for gross motor control: prolonged deceleration phases for the respective vocalic tongue body movements (asymmetrical pattern). In prominent syllables, the deceleration phase considerably increases in the production patterns of older speakers. This was especially the case for high vowels. The authors assumed that this might be a compensatory strategy for a decrease in sensory feedback in the older speakers. Moreover, regarding prominence marking there is preliminary evidence that older speakers do modulate durational properties across and within accentuation while spatial modulations were less clear (
Mücke et al. 2021).
1.3. Parkinson’s Disease and Speech
Patients with PD suffer from a neurodegenerative disorder which affects motor and non-motor functions (
Kalia and Lang 2015). Early motor signs are bradykinesia (slowness of movements), rigidity (stiffness of muscles) and a resting tremor. Axial symptoms, such as postural and gait impairment, speech problems (hypokinetic dysarthria) and dysphagia are manifested in later stages of the disease. The speech disorder results in less intelligible and unnatural speech evoked by a reduced modulation of pitch and intensity, slower and imprecise articulation as well as an overall reduced articulation space (
Duffy 2019). In a study on speakers with PD, it was shown that the size of the vowel space is related to motor skills, as a higher motor impairment correlated with a reduction and centralization of the vowel space (
Thies et al. 2020). Conversely, improved motor function induced by medication leads to an improved speech motor performance (
Thies et al. 2021). Therefore, a connection between gross motor skills and speech motor skills cannot be denied.
With regard to prominence marking, previous studies provided evidence that speakers with PD are indeed able to modulate relevant speech parameters, such as pitch, intensity and durational properties (
Thies et al. 2020,
2021). Their strategies of prominence marking on the acoustic level did not differ compared to healthy control speakers. However, when looking at the vowel production, a recent study showed that speakers with PD modulate durational and spatial properties of the vocalic tongue body movement only across accentuation (e.g., background vs. broad focus). Modulations distinguishing different degrees of prominence, i.e., within accentuation (e.g., broad focus vs. contrastive focus), were not found (
Thies et al. 2021). Due to the overall reduced articulation space, spatial modulations seem to be restricted in amplitude or space in speakers with PD. As a result, speakers with PD are no longer able to make fine-grained distinctions. In a study by
Pell et al. (
2006), it was reported that contrastive focus produced by patients with PD is less accurately recognized by listeners. Thus, a clear differentiation within accentuation, such as broad focus vs. contrastive focus, is no longer possible. In addition, a previous articulatory study reported heterogeneous patterns on the articulatory level for speakers with PD, ranging from hyper- to hypo-articulation within and across speakers (
Fivela et al. 2014).
1.4. Compensation Strategies in Speech
Compensation mechanisms can appear in speech production depending on external and/or internal factors. The process of speech production aims to produce intelligible speech output that is understood by the listeners. To achieve this auditory goal, speakers play along a continuum of hyper-articulation on the one side, and hypo-articulation on the other side (
Lindblom 1990). With regard to Lindblom’s H&H theory, a speaker wants to be understood in the best possible way, but at the same time wants to make a minimum of effort.
Nevertheless, speakers adapt to new requirements driven by internal or external factors to maintain goal-directed speech production and thus accept investing more articulatory energy if needed. For example, in a noisy environment, speakers speak louder in order to be understood (
Folk and Schiel 2011). In a study by
Brunner et al. (
2006), the palate shape of speakers was changed for two weeks by a prosthesis. Speakers adapted to the new palate shape by lowering and retracting the tongue to produce intelligible speech output. As stated above, older speakers produce longer deceleration phases and more submovements in terms of multiple velocity peaks compared to younger speakers. This can be interpreted as a compensatory strategy to counteract age effects on speech planning so that older speakers can reach the articulatory target at the right time (
Hermes et al. 2018;
Mücke et al. 2021).
There are different stages of adaptation (
Brunner et al. 2006) that arise the longer speakers have to deal with the new requirements. This can be related to speakers who have to adapt to new physical conditions as they get older or develop a disease: they have to learn to deal with possible deviations in order to produce targeted speech movements. Compensation in the underlying articulatory movements does not necessarily mean that the speech output is different. The principle of motor equivalence emphasizes that articulatory goals can be achieved in different ways (
Perrier and Fuchs 2015).
In the present study, we compare the flexibility of the prosodic system to mark prominence between younger speakers, older speakers, and speakers with PD. To this end, we concentrate on temporal and spatial properties of the acoustic and articulatory vowel patterns. In prominent positions, we expect hyper-articulated vowels in terms of longer and larger tongue movements to signal prominence; however, speakers with PD are expected to show a reduced vowel space: Do they compensate for the reduced vowel space by adjusting the temporal domain to increase prominence? The same interplay of temporal and spatial modifications for prosodic modulations is investigated with respect to aging: How do older speakers express the flexibility of the prosodic system when abilities of the speech motor systems are changed?
2. Method
2.1. Participants
For this exploratory study, a small sample size of native German speakers was recorded (12 speakers in total, sex balanced,
Table 1). This study analyzes strategies of prominence marking in German across three different groups: 4 younger healthy speakers (25 ± 3 years), 4 older healthy speakers (76 ± 4 years) and 4 speakers with PD (72 ± 5 years). Patients with PD had been diagnosed 12 (± 7) years prior to the study and were treated with medication only. None of the patients had received deep brain stimulation or a pump. The data of the speakers with PD were assessed in medication-OFF condition (after abstaining from PD medication for 12 hours) to capture the pure status of the disease without treatment effects. To determine the motor ability, the third part of the Unified Parkinson’s Disease Rating Scale (UPDRS,
Goetz et al. 2008) was used. The mean UPDRS III score of 34 (± 6) indicates a moderate motor impairment (
Martínez-Martín et al. 2015). All patients with PD showed mild signs of hypokinetic dysarthria. None of the participants had clinically relevant signs of cognitive impairment which were assessed with the Mini-Mental State Examination (
Arevalo-Rodriguez et al. 2015).
The study was approved by the local ethics committee. Informed consent was obtained from all subjects involved in the study prior to their participation.
2.2. Speech Material and Recordings
Acoustic and articulatory speech recordings were carried out by using electromagnetic articulography (AG 501, Carstens Medizinelektronik GmbH, Bovenden, Germany). The acoustic signal was captured using a condenser microphone headset and recorded at 44.1 kHz/16 bit. To capture kinematic data, sensors were placed on the (1) lower lip, (2) upper lip, (3) tongue body and (4) tongue tip. The tongue sensors were placed approximately 1 cm and 4 cm behind the tongue tip. Reference sensors were placed behind the ears and on the nose ridge for head correction.
Speech material was elicited with a game-like question–answer scenario and presented via a computer screen (
Figure 1,
Appendix A). The questions were presented auditorily, while the answers were produced by the participants (Example 1–3). Target words were disyllabic girl names with C1V1.C2V2-structure and with word stress on the first syllable (
Table 2), which were embedded in a predefined sentence structure, such as: ‘Die Schwester hat der Mela gewunken.’ (Engl. “The sister has waved to Mela.”). The vowel V1 was one of the five peripheral vowels in German /i:, e:, a:, o:, u:/. Target words were collected in three different focus conditions associated with different degrees of prominence: background (target word in unaccented condition), broad focus, and contrastive focus (target words in accented condition).
In total, 360 tokens went into the analysis (10 target words × 3 focus conditions × 4 speakers × 3 groups). No filler items were used, and no repetitions were made in order to not prolong the duration of the experiment with regard to the condition of the patients. Only utterances that were produced incorrectly were repeated. A test phase was included at the beginning of the experiment. During this phase, all target words were produced in isolation by the participants (plus three practice trials).
2.3. Data Processing and Measurements
The speech data was displayed and annotated using the EMU-webApp (
Winkelmann et al. 2017). Adjustments related to prominence marking were investigated based on acoustic and articulatory variables corresponding to vowel production in the target syllable, as the stressed vowel V1 is the main domain of prominence marking. It has been shown that the vowel is strongly affected by prominence modulations, while the role of consonants due to accent marking is less clear (
Fougeron and Keating 1997;
Cho and McQueen 2005;
Mücke 2017;
Cho 2006).
Figure 2 provides an annotation scheme for articulatory landmarks (
Figure 2: trajectory, top) and acoustic landmarks (
Figure 2: segmental string, below). In the acoustic dimension, target words, stressed syllables (C1V1) and vowels (V1) were determined according to the speech waveform and the wideband spectrogram by inspection of the higher formant structure. In the articulatory dimension, the following landmarks of the vocalic tongue body movement were identified: start of the movement (onset, onsV1) and target of the movement (targV1). The landmarks were annotated in the vertical plane by using zero-crossings in the respective velocity trace. The maximum velocity (pvelV1) was identified by using zero-crossings in the acceleration trace.
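The landmark logic described above can be illustrated with a minimal sketch. It is not the annotation procedure of the EMU-webApp; it only shows how zero-crossings of the velocity trace delimit onsV1 and targV1 and how a zero-crossing of the acceleration trace marks pvelV1. The function names, the 250 Hz sampling rate, the search window and the synthetic cosine trajectory are all assumptions made for illustration.

```python
import numpy as np


def zero_crossings(signal):
    """Indices i where the sign of the signal flips between samples i and i+1."""
    return np.where(np.diff(np.signbit(signal)))[0]


def vocalic_landmarks(position, fs, search_start, search_end):
    """Return (onsV1, pvelV1, targV1) in seconds for one movement.

    position: 1-D array of vertical tongue body positions
    fs: sampling rate in Hz (assumed, not taken from the study)
    search_start, search_end: window in seconds that contains the movement
    """
    velocity = np.gradient(position) * fs
    acceleration = np.gradient(velocity) * fs

    lo = int(round(search_start * fs))
    hi = int(round(search_end * fs))
    # Rough location of the movement: sample with the highest speed in the window.
    peak = lo + int(np.argmax(np.abs(velocity[lo:hi])))

    # Onset and target: nearest velocity zero-crossings around the speed peak.
    vel_zc = zero_crossings(velocity)
    onset = vel_zc[vel_zc < peak].max()
    target = vel_zc[vel_zc >= peak].min()

    # Peak velocity: acceleration zero-crossing between onset and target.
    acc_zc = zero_crossings(acceleration)
    inside = acc_zc[(acc_zc >= onset) & (acc_zc <= target)]
    pvel = inside[np.argmin(np.abs(inside - peak))]
    return onset / fs, pvel / fs, target / fs


# Synthetic lowering movement (hypothetical data): maximum near 0.1 s, minimum near 0.3 s.
fs = 250.0
t = np.arange(150) / fs
position = np.cos(2 * np.pi * 2.5 * (t - 0.102))
ons_v1, pvel_v1, targ_v1 = vocalic_landmarks(position, fs, 0.12, 0.28)
```

For real articulatory data, the trajectory would normally be smoothed before differentiation, and velocity thresholds are a common alternative to strict zero-crossings; neither step is shown here.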
On the acoustic level, the following variables were computed:
Acoustic vowel duration (ms): Temporal interval between the start of the first vowel V1 and the end of the vowel V1 in the stressed C1V1 syllable of the target word. Longer vowels are associated with an increase in prominence.
Vowel space area: The mean vowel formants (F1, F2) taken from the vocalic segment V1 (/i:, e:, a:, o:, u:/) were calculated for analyzing the vowel space. Within the central 25 ms of the vocalic segment, a value was taken every 6.25 ms. The average of all four values was calculated for F1 and F2, respectively. An increase of the vowel space in terms of more peripheral vowels is associated with an increase in prominence marking (hyper-articulation).
Vowel Articulation Index (VAI): Based on the formants F1 and F2 of the V1 vowels /i:, a:, u:/, the VAI was calculated using the following formula (
Sapir et al. 2010):
VAI = (F2_i + F1_a)/(F1_i + F1_u + F2_u + F2_a). Higher values represent an enhancement of the vowel space as expected for prominence marking.
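As a worked illustration of the VAI formula above, the following sketch computes the index for two sets of formant values. The formant values are hypothetical and chosen only to show that a more centralized vowel space yields a lower VAI; they are not data from this study, and the function name is an assumption.

```python
def vowel_articulation_index(f1_i, f2_i, f1_a, f2_a, f1_u, f2_u):
    """VAI = (F2_i + F1_a) / (F1_i + F1_u + F2_u + F2_a), all formants in Hz."""
    return (f2_i + f1_a) / (f1_i + f1_u + f2_u + f2_a)


# Hypothetical mean formants (Hz) for a peripheral vowel space.
vai_peripheral = vowel_articulation_index(
    f1_i=300.0, f2_i=2200.0, f1_a=750.0, f2_a=1300.0, f1_u=320.0, f2_u=800.0
)

# Hypothetical mean formants (Hz) for a more centralized vowel space.
vai_centralized = vowel_articulation_index(
    f1_i=350.0, f2_i=2000.0, f1_a=650.0, f2_a=1350.0, f1_u=370.0, f2_u=950.0
)

print(round(vai_peripheral, 3), round(vai_centralized, 3))
```

Because the numerator contains the formants that grow with peripheral articulation and the denominator those that shrink, centralization lowers the index.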
On the articulatory level, the following variables were computed to capture the vocalic tongue body movement:
Vocalic movement duration (ms): Temporal interval between the onset of the vocalic movement until the maximum target (onsV1 to targV1). An increase in movement duration is associated with an increase in prominence.
Symmetry ratio: This is the ratio of the deceleration phase divided by the acceleration phase. The temporal interval between the start of the movement to the maximum velocity (onsV1 to pvelV1) corresponds to the acceleration phase of the vocalic movement and the interval from the maximum velocity to the target of the movement (pvelV1 to targV1) corresponds to the deceleration phase of the vocalic movement. An increase in the symmetry ratio indicates a prolongation of the deceleration phase.
Tongue body position in the vertical and horizontal domain: The mean position of the tongue body was calculated using the
y-position and the
x-position of the movement trajectory. Per domain, all position values during the first half (50%) of the acoustic vowel segment V1 were averaged, as depicted in
Figure 2 (
Roessig 2021;
Pagel et al. 2021). All values were z-transformed for each speaker and vowel. Under prominence, more extreme tongue positions are expected (hyper-articulation) to signal a more distinct vowel articulation corresponding to a more peripheral vowel space.
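The articulatory variables defined above reduce to simple arithmetic on the landmark times and position values. The sketch below is a minimal illustration with hypothetical landmark times; the function names are assumptions, and the use of the sample standard deviation in the z-transformation is also an assumption, since the text does not specify it.

```python
import statistics


def movement_duration_ms(ons_v1, targ_v1):
    """Vocalic movement duration from onsV1 to targV1 in ms (inputs in seconds)."""
    return (targ_v1 - ons_v1) * 1000.0


def symmetry_ratio(ons_v1, pvel_v1, targ_v1):
    """Deceleration phase (pvelV1 to targV1) divided by acceleration phase (onsV1 to pvelV1)."""
    acceleration_phase = pvel_v1 - ons_v1
    deceleration_phase = targ_v1 - pvel_v1
    return deceleration_phase / acceleration_phase


def z_transform(values):
    """z-scores for one speaker and vowel (sample SD is an assumption)."""
    mean = statistics.mean(values)
    sd = statistics.stdev(values)
    return [(v - mean) / sd for v in values]


# Hypothetical landmark times in seconds.
duration = movement_duration_ms(0.100, 0.300)
ratio = symmetry_ratio(0.100, 0.180, 0.300)

# Hypothetical mean tongue body positions (mm) for one speaker and vowel.
z_scores = z_transform([1.0, 2.0, 3.0])
```

A ratio of 1 corresponds to a symmetrical velocity profile; in this example the deceleration phase (120 ms) is longer than the acceleration phase (80 ms), so the ratio exceeds 1.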
To capture the relationship between the acoustic speech output and underlying articulatory movements, coordination patterns were calculated for the onset (onsV1) and the target (targV1) of the vocalic tongue body movement with respect to the acoustic syllable properties (cf.
Thies et al. 2021). Since it has been reported that prominence affects the syllable-internal coordination, variability of the coordination patterns between the initiation and the target achievement of the vocalic tongue movement and the acoustic syllable boundaries is expected. While the vocalic target is expected to be achieved consistently, variation in terms of movement initiation is suspected (
Thies et al. 2021).
Onset V1 to start of acoustic syllable (%): Interval between the onset of the vocalic tongue body movement (onsV1) and the left acoustic syllable boundary (start) divided by the acoustic syllable duration of the stressed CV syllable. Negative values indicate that the articulatory vocalic movement is initiated before the start of the acoustic syllable. The smaller the values, the earlier the movement is initiated before the start of the acoustic syllable. Under prominence, the start of the vocalic movement and the start of the acoustic syllable boundary are expected to be timed more tightly (leading to smaller temporal lags).
Target V1 to start of acoustic syllable (%): Interval between the target of vocalic tongue body movement (targV1) and the left acoustic syllable boundary (start) divided by the acoustic syllable duration of the stressed CV syllable. Positive values indicate an achievement within the acoustic syllable. The higher the values, the later the target is achieved within the acoustic syllable. Under prominence, the target achievement of the vocalic movement with respect to the start of the acoustic syllable boundary is supposed to remain unchanged.
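Both coordination measures share the same normalization: the lag between an articulatory landmark and the left acoustic syllable boundary, expressed as a percentage of the acoustic syllable duration. A minimal sketch follows; the function name and all time values are hypothetical.

```python
def relative_lag_percent(landmark, syllable_start, syllable_end):
    """Lag of a landmark relative to the syllable start, in % of syllable duration.

    Negative values: the landmark precedes the acoustic syllable.
    Positive values: the landmark lies after the start of the acoustic syllable.
    All inputs in seconds.
    """
    syllable_duration = syllable_end - syllable_start
    return 100.0 * (landmark - syllable_start) / syllable_duration


# Hypothetical stressed CV syllable from 0.500 s to 0.700 s.
onset_lag = relative_lag_percent(0.440, 0.500, 0.700)   # onsV1 before the syllable
target_lag = relative_lag_percent(0.632, 0.500, 0.700)  # targV1 within the syllable
```

In this example the movement starts 60 ms before the 200 ms syllable, which gives a negative onset lag, while the target is reached about two thirds into the syllable.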
3. Results
3.1. Temporal Domain
The data presentation is divided into a temporal and spatial analysis. The temporal measurements are reported first and presented in
Figure 3. These include (a) acoustic vowel duration, (b) vocalic movement duration and (c) symmetry ratio. Comparisons are made between focus conditions and speaker groups. As this study is of exploratory nature with a small sample size, only descriptive statistics are applied.
Acoustic vowel duration (ms): Figure 3a presents the acoustic vowel durations separately for the speaker groups and focus conditions. Comparing values across the groups, the longest vowel durations in the accented conditions are produced by the older speakers (background: 130 ± 33 ms, broad focus: 156 ± 47 ms, contrastive focus: 166 ± 57 ms), while the shortest durations are found in the young speaker group (background: 109 ± 24 ms, broad focus: 119 ± 26 ms, contrastive focus: 133 ± 28 ms). Speakers with PD produce shorter durations than older speakers in the accented conditions, but not in the background condition (background: 138 ± 35 ms, broad focus: 152 ± 35 ms, contrastive focus: 155 ± 30 ms). Moreover, the data of the older speakers show a higher variability compared to the younger speakers. There is a systematic increase of vowel duration under prominence from background to broad focus (across accentuation) and from broad focus to contrastive focus (within accentuation) for both the younger and the older speaker group. The data of the speakers with PD indicate a reduced range of prominence modulation. While the acoustic vowel duration increases from the unaccented to the accented condition (14 ms from background to broad focus), there are no clear adjustments within accentuation (3 ms from broad focus to contrastive focus).
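The size of the prominence steps can be read directly from the group means reported above. The following sketch simply subtracts adjacent means (in ms) per group; it uses only the values stated in the text, and the variable names are illustrative.

```python
# Mean acoustic vowel durations in ms: (background, broad focus, contrastive focus).
mean_durations = {
    "younger": (109, 119, 133),
    "older": (130, 156, 166),
    "PD": (138, 152, 155),
}

# Step across accentuation (background -> broad focus) and
# step within accentuation (broad focus -> contrastive focus).
steps = {
    group: (broad - background, contrastive - broad)
    for group, (background, broad, contrastive) in mean_durations.items()
}

for group, (across, within) in steps.items():
    print(f"{group}: across = {across} ms, within = {within} ms")
```

The differences are 10 and 14 ms for younger speakers, 26 and 10 ms for older speakers, and 14 and 3 ms for speakers with PD, which makes the reduced within-accentuation step in the PD group explicit.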
Vocalic movement duration (ms): The duration of the tongue body movement for producing the vowel V1 increases with prominence in all groups (
Figure 3b). The data indicate that movement durations are shortest and least variable in the younger speaker group (background: 168 ± 25 ms, broad focus: 179 ± 24 ms, contrastive focus: 187 ± 28 ms), while older speakers prolong movement durations and show more variability (background: 187 ± 39 ms, broad focus: 206 ± 52 ms, contrastive focus: 221 ± 66 ms). Speakers with PD show the longest movement durations in the background and broad focus conditions, also with high variability (background: 203 ± 45 ms, broad focus: 226 ± 35 ms, contrastive focus: 215 ± 50 ms). Note that the speakers with PD increase the duration of the vocalic movement from background to broad focus, but surprisingly decrease the duration from broad focus to contrastive focus. This is not the case for younger and older speakers, who systematically produce longer vocalic movements with increasing prominence.
Symmetry ratio (dec/acc): The symmetry (ratio of deceleration phase to acceleration phase) of the tongue body movement does not change with prominence (
Figure 3c). However, group differences can be detected. Younger speakers produce rather symmetrical profiles, as the acceleration and the deceleration phases are of nearly the same length (ratio = 1.1). In contrast, older speakers and speakers with PD show more asymmetrical patterns, implying longer deceleration phases (older speakers: ratio = 1.6, speakers with PD: ratio = 1.7). The variability is higher in older speakers and speakers with PD compared to younger speakers. The highest variability in the symmetry ratio can be found in background condition for speakers with PD.
Another metric within the temporal domain we investigated is the relationship between the acoustic and articulatory level of syllable production. To this end, the onset and the target of the vocalic tongue movement with respect to the start of the acoustic syllable were calculated as a ratio of syllable duration (
Table 3). The data indicate that the target of the tongue body movement is on average achieved at 66% within the acoustic syllable independent of focus condition and speaker group.
In contrast, the initiation of the vocalic tongue body movement differs across speaker groups and focus conditions. As expected, the vocalic tongue movement is initiated before the acoustic syllable onset in all conditions (negative values). Under prominence, the onset of the vocalic movement systematically shifts to the right, closer to the acoustic start of the stressed syllable. Therefore, the onset of the vocalic tongue body movement and the start of the acoustic target syllable are more tightly timed with respect to each other under prominence. Note that in the articulatory domain, consonants are superimposed by the vowel-to-vowel articulation (
Öhman 1966). The onset of the movement for V1 is at the same time the offset of the movement for the preceding vowel. Accordingly, the target syllable and the preceding syllable are timed closer with respect to each other when the target syllable is accented. These adjustments of coordination patterns are systematic across speaker groups. Comparing the speaker groups, it becomes apparent that speakers with PD initiate the vocalic movement earlier with respect to the acoustic syllable boundary than the healthy younger speakers, thus indicating a looser coordination between the two syllables for the speakers with PD. The opposite is the case for the older speakers; they show an even tighter coordination between the two syllables compared to the younger speakers, i.e., the onset of the vocalic movement is shifted to the right towards the start of the acoustic syllable. To summarize, the tongue body movement is initiated earliest in speakers with PD, followed by younger speakers, and latest in older speakers.
3.2. Spatial Domain
Measures within the spatial domain display (i) the acoustic vowel space area (
Figure 4), (ii) the vowel articulation index (
Table 4) and (iii) the mean position of the tongue body on the vertical and the horizontal plane (
Figure 5 and
Figure 6) related to the five vowels under investigation, separately for each speaker group and focus condition.
Acoustic vowel space area: Figure 4 displays the modulation of the acoustic vowel space area on the basis of the formant frequencies F1 and F2 (see
Appendix B for values of F1 and F2). The plots show from left to right the three focus conditions (background, broad focus and contrastive focus) for the three speaker groups. While younger speakers perform with the largest vowel space (
Figure 4, grey area), speakers with PD have the smallest vowel space across all focus conditions (
Figure 4, red area). Furthermore, the overall vowel space appears to be strongly retracted in the older speaker group (
Figure 4, blue area) in comparison to the younger speaker group. In addition, some retracted positions for /u:/ and /o:/ can be observed in the productions of speakers with PD compared to younger speakers, although these are not as strongly retracted as in the older speakers.
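As an illustration of how an acoustic vowel space area can be quantified from mean F1 and F2 values, the following minimal sketch applies the shoelace formula to a five-vowel polygon. The formant values, the vertex order, and the function name `polygon_area` are hypothetical and chosen for illustration only; they are not the values or the procedure used in this study (see Appendix B for the measured formants).

```python
def polygon_area(points):
    """Area of a simple polygon via the shoelace formula.

    `points` is a list of (x, y) vertices given in order around the perimeter.
    """
    n = len(points)
    twice_area = 0.0
    for k in range(n):
        x1, y1 = points[k]
        x2, y2 = points[(k + 1) % n]  # wrap around to close the polygon
        twice_area += x1 * y2 - x2 * y1
    return abs(twice_area) / 2.0


# Hypothetical mean formants in Hz as (F2, F1) pairs; NOT data from the study.
vowels = {
    "i:": (2200, 280),
    "e:": (2000, 400),
    "a:": (1300, 750),
    "o:": (800, 420),
    "u:": (750, 300),
}

# Vertices must follow the perimeter of the vowel space (assumed order).
order = ["i:", "e:", "a:", "o:", "u:"]
area = polygon_area([vowels[v] for v in order])
print(f"Vowel space area: {area:.0f} Hz^2")
```

Computing the area separately for each speaker group and focus condition would give one number per panel of a plot such as Figure 4, which makes group differences in vowel space size directly comparable.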
Table 4 displays the VAI per speaker group (younger, older, and speakers with PD) and illustrates the different adjustments due to prominence marking per group. The VAI is highest for younger speakers and lowest for speakers with PD across all conditions. Under prominence marking, the VAI increases most in younger speakers: the range of the VAI from the background to the contrastive focus condition is larger in younger speakers (difference of 0.11) than in older speakers (difference of 0.06). In contrast to the healthy speaker groups, the VAI of speakers with PD barely changes under prominence (difference of 0.03).
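To make the VAI comparison concrete, the sketch below computes the index under the assumption that the commonly used three-vowel formulation applies, VAI = (F2/i/ + F1/a/) / (F1/i/ + F1/u/ + F2/u/ + F2/a/); the definition given in the Methods of this study remains authoritative. The function names and all formant values are hypothetical and do not reproduce Table 4.

```python
def vai(f1_i, f2_i, f1_u, f2_u, f1_a, f2_a):
    """Vowel articulation index from corner-vowel formants in Hz.

    Assumed formulation: formants that rise with peripheral articulation
    are in the numerator, those that rise with centralization in the
    denominator, so a centralized vowel space yields a lower VAI.
    """
    return (f2_i + f1_a) / (f1_i + f1_u + f2_u + f2_a)


def vai_range(vai_background, vai_contrastive):
    """Change in VAI from background to contrastive focus."""
    return vai_contrastive - vai_background


# Hypothetical formants (Hz) for one speaker group; NOT data from the study.
background = vai(f1_i=320, f2_i=2000, f1_u=340, f2_u=900, f1_a=680, f2_a=1300)
contrastive = vai(f1_i=280, f2_i=2200, f1_u=300, f2_u=750, f1_a=750, f2_a=1300)

print(f"VAI background:  {background:.2f}")
print(f"VAI contrastive: {contrastive:.2f}")
print(f"Range:           {vai_range(background, contrastive):.2f}")
```

A larger range indicates a stronger expansion of the vowel space under prominence, which is the quantity compared across groups in the paragraph above.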
Tongue body position: Figure 5 and
Figure 6 present the vertical and horizontal tongue body positions taken from the articulatory trajectories for the different vowel types. On the vertical axis, low values indicate a lowering of the vocalic target during the vowel production and high values indicate a raising of the tongue. On the horizontal axis, lower values indicate a retraction of the tongue and higher values indicate a fronting of the tongue. Note that the positional plots of the tongue body in the articulatory dimension resemble the positions of F1 and F2 in acoustic formant charts. Furthermore, the low vowel /a:/ is produced with a central tongue position in the horizontal plane, i.e., /a:/ is specified neither as a front vowel nor as a back vowel in Standard German.
Front vowels /i:, e:/: The overall tongue position for the production of the vowels /i:/ and /e:/ is much more fronted and raised in the healthy speaker groups compared to the speakers with PD. Comparing older and younger speakers, the tongue is slightly more raised in older speakers.
Focusing on prominence marking, younger speakers present a more fronted and raised tongue position for /i:/ and /e:/ in the contrastive focus condition. Older speakers differentiate between unaccented and accented conditions by moving the tongue more to the front, but note that the conditions partly overlap. Speakers with PD adjust the articulatory target only for /i:/ (across accentuation), whereas the modulation for /e:/ is not systematic.
Low vowel /a:/: For producing the vowel /a:/, a lower tongue position is found in younger speakers compared to older speakers. Tongue positions of speakers with PD are further retracted in comparison to older speakers but at the same tongue height.
Prominence modulations are visible through a systematic lowering of the tongue in younger speakers (within accentuation) and older speakers (across and within accentuation). Speakers with PD do not lower the tongue under prominence, but they tend to retract the tongue for /a:/ in accented syllables; however, no systematic mechanism for prominence marking is detectable, since the backing of the tongue appears to be stronger in broad focus than in contrastive focus.
Back vowels /u:, o:/: For the corner vowel /u:/, the younger speakers produce the highest tongue positions. The tongue positions for /u:/ are lower in older speakers compared to younger speakers. The lowest and most retracted tongue positions for /u:/ are observable in the productions of the speakers with PD. While the tongue positions for /o:/ are comparable between the younger and older speakers, the speakers with PD produce /o:/ with considerably lowered and retracted tongue positions.
To encode prominence, younger speakers show a slight fronting of the tongue body for /u:/ to differentiate between accented and non-accented conditions (note that focus conditions in younger speakers overlap), while the older speakers systematically retract the tongue across and within accentuation. Speakers with PD do not modulate the vocalic target for the corner vowel /u:/ at all. For /o:/, both the younger and the older speaker group retract the tongue under prominence, whereas speakers with PD produce a fronting and lowering for /o:/ under prominence.
To conclude, prominence modulations and group differences are visible in the current data set. Speakers with PD stand out with an overall smaller articulation space in both dimensions, horizontal and vertical, revealing smaller tongue body movements for the corner vowels. Thus, prominence modulation of the tongue body is reduced and less systematic. It is noticeable that speakers with PD mark prominence by a fronting of the tongue during the production of the back vowel /o:/ and a retracted tongue for the low vowel /a:/, which is the opposite direction compared to younger and older speakers. Moreover, the healthy speaker groups differ from each other: the older speakers more often produce systematic contrasts across and within accentuation, whereas the younger speakers often differentiate only between unaccented and accented conditions (across accentuation). However, the younger speakers show a more fronted and larger vowel space in terms of articulatory tongue positions than the older speakers.
4. Discussion
This exploratory study investigates age-related and disease-related changes in prominence marking. A small data set of three groups (younger speakers, older speakers, speakers with PD) with four speakers each was analyzed and compared in order to examine strategies of modulating speech parameters in the temporal and spatial domain. The analysis was applied to changes in vowel production on the acoustic and the articulatory level.
Possible differences in prominence marking strategies between the speaker groups could be found on the basis of this small data set, leading to the assumption that age and disease affect speech production, and prominence marking in particular. The effects of age and disease on speech point towards the development of certain strategies to compensate for speech deficits. These compensatory strategies seem to differ considerably between older speakers and speakers with PD. We cannot exclude that the observations are due to speaker-specific variability since our group sizes are rather small (which is often the case for studies using electromagnetic articulography), but trends are clearly detectable. The group strategies will be discussed in more detail below.
Younger speaker group: We found adjustments in the temporal and spatial domain to signal prominence.
In the temporal domain, the younger speakers gradually increase acoustic vowel durations across accentuation (background to broad focus) and within accentuation (broad focus to contrastive focus). This was also reflected in the temporal properties of the underlying vocalic tongue movement, which were gradually increased under prominence. The younger speakers, as expected, show symmetrical profiles for the tongue movements in terms of acceleration and deceleration phases. The symmetry profile of the vocalic tongue movement is not affected by prominence. When investigating the coordination patterns of the vocalic tongue movement with respect to the acoustic syllable boundaries, our data reveal that prominence affects the initiation of the vocalic movement, while the target achievement remains stable. Under prominence, the vocalic tongue movement is initiated closer to the syllable onset, leading to a tighter timing pattern between articulatory vowel initiation and the start of the acoustic syllable for prominent constituents and therefore to a tighter timing between the accented syllable and the preceding one.
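The notion of a symmetrical movement profile can be illustrated with a small sketch. It assumes that the acceleration phase runs from movement onset to peak velocity and the deceleration phase from peak velocity to target achievement; the landmark names, the function `symmetry_ratio`, and the time values are hypothetical and are not taken from this study's measurement procedure.

```python
def symmetry_ratio(onset_ms, peak_velocity_ms, target_ms):
    """Proportion of the movement spent in the acceleration phase.

    A value near 0.5 indicates a symmetrical profile; a value below 0.5
    indicates a relatively longer deceleration phase.
    """
    if not onset_ms < peak_velocity_ms < target_ms:
        raise ValueError("landmarks must satisfy onset < peak velocity < target")
    return (peak_velocity_ms - onset_ms) / (target_ms - onset_ms)


# Hypothetical landmark times in milliseconds; NOT data from the study.
symmetric = symmetry_ratio(onset_ms=0, peak_velocity_ms=100, target_ms=200)
asymmetric = symmetry_ratio(onset_ms=0, peak_velocity_ms=80, target_ms=200)

print(f"symmetric profile:  {symmetric:.2f}")
print(f"asymmetric profile: {asymmetric:.2f}")
```

Under this assumed definition, a ratio that stays constant across focus conditions would correspond to a symmetry profile that is not affected by prominence.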
In the spatial domain, the younger speakers produced a more distinct vowel articulation under prominence. In terms of formant frequencies, F1 and F2, the acoustic vowel space systematically increases. This pattern is also reflected by the VAI and the maximum target positions of the underlying articulation of the tongue. They increase the vowel space across accentuation (background to broad focus) for back vowels /u:, o:/ and within accentuation (broad focus to contrastive focus) for front vowels and the low vowel /a:/. For highlighting important information, front vowels become more fronted and back vowels more lowered to encode contrastive focus. The results can be attributed to hyper-articulation strategies to enhance paradigmatic contrasts between the different vowel types and therefore to increase intelligibility (
De Jong 1995;
Harrington et al. 2000;
Cho 2005;
Roessig and Mücke 2019). The higher degree of the opening of the oral cavity during the production of the low vowel /a:/ can further be attributed to sonority expansion, allowing for more radiation of acoustic energy from the mouth during the production of prominent elements in an utterance. Sonority expansion enhances syntagmatic contrasts, i.e., between accented and unaccented syllables. Both hyper-articulation and sonority expansion are related to feature enhancement, triggered by the phonological system of a given language.
Taking all these adjustments together, our data suggest that the younger speakers are rather flexible in encoding prominence on the phonetic surface. The strongest effects were found in the temporal domain, while spatial modifications in terms of hyperarticulated vowels are less consistent in the productions of the younger speaker group. Most spatial modulations were found in the contrastive focus condition, the focus type with the highest degree of prominence. With regard to articulatory effort, we conclude that young speakers articulate with a high degree of efficacy to express pragmatic meaning in terms of phonetic cues, both in the temporal and spatial domain. It seems that they avoid spending more articulatory effort than necessary to encode prominence in the relevant task; nevertheless, the prosodic marking is rather systematic and appears to be well balanced.
Older speaker group: The older speaker group encodes prominence in the temporal and spatial domain systematically.
In the temporal domain, the older speakers (like the younger ones) gradually increase acoustic vowel durations across accentuation (background to broad focus) and within accentuation (broad focus to contrastive focus). This was reflected in the temporal properties of the underlying vocalic tongue movement, which were also gradually increased under prominence. However, they produce considerably longer acoustic vowel durations than the younger speaker group. This is also reflected in the articulatory domain by longer vocalic tongue movements. As in the younger speaker group, the symmetry profiles of the older speakers remain unchanged under prominence, i.e., longer deceleration phases are not used to mark prominence in either group. In contrast to the younger speakers, the deceleration phases of the tongue body movement are longer than the acceleration phases in the older speakers, thus changing the symmetry profile of the articulatory movement. We assume that the longer deceleration phases in the older speakers can be attributed to compensatory strategies to maintain the goal-oriented articulation and to avoid possible problems with deficient sensory feedback (
Hermes et al. 2018;
Mücke et al. 2021). It might also be one of the reasons why older speakers are reported to show a decrease in coarticulation on the acoustic surface (
D’Alessandro et al. 2020). In addition, longer deceleration phases are interpreted as a sign of deviant speech patterns related to speech movement disorders, such as dysarthria (
Mücke et al. 2018;
Forrest et al. 1989). These changes within the temporal domain point to a slowing down of the speech system with age. This is in line with, inter alia,
Amerman and Parnell (
1992), who also reported slower speaking rates in older speakers compared to younger ones. Another aspect can be found in the coordination patterns between the onset of the vocalic tongue movement and the acoustic syllable boundaries. The overall coordination patterns are tighter in the older speakers than in the younger ones, leading to a relatively late initiation of the vocalic tongue body movement with respect to the target syllable. As in the younger speakers, the degree of prominence is reflected in these subtle timing patterns, leading to a tighter timing between the accented and the preceding syllable in the productions of the older speakers.
In the spatial domain, the data reveal a smaller and more retracted vowel space in older speakers compared to younger ones. The smaller vowel space is in line with the literature showing parallels to gross motor control with slower and smaller movement trajectories of the limbs (
Seidler et al. 2002;
Ketcham and Stelmach 2004). The older speakers systematically resize the vowel space in all directions to signal prominence by making use of the strategies of sonority expansion and hyper-articulation. Note that they may only try to differentiate across and within accentuation, as focus conditions sometimes overlap. According to the VAI, the resizing of the vowel space seems to be smaller than in the younger speakers. Moreover, the expansion of the vowel space is stronger and less symmetrical in the front-back dimension compared to the younger speakers, since the retraction is more pronounced than the fronting. In the open-closed dimension, the resizing of the vowel space includes a systematic lowering of the tongue in terms of vowel formants. These observations are reflected in the underlying articulatory tongue positions on the vertical and horizontal axis. In contrast to the younger speakers, we found a more retracted tongue position for /o:/ and /u:/ in prominent positions, indicating hyperarticulated vowels. In addition, the lowering of the tongue for /a:/ from background to broad focus and further to contrastive focus seems to be stronger in the older speaker group compared to younger speakers, indicating the use of sonority expansion. Overall, modulations due to prominence seem to be stronger and more systematic across but also within accentuation for the older speakers than for the younger speakers.
We conclude that the older speaker group sufficiently encodes prominence across accentuation and within accentuation in the temporal but also the spatial domain. In contrast to younger speakers, strategies of prominence marking in older speakers require higher articulatory effort as they use the temporal and the spatial domain. Therefore, their speech production is less efficient (high-cost behavior) compared to younger speakers.
Speaker group with PD: The speakers with PD also signal prominence, but they differ in their strategies from both healthy younger and older speakers.
In the temporal domain, like the other speaker groups, speakers with PD increase acoustic and articulatory vowel durations across accentuation (background to broad focus) to mark prominence, but adjustments within accentuation (broad focus to contrastive focus) are less clear. Compared to older speakers, speakers with PD produce shorter vowel durations on the acoustic level, but interestingly longer durations of vocalic tongue body trajectories on the articulatory level. This can be explained by looking in more detail at the onset coordination of the vocalic tongue movement relative to the start of the acoustic syllable. While the vocalic target is consistently reached at 66% of the acoustic syllable in all speaker groups, the speakers with PD initiated the vocalic tongue body movement much earlier than the older speakers. This coordination pattern between the target syllable and the preceding syllable is less tight than in the healthy speaker groups and thus leads to longer vocalic tongue durations in speakers with PD in the underlying articulatory patterns. Due to the early vowel initiation and the looser coordination between the target syllable and the adjacent syllable, and especially due to the stable achievement of the target, we do not find longer vowel segments on the acoustic surface. As speech production is guided toward auditory goals, the acoustic output can remain the same, while the way to produce the same speech output may differ. We assume that speakers with PD change their articulatory coordination and speech planning to counteract effects of the disease on their speech. In addition, longer deceleration phases of the vocalic tongue body movements (asymmetric movement profiles) were produced by the speakers with PD, which were comparable to the older speakers (while the younger speakers produced symmetrical movement profiles).
As mentioned above, this supports the goal-oriented articulation and leads to the target being reached at the right time.
In the spatial domain, the acoustic vowel space of speakers with PD is smaller and more centralized compared to both healthy speaker groups. This is reflected in the VAI but also in the underlying articulatory patterns in terms of the vertical and horizontal tongue positions, being less peripheral than in the healthy speaker groups. A reduced vowel space has already been reported in the literature and is a symptom of hypokinetic dysarthria (
Duffy 2019;
Skodda et al. 2011;
Skodda et al. 2012;
Thies et al. 2020). This centralization can cause problems: for instance, the vowel /o:/ is produced with a much lower tongue position in speakers with PD, which can lead to a merging of vowel categories, a decrease in perceptual salience and less intelligible speech. The more the tongue is lowered, the more likely it is that an /o:/ will sound like an /a:/. As previously shown by
Thies et al. (
2020), these two vowel categories, the back vowel /o:/ and the low vowel /a:/, were more centralized and partly overlapped. In addition, it is interesting to note that tongue body positions were strongly adjusted for the vowels /i:, o:/ under prominence. Some prosodic adjustments were different from the healthy speaker groups, as the open vowel /a:/ is retracted (not lowered) and /o:/ is lowered and fronted (and not retracted). The overall vowel space is not further retracted to signal prominence in speakers with PD as it is in older speakers. This suggests that compensation strategies differ between older speakers and speakers with PD. One might expect a gradual transition from younger speakers via older speakers to speakers with PD, with the vowel space becoming progressively smaller and more retracted, but this is not the case in our data.
Aging and disease effects: Whereas younger speakers show a high degree of efficacy in marking prominence in the temporal and spatial domain across accentuation and within accentuation, older speakers and speakers with PD show deviant and less efficient speech patterns. Thus, the speech system seems to be affected by age and disease, resulting in a slowing down, less flexible articulators and a reduced vowel space. Differences in the use of temporal parameters between the groups are clearly visible. In older speakers and speakers with PD, strong temporal modifications (prolongations) are found with regard to acoustic durations, articulatory movement durations and deceleration phases. It is likely that the durational adjustments are compensatory strategies that make up for deficits in the spatial domain (smaller vowel space) and/or for coordination deficits. While older speakers produce a tight coordination between the target syllable and the preceding syllable, the coordination is rather loose for speakers with PD. This means that speakers with PD produce long vocalic movements in the articulatory domain (since the vowel is initiated rather early). Moreover, the vowel space decreases with age and decreases even further with PD. Whereas older speakers most often use spatial adjustments systematically across and within accentuation, speakers with PD cannot resize the vowel space in a strategic and sufficient way under prominence. The older speakers systematically retract the vowel space for back vowels to encode prominence and to compensate for a reduced ability to adjust the tongue body for front vowels. However, the adjustments are less efficient for the speakers with PD since they were not able to resize the corner vowels in an appropriate way. The best example is probably the low vowel /a:/, which is not further lowered but retracted under prominence.
We interpret the modifications of the temporal parameters as a developed compensatory strategy to deal with the spatial deficits and coordination problems between syllables. The more the tongue system is affected by age or disease, the stronger the temporal compensations. Strong temporal modifications are a compensatory strategy for the smaller and retracted vowel space. The compensatory strategies developed in the different speaker groups show a complex behavior of the speech system with different solutions in speech motor control patterns. We conclude that there seems to be no gradient transition when comparing the limitations of the lingual system from aging to Parkinson’s disease.