A Longitudinal Study of Speech Acoustics in Older French Females: Analysis of the Filler Particle euh across Utterance Positions

Aging in speech production is a multidimensional process. Biological, cognitive, social, and communicative factors can change over time, stay relatively stable, or even compensate for each other. In this longitudinal work, we focus on stability and change at the laryngeal and supralaryngeal levels in the discourse particle euh produced by 10 older French-speaking females at two times, 10 years apart. Recognizing the multiple discourse roles of euh, we divided occurrences according to utterance position. We quantified the frequency of euh and evaluated acoustic changes in formants, fundamental frequency, and voice quality across time and utterance position. Results showed that euh frequency was stable with age. The only acoustic measure that revealed an age effect was harmonics-to-noise ratio, showing less noise at older ages. Other measures mostly varied with utterance position, sometimes in interaction with age. Some voice quality changes could reflect laryngeal adjustments that conserve airflow utterance-finally. The data suggest that aging effects may be evident in some prosodic positions (e.g., utterance-final position) but not in others (e.g., utterance-initial position). Thus, future work should consider the interactions among these factors rather than assume that vocal aging is evident throughout the signal.


Introduction
Human aging is a multidimensional process that impacts anatomy, physiology, linguistic properties, communication, and cognition. Work on vocal aging suggests that the timing of changes may vary across individuals (Goozée et al. 1998). For female speakers, authors have given considerable attention to the effects of menopause (see summary in Lenell et al. 2019). This raises the question of whether we can still trace vocal aging in older females (beyond their 60s) or whether patterns are relatively stable. In connected speech, a voice may also have different characteristics depending on the unit of analysis, i.e., the portion of the speech stream we are analyzing or listening to. In combination, physical, sociolinguistic, and prosodic variables may make it hard to tease apart the factors contributing to age-related changes or compensating for them. For example, Smorenburg and Heeren (2020) observed that speaker-specific information varied with phonological and syllabic context, and Weirich (2012) found that physiological aspects of speech production were visible in unstressed syllables but not in stressed ones. These findings indicate that speaker information in speech sounds is not necessarily the same across linguistic contexts and prosodic positions.
To assess stability and change in speech and language use, we employ a longitudinal design, which controls for numerous variables that can complicate cross-sectional studies. Much of the work cited below was in fact cross-sectional (exceptions being Decoster and Debruyne 1997; Endres et al. 1971; Gerstenberg et al. 2018; Harrington et al. 2000; Keszler and Bóna 2019; Reubold et al. 2010; Russell et al. 1995). Cross-sectional designs have obvious practical advantages over longitudinal ones: They can be carried out in a relatively short time frame and do not suffer from participant attrition. Comparing individuals across generations, as is necessary in a cross-sectional design, carries its own threats to validity, however. Along with simple random variation across participants, individual conditions of medical care, diet, and lifestyle factors such as smoking are not controlled, nor is the local impact of social and cultural changes that can affect linguistic and pragmatic variables. Thus, comparisons across generations may not lead to the same results as comparisons across the lifespans of individuals. We focus on one decade in the lifespan of the older generation (all speakers were aged 70+ at the second time point). As Fougeron et al. (2021) state, the dynamics of vocal aging do not unfold evenly over the lifespan.
In addition, in contrast to most previous work, we focus on spontaneous speech, which yields greater ecological validity than read speech or sustained vowels. We specifically take acoustic measures from the filler particle euh in 10 older French females recorded 10 years apart. This particle has been widely investigated acoustically and occurs frequently in spontaneous speech (Candea 2000, p. 79), ensuring a high degree of inter-speaker comparability. While no empirical evidence was provided, Duez (2001) proposed that the acoustics of euh are "strongly linked to absolute physiological aspects of speech production" (p. 44). However, acoustic realizations may also depend on the position of euh in the utterance, i.e., prosody and contextual information may affect how the filler particle is realized (Shriberg 2001).
Our acoustic analysis covers formants (F1, F2) and fundamental frequency (f0), which have been widely studied, and voice quality, which has been assessed less frequently (but see the recent work by Fougeron et al. 2021; Karlsson and Hartelius 2021). All parameters can reflect physical effects of aging but may also be conditioned by prosody, specifically by whether the filler occurs pre-pausally, post-pausally, or within a speech unit.

Filler Particles in Connected Speech
Filler particles are among the most frequent features of spontaneous speech, providing a convenient basis for comparison across and within speakers. The literature on filler particles is quite broad, so we first define filler particles and then discuss their frequency, occurrence within discourse, and acoustic characteristics in more detail. Whenever possible, results from studies on aging are introduced.

Definition
French euh can be categorized among the non-lexical filler particles, as defined by Belz (2021, p. 4). Such particles have often been discussed in the context of pausing (Grosman et al. 2018). In this paper, we adopt the term 'filler particle' (FP) instead of other previously used terms such as hesitation or filled pause, which may carry negative connotations or fail to capture the multiple discourse functions of FPs. Ferreira and Bailey (2004) note that these particles (together with other phenomena) were largely ignored in past linguistic research but have received much more attention in recent decades. Filler particles have also been explored in various languages (see the comprehensive summary in Belz 2021, p. 14).

Filler Particles in Discourse: Frequency and Location
Euh is the most frequent filler in spoken French, and its discourse functions are multifaceted. Filler particles do not occur randomly. Rochet-Capellan and Fuchs (2013) analyzed fillers in German with respect to breath cycles. On average, 40% of all breathing cycles contained a filler particle, but results were highly speaker-specific. When a filler occurred, it was produced just after inhalation, at the beginning of speech, again in 40% of cases. These results are congruent with previous work investigating fillers with respect to respiration (Schönle and Conrad 1985). In French, the filler particle euh is often preceded or followed by a silent pause, and both cases occur with the same frequency. However, Grosman et al. (2018) observed that pauses preceding euh were shorter than pauses following euh, which could indicate that euh + pause represents an example of 'disfluency'. Gósy and Silber-Varod (2021) recently observed a comparable duration for young (but not older) speakers of Hungarian. More generally, the durational difference between these two contexts for euh could suggest varying discourse functions. Euh can have different roles according to its place in the information structure and context (e.g., following connectors such as donc 'so' or mais 'but'; Morel and Danon-Boileau 1998, pp. 82-83). Shriberg (2001) showed that the use of filler particles can be related to a variety of ecological factors, including communicative context, syntactic structure, individual behavior, and personal factors such as the sex and age of the speaker.
Empirical findings for the frequency of fillers in spontaneous speech are diverse. Some studies on aging have reported more frequent and longer filled and unfilled pauses with age (e.g., Oyer and Deal 1985; Horton et al. 2010, for uh up to the age of 68). Bolly et al. (2016) studied euh as one of nine linguistic phenomena of disfluency in the Valibel corpus of French. For the entire series, they found a positive correlation between disfluency counts and speaker age.
No change in filler frequency was reported in a longitudinal study of Hungarian speakers. Keszler and Bóna (2019) analyzed the occurrence of the filler particle at three successive speaker ages, around 60, 70, and 75+ years. The authors noted individual differences but could not find an overall age effect. Moreover, the normalized number of fillers also did not display any difference in comparison to younger Hungarian speakers. Stability in the frequency of fillers has additionally been found in cross-sectional studies, such as Searl et al. (2002) who analyzed speaker groups at 70, 80, 90, and even 100+ years old.
Finally, some studies find fewer fillers with older age (Gósy and Silber-Varod 2021; Maxim et al. 1994, p. 112). In a seven-year comparison of LangAge speakers (the same corpus used here), the normalized frequency of euh significantly decreased with age (Gerstenberg 2015), although individual variation was also found. Taschenberger et al. (2019) studied fillers in speakers of different age groups interacting with an interlocutor with no background noise, with non-speech background noise, or with background speech. While on average no changes in the frequency of fillers were found, older speakers produced fewer filler particles under noisy conditions. Gall (2019, p. 142) obtained similar results, with a 10% decrease of the filler particle in German interviews with early retirees recorded 12-14 years apart.
The very notion of FPs as an instance of disfluency is challenged by the results of Bortfeld et al. (2001), who found different distributions for 'disfluency' types: repeats and restarts were sensitive to utterance length, whereas filler particles varied with conversational roles. The authors proposed that some FPs could reflect or support coordination between conversation partners rather than represent hesitations or repairs. Along similar lines, Horton et al. (2010, p. 711) found a positive correlation between the frequency of uh and age, but a negative correlation for um. They suggested that the increased use of uh may be related to word-finding problems, whereas the decreasing use of um could be related to changes in sentence-planning strategies. To sum up, contemporary work suggests that FPs function in diverse ways and do not necessarily correspond to slowing, disfluency, or cognitive decline, but are rather a general property of spontaneous speech.

Acoustic Characteristics of Filler Particles
Various studies have investigated the acoustic properties of filler particles, considering language and speaker specificity. In a crosslinguistic study, Candea et al. (2005) reported language-specific patterns for formants, but similar patterns for duration and f0. The results for formant frequencies are in line with the findings of Belz (2020). Reviewing the literature, he summarized that the formant characteristics of the filler particle correspond to /ə/ in English and /e/ in Spanish (see also Belz 2021, p. 17 for a comprehensive overview). His analysis showed that German fillers had formant values comparable to those of both /œ/ and /5/ in lexical words, which may be similar to the French filler euh. Vowels in filler particles also had larger variability in formant frequencies than the vowels in lexical words that were recorded as references. Finally, Karpiński (2013) reported significantly different formant values across Polish-speaking participants, i.e., considerable cross-talker variability, including the expected male-female differences.
Languages 2021, 6, 211 4 of 24
To our knowledge, less is known about the acoustic realizations of filler particles in older speakers. Gósy et al. (2014) analyzed Hungarian speakers in three age groups: 9-year-old children, young adults in their 20s, and older speakers between 75 and 90 years. Age group influenced F1 and F2 values: F1 values decreased with age, whereas F2 values decreased from childhood to young adulthood and then increased again for the older adults.
Results for fundamental frequency in FPs differ across studies. As noted above, Candea et al. (2005) observed similar values of f0 across languages. Karpiński's (2013) data showed a limited f0 range. While Duez (2001) found speaker-specific and relatively stable f0 values independent of the position in an utterance for French, Shriberg and Lickley (1993) found that f0 values for clause-internal fillers in English were dependent on the preceding f0 peak, i.e., they were sensitive to prosodic context. Morel and Danon-Boileau (1998, p. 82) noted that the f0 of euh mainly corresponded to the "unmarked" level, i.e., it lay within the lower f0 range of the speaker. However, comparability across studies is limited: varying observations for f0 might arise from typological differences among the analyzed languages and from differences in annotation schemes, in particular whether exclusion or inclusion criteria take prosodic position into account. For example, some fillers occurring at the end of an utterance, just before a pause, may not be considered in f0 analyses because they are realized with creaky voice or phonation may be completely absent (e.g., Belz 2021; Karpiński 2013).

Aging Effects on Speech Anatomy, Physiology, and Acoustics
The literature on vocal aging is likewise extensive and diverse in terms of speech tasks and age groups assessed. Until rather recently, it was heavily dominated by studies of English. To constrain the review, we will exclude studies that only included men and did not assess aging beyond age 50. Additionally, we did not systematically include studies of pathological aging; that is, the papers we cover focused on speakers without severe cognitive or physiological impairments.
We also note that we did not seek, in this study, to provide an exhaustive set of measures that might vary with age, but were particularly interested in laryngeal measures (viz., voice quality and f0). One frequently-discussed aspect of aging that we omit here is slowed speech or articulation rate. Readers are referred to the comprehensive studies of Schötz (2006) and Fougeron et al. (2021) for results on duration and other measures not included here.

Respiratory System
The respiratory system is subject to calcification of cartilages, an increase in stiffness, and a reduction in compliance with aging (Estenne et al. 1985; Segre 1971). On average, older persons may need to increase inspiratory and/or expiratory volumes compared to younger ones when speaking (Sperry and Klich 1992) to compensate for reduced vital capacities and increased residual volumes (Frank et al. 1957; Hoit and Hixon 1987). Older adults have been found to use a greater percentage of vital capacity per syllable and to produce fewer syllables per breath group (Hoit and Hixon 1987; see also Gerstenberg et al. 2018). In a large-scale study employing the same corpus of spoken French used here, Gerstenberg (2011) observed that the duration of interpausal units (measured in seconds) decreased with age, especially in the female speakers.

Upper Vocal Tract
Anatomical data suggest that the supralaryngeal cavities may enlarge with aging. Skull dimensions increase into advanced age (Israel 1973; Lasker 1953), and the vocal tract may be lengthened by lowering of the larynx (Segre 1971). Interestingly, Xue and Hao (2003) documented an age-related increase in oral (but not pharyngeal) cavity length and volume for males and females, in contrast to earlier pilot data suggesting larger pharyngeal cavities in females as a function of age (Xue et al. 1999).
Increases in vocal tract size should lead to lowered formant frequencies, and several studies have reported a lowered F1 with advancing age (e.g., Albuquerque et al. 2020; Endres et al. 1971; Linville and Fisher 1985; Linville and Rens 2001; Reubold et al. 2010; Scukanec et al. 1991). One could expect the effects of larger vocal tract dimensions to extend to other formants as well, but the data here are more limited. Some reports exist for F2, but few studies report on F3, F4, and higher formant frequencies. The long-term average spectrum results of Linville and Rens (2001) showed lowered values of F1, F2, and F3, but the percentage decrease was largest for F1, and for men the results were only significant for F1. Linville and Rens attributed the more extreme effects on F1 to a localized increase in posterior regions of the vocal tract and speculated that larger effects in females might reflect osteoporosis, more extreme thinning of intervertebral disks, and/or a greater susceptibility to weakened muscular support of the larynx. It should be noted, however, that formant changes with aging can reflect other factors. Reubold et al. (2010) interpreted F1 changes over time as an adaptation to f0 lowering with the purpose of maintaining f0-F1 relationships. Further, formant changes with age are not always consistent across vowels (see summary in Eichhorn et al. 2018). Some aspects of chronological change, especially across speakers, could reflect sound change (e.g., Reubold et al. 2010) and/or stylistic factors that may change with age and interact with gender. As one possible example, Fougeron et al. (2021) observed that measures of F1 and F2 ranges differed in older males and females: Males showed no significant changes in F1 range with age, whereas females did; conversely, F2 ranges changed with age in males but not in females.
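The expected direction of the size effect can be made concrete with the textbook idealization of the vocal tract as a uniform tube closed at the glottis and open at the lips, whose resonances are F_n = (2n - 1)c / 4L. The Python sketch below is illustrative only: the speed of sound (35,000 cm/s), the tract lengths, and the function name are assumed round values and hypothetical choices, not measurements or methods from this study.

```python
# Quarter-wavelength resonator: an idealized uniform tube closed at the
# glottis and open at the lips. Assumed speed of sound in warm, moist air.
SPEED_OF_SOUND_CM_S = 35_000.0


def tube_formants(length_cm: float, n_formants: int = 3) -> list[float]:
    """Return the first n resonances F_n = (2n - 1) * c / (4 * L) in Hz."""
    if length_cm <= 0:
        raise ValueError("vocal tract length must be positive")
    return [
        (2 * n - 1) * SPEED_OF_SOUND_CM_S / (4.0 * length_cm)
        for n in range(1, n_formants + 1)
    ]


if __name__ == "__main__":
    # Hypothetical lengths: a 15.0 cm tract lengthened by 0.5 cm.
    for length in (15.0, 15.5):
        print(length, [round(f) for f in tube_formants(length)])
```

Under this idealization, lengthening the tube lowers every formant by the same proportion (about 3% for the assumed 0.5 cm increase). A disproportionately large F1 decrease, as reported by Linville and Rens (2001), is therefore not predicted by uniform lengthening, which fits their appeal to a localized change in the posterior vocal tract.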

Larynx
Chronological aging effects on the larynx have been studied extensively. The thyroid cartilage and the articular surfaces of the arytenoids demonstrate calcification, ossification, and changes in the balance and organization of collagen, resulting in lower compliance (e.g., Hirano et al. 1983; Kahane 1987b, 1988; Segre 1971; Turk and Hogg 1993). The vocal folds themselves undergo a decrease in innervation and blood supply and changes in the quantity of muscular, collagenous, and elastic fibers (Hammond et al. 1998; Hammond et al. 2000; Michel et al. 1987; Segre 1971). Such changes may negatively impact control of vocal fold position and tension (Kahane 1987a; Paulsen and Tillmann 1998). Such effects are, on average, less extreme in females than in males.
Laryngoscopic observations of the vocal folds in healthy older females have documented bowing, glottal gaps, vocal fold atrophy, edema, asymmetry, aperiodicity, stiffness, and reduced amplitude of vibration (Biever and Bless 1989; Honjo and Isshiki 1980; Lundy et al. 1998; Pontes et al. 2005). Some degree of incomplete glottal closure is common in females throughout adulthood (e.g., Linville 1992), but the position of the gap along the vocal folds may change (Biever and Bless 1989; Linville 1992; cf. also Yamauchi et al. 2014). In combination, these laryngeal changes could plausibly affect the fundamental frequency (f0), its stability, and measures of voice quality. Reduced laryngeal efficiency owing to vocal fold bowing may also lead speakers to adjust their utterances and/or laryngeal settings in light of aerodynamic requirements for speech.
Acoustic studies of females mostly report f0 lowering at advanced ages (Benjamin 1981; Brown et al. 1989; Dehquan et al. 2012; Ferrand 2002; Higgins and Saxman 1991; Honjo and Isshiki 1980; Reubold et al. 2010; Russell et al. 1995; Stoicheff 1981; Xue and Deliyski 2001). For post-menopausal females, the results are somewhat more complex. Schötz (2006), assessing six words, found that f0 decreased until about age 50, followed by a slight increase to age 70 and then a slight decrease. This database included 5 or 6 speakers per year for speakers in their 70s, but results for ages 80 and higher were based on 12 speakers. Stathopoulos et al. (2011) measured f0 in sustained /a/ for 6 speakers per decade up to age 50, then 7-11 speakers per decade in their 60s, 70s, and 80s, and 4 speakers in their 90s. They similarly observed a decrease in f0 (to about age 60), followed by a slight increase. Finally, Fougeron et al. (2021) measured f0 in a sentence for 265 women, with fuller representation (37 or more speakers per decade) for ages 60 and older. They observed that f0 decreased to about age 40 and then remained stable. It is difficult to disentangle the effects of sampling and speech materials here. Still, it does appear that the age-related f0 decreases for females reported in past work mainly reflect younger to middle-aged speakers, and that f0 in older females may be stable or show a slight increase.
Within-speaker f0 standard deviations have been found to increase with age (Brown et al. 1989; Linville and Fisher 1985; Stoicheff 1981; Xue and Deliyski 2001; cf. also Fougeron et al. 2021, who reported on the coefficient of variation of f0). For voice quality, most studies have focused on measures that correspond perceptually to a hoarse or harsh quality, viz. jitter and shimmer. Whereas shimmer seems to increase with age (Biever and Bless 1989; Dehquan et al. 2012; Fougeron et al. 2021; Xue and Deliyski 2001), results for jitter are conflicting: Some authors report higher aperiodicity in older females (Dehquan et al. 2012; Fougeron et al. 2021; Xue and Deliyski 2001), but many find no age effects (Biever and Bless 1989; Brown et al. 1989; Ferrand 2002; Linville and Fisher 1985; Schötz 2006). Other measures of aperiodicity include the harmonics-to-noise (or signal-to-noise) ratio (HNR) and cepstral peak prominence (CPP). Schultz et al. (2021) recently reported that the standard deviation of CPP contributed to the prediction of chronological age in males and females aged 50-92 and noted that CPP might correlate with the use of creaky voice. Fougeron et al. (2021) observed that average CPP values rose in females to about age 40 and then fell slightly after age 60. Results on HNR are sparse and contradictory. In a lifespan study (ages 4-93 years), Stathopoulos et al. (2011) observed that HNR rose in American English-speaking females to age 50 and fell after that. However, Fougeron et al. (2021), studying French speakers between 20 and 93 years of age, found that HNR fell to about age 50 and then remained stable. Both of these studies drew their measures from sustained productions of /a/, so the differing findings suggest that linguistic, demographic, or cultural factors may be important variables.
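Because HNR is expressed on a logarithmic (dB) scale, a higher value means relatively less noise. The Python sketch below only illustrates the definition: the hypothetical function hnr_db applies 10·log10 to an energy ratio, and hnr_from_autocorrelation uses the relation HNR = 10·log10(r / (1 - r)) for a normalized autocorrelation peak r, as in Praat's autocorrelation-based harmonicity analysis. The band-limited HNR measures output by VoiceSauce are estimated with their own procedure, which this sketch does not reproduce.

```python
import math


def hnr_db(harmonic_energy: float, noise_energy: float) -> float:
    """Harmonics-to-noise ratio in dB: 10 * log10(E_harmonic / E_noise)."""
    if harmonic_energy <= 0 or noise_energy <= 0:
        raise ValueError("energies must be positive")
    return 10.0 * math.log10(harmonic_energy / noise_energy)


def hnr_from_autocorrelation(r_peak: float) -> float:
    """HNR in dB from a normalized autocorrelation peak r at the pitch period.

    r is read as the periodic fraction of the energy and 1 - r as the noise
    fraction, giving 10 * log10(r / (1 - r)).
    """
    if not 0.0 < r_peak < 1.0:
        raise ValueError("r_peak must lie strictly between 0 and 1")
    return hnr_db(r_peak, 1.0 - r_peak)


if __name__ == "__main__":
    # 99% periodic energy and 1% noise gives roughly 20 dB.
    print(round(hnr_db(99.0, 1.0), 1))
    # Equal periodic and noise energy gives 0 dB.
    print(round(hnr_from_autocorrelation(0.5), 1))
```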
Measures of voice spectral tilt are rare in studies of vocal aging (cf. Schötz 2006), and the data do not lead to clear predictions. Xue and Deliyski (2001) observed that older females had relatively more energy in lower (70-1600 Hz) frequency ranges than higher ones (1600-4500 Hz) compared to younger ones. In contrast, Decoster and Debruyne (1997) found a greater balance of energy at higher frequencies in older females compared to younger ones in measures comparing H1-H2 (the amplitudes of the first and second harmonics) and E1-45 (mean energy up to 1 kHz and between 4-5 kHz). Karlsson and Hartelius (2021) recently observed complex patterns of age-related change in multiple Mel Frequency Cepstral Coefficients and their variability for sustained /a/: Relative energy increased in some narrow frequency bands, decreased in others, and showed U-shaped patterns in still others. Patterns also differed for men and women. Although considerably more work is needed to understand how spectral balance may vary with age, it does appear that aspects of spectral tilt could contribute to listener perceptions of speaker age (Schultz et al. 2021).
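As a concrete illustration of one tilt measure, H1-H2 is the level difference in dB between the first and second harmonics; larger values indicate relatively more energy at the fundamental, i.e., a steeper low-frequency tilt. The Python sketch below computes an uncorrected H1-H2 from a synthetic two-harmonic signal with a single-bin DFT. It assumes f0 is known and that the window spans a whole number of pitch periods, and it does not implement the formant correction (Hanson 1997; Iseli et al. 2007) applied to the tilt measures in this study. All function names and values are hypothetical.

```python
import math


def harmonic_amplitude(signal: list[float], fs: float, freq: float) -> float:
    """Amplitude of the component at `freq` from a single-bin DFT.

    Exact for sinusoids when the window holds a whole number of their periods.
    """
    n = len(signal)
    re = sum(x * math.cos(2 * math.pi * freq * i / fs) for i, x in enumerate(signal))
    im = sum(x * math.sin(2 * math.pi * freq * i / fs) for i, x in enumerate(signal))
    return 2.0 * math.hypot(re, im) / n


def h1_minus_h2_db(signal: list[float], fs: float, f0: float) -> float:
    """Uncorrected H1-H2 in dB (first minus second harmonic level)."""
    h1 = harmonic_amplitude(signal, fs, f0)
    h2 = harmonic_amplitude(signal, fs, 2 * f0)
    return 20.0 * math.log10(h1 / h2)


def synthetic_vowel(fs: float, f0: float, a1: float, a2: float, periods: int) -> list[float]:
    """Two-harmonic test signal spanning an integer number of f0 periods."""
    n = round(fs / f0) * periods
    return [
        a1 * math.sin(2 * math.pi * f0 * i / fs)
        + a2 * math.sin(2 * math.pi * 2 * f0 * i / fs)
        for i in range(n)
    ]


if __name__ == "__main__":
    # Hypothetical values; 16 kHz matches the VoiceSauce sampling rate used here.
    sig = synthetic_vowel(fs=16_000, f0=200.0, a1=1.0, a2=0.5, periods=10)
    # H2 at half the amplitude of H1 gives about 6.02 dB.
    print(round(h1_minus_h2_db(sig, 16_000, 200.0), 2))
```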
In sum, as with formants, some acoustic measures of laryngeal characteristics have potential explanations in terms of anatomical and physiological changes. However, linguistic factors, including prosody, should also be considered.

Research Questions and Predictions
The results so far do not provide a uniform picture of age-related changes in the use of filler particles. Based on Gerstenberg (2015), we did not expect increased use of euh. That study assessed 29 LangAge speakers recorded seven years apart. The speakers of the present study were included in that corpus.
Work focusing on physical effects of aging leads to the general prediction of lower f0 in older female speakers. However, results on f0 in fillers, across languages and age, are mixed, possibly because previous studies have not separated the data by prosodic context. Generally, we might predict that utterance-final positions have lower f0 as a function of declination (e.g., Gendrot and Schmid 2011) and utterance-final use of creaky phonation (Aare et al. 2018; Ogden 2001). Zhang (2016b) has also suggested that speakers may make glottal adjustments to conserve airflow reserves under some conditions (e.g., long utterances), such as adopting more adducted laryngeal postures. It is also possible that age interacts with prosodic patterns so that, e.g., use of utterance-final low f0 and creaky voice varies with age as a speaker-chosen stylistic pattern.
Data on the formant characteristics of fillers in French speakers are sparse. Based on previous acoustic studies of aging, we expected to find a decrease in F1 with age, but no effects of position on formants. We included F2 among our measures for completeness, but the literature did not lead to clear predictions here. We did not undertake an assessment of F3, F4 and higher formants because (a) they are sparsely represented in past work and (b) higher-frequency formants can often not be reliably detected, especially in quiet or breathy speech.
For voice quality, the documented laryngeal changes would suggest greater noise, reflecting breathier or less periodic phonation, in older speakers. Breathiness could also correspond with greater spectral tilt. Utterance-final use of creaky voice could lead to greater aperiodicity and a flatter spectrum (Childers and Wu 1991) as a function of position.
As noted above, much of the work on aging in females has focused on menopausal changes. Limited data for post-menopausal females suggest that aspects of the voice may continue to change, but such changes may be relatively modest (for review see Schötz 2006, ch. 4). This observation tempers our expectations for the magnitude of aging effects.
In sum, the predictions were as follows:
1. No increase in euh usage with age.
2. Lower F1 with age, but no effect of position.
3.
4. Greater noise and steeper spectral tilt with aging; greater aperiodicity and flatter spectral tilt in utterance-final position.

Database
We drew on the LangAge corpus (Gerstenberg 2005-2021), which collected biographical narrative interviews with French speakers living around Orléans in central France, where the standard French variety dominates. The first interview series started in 2005, with various ways of approaching participants (in the street, through friends of friends, in retirement homes, in writing classes), so that different social milieus were included. The interviewer (third author of the paper) used open-ended questions regarding participants' memories of World War II. This protocol provided an engaging and interactive context for eliciting spontaneous speech. Participants were re-contacted in subsequent years to see if they were willing to be interviewed again. Follow-up meetings were conducted by the same interviewer, revisiting the same themes.

Speakers
The current analysis includes 10 females interviewed in both 2005 and 2015/2016. Speaker ages were 63-86 years in the first session and 74-96 years in the second. We restricted our analysis to females, given the evidence cited above that aging may have differential effects on male vs. female voices. In all cases, participants were judged to be normally aging based on responsivity and appropriate discourse pragmatics (Gerstenberg 2011). In all cases but one (participant 037), the speakers were living independently at the time of both the first and second recording sessions (015 had moved to a retirement home in between).

Instrumentation
The initial interviews from 2005 were recorded using a SONY Minidisk (MZ-R91) recorder and a Philips SBC ME570 condenser microphone. The later ones from 2015 used an Olympus Linear PCM recorder (LS-5) with either the same Philips SBC ME570 microphone or the microphone built into the Olympus recorder. Recordings were carried out in the participants' homes (or, in the case of speaker 037, in an apartment within the retirement home). The resulting recordings were of good acoustic quality but had the natural variation that arises in sociolinguistic fieldwork.

Utterances Analyzed
In initial assessments (Koenig et al. 2020), we used orthographic transcriptions to find frequent discourse particles, including euh, usually realized as [ø] or [ae], and alors. In the current study, we restricted the analysis to euh. A tier was added to the TextGrids in Praat (v. 6.0.36; retrieved 22 July 2020 from www.praat.org), on which we manually marked the boundaries of the portion of each euh that had stable f0 and visible formants. Tokens were coded as initial (ini), internal (int), or final (end) within an interpausal unit (IPU). We consider utterances to be demarcated by pauses, so that in our usage, IPU = utterance. The criteria were as follows: initial tokens were immediately preceded by a silent pause of 250 ms or longer; internal tokens had no adjacent pause; and final tokens were immediately followed by a pause of at least 250 ms. (When euh was preceded or followed by a short silent interval <250 ms, it was labeled as int.) When euh was flanked by pauses on both sides and formed an IPU on its own, its position was labeled as between pauses (btwPaus). In some cases, euh was followed by a long portion of creaky voice but no silent pause. While creaky voice may reflect a form of filled pause, we labeled those productions of euh as internal, because the creaky portion could not be clearly separated from the discourse particle. The 250 ms threshold was set arbitrarily, but it is similar to values used in studies of fluency (cf. de Jong and Bosker 2013), in which pauses are assessed in discourse contexts. Tokens were omitted at this stage because of (a) overtalk or interfering background noise and/or (b) lack of phonation; the latter were excluded because the analysis (see next section) requires detection of glottal cycles, i.e., f0. Examples of the four contexts are provided in Appendix A, Figure A1.
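The position criteria can be summarized as a small decision rule. The Python sketch below is a hypothetical reimplementation for illustration, not the authors' procedure (labeling was done manually in Praat). It assumes that the durations of the silent intervals immediately before and after each token are already known, that a stretch of creaky voice is not passed in as silence, and that btwPaus requires pauses of at least 250 ms on both sides.

```python
from typing import Optional

PAUSE_THRESHOLD_S = 0.250  # silent intervals of 250 ms or longer count as pauses


def code_position(
    pause_before_s: Optional[float],
    pause_after_s: Optional[float],
    threshold_s: float = PAUSE_THRESHOLD_S,
) -> str:
    """Return 'ini', 'int', 'end', or 'btwPaus' for one euh token.

    pause_before_s / pause_after_s give the duration (s) of the silent interval
    directly adjacent to the token, or None if there is none. Silences shorter
    than the threshold are treated as no pause, so such tokens become 'int'.
    Creaky voice is phonation, not silence, and should not be passed as a pause.
    """
    pause_before = (pause_before_s or 0.0) >= threshold_s
    pause_after = (pause_after_s or 0.0) >= threshold_s
    if pause_before and pause_after:
        return "btwPaus"  # euh forms an IPU on its own
    if pause_before:
        return "ini"
    if pause_after:
        return "end"
    return "int"


if __name__ == "__main__":
    # Hypothetical tokens: (pause before, pause after) in seconds.
    for before, after in [(0.40, None), (0.12, 0.08), (None, 0.31), (0.50, 0.60)]:
        print(before, after, code_position(before, after))
```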
Note that we did not attempt to assess age-related changes in the duration of euh. In many cases, euh was produced as part of a continuous speech stream; when euh was contiguous with a vocalic element, clear acoustic boundaries were not evident. Further, in utterance-final positions euh productions sometimes had long breathy or creaky offsets. In those cases, offsets could be difficult to identify against background noise. Moreover, formant structure could continue in the absence of phonation. As stated above, our labeling procedure marked off the region of phonated euh productions that showed formant structure for F1 and F2 in the spectrogram. With these labeling criteria, it was not clear that a durational comparison across ages would be valid.

Processing
The first step was to extract the vowels in Praat using the manually delimited TextGrid boundaries. Preliminary analyses indicated that very short vowels frequently did not yield measures of f0 or formants (no data, or highly unstable values); thus, only tokens longer than 100 ms were extracted. These were subsequently processed using VoiceSauce [www.phonetics.ucla.edu/voicesauce; v. 1.34, last accessed 14 December 2021] with a sampling rate of 16 kHz. Initial parameter settings used Praat to estimate f0 (range 40-600 Hz, smoothing bandwidth = 10 ms) and formants (N = 4 formants, maximum formant frequency = 5500 Hz). The resulting output was reviewed to determine whether a large number of productions had major discontinuities in f0 (e.g., halving or doubling) or F1. In such cases, parameters were adjusted on a speaker- and token-specific basis, and VoiceSauce was re-run. These final VoiceSauce files were reviewed with reference to the waveforms and spectrograms, and occasional spurious values of f0 or F1 were trimmed out in MATLAB (2018). Final exclusions were made in cases where (a) VoiceSauce failed to complete the analysis, (b) the resulting values of f0 or F1 were greatly at odds with the spectrogram or the percept, and/or (c) a region of stable f0 for most of the token could not be determined based on the harmonic structure in a narrowband spectrogram. The complete dataset with 10 speakers, 2 time points, and 4 prosodic positions consisted of 1452 tokens.
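Two of the screening steps lend themselves to simple automation: the 100 ms duration criterion and a first-pass flag for octave errors (halving or doubling) in an f0 track. The Python sketch below is a hypothetical aid, not the procedure used in the study, where such cases were reviewed manually against waveforms and spectrograms. The 9-semitone jump threshold and the representation of unvoiced frames as None or non-positive values are assumptions.

```python
import math
from typing import Optional, Sequence

MIN_DURATION_MS = 100.0  # only tokens longer than 100 ms were extracted


def long_enough(start_s: float, end_s: float) -> bool:
    """Duration criterion; rounding avoids float artifacts such as 2.1 - 2.0 > 0.1."""
    duration_ms = round((end_s - start_s) * 1000.0, 6)
    return duration_ms > MIN_DURATION_MS


def octave_jump_frames(
    f0_track: Sequence[Optional[float]], min_jump_semitones: float = 9.0
) -> list[int]:
    """Indices of voiced frames whose f0 differs from the previous voiced frame
    by at least `min_jump_semitones` (an octave is 12 semitones).

    Unvoiced frames (None or values <= 0) are skipped.
    """
    flagged: list[int] = []
    previous: Optional[float] = None
    for i, f0 in enumerate(f0_track):
        if f0 is None or f0 <= 0:
            continue
        if previous is not None:
            jump = abs(12.0 * math.log2(f0 / previous))
            if jump >= min_jump_semitones:
                flagged.append(i)
        previous = f0
    return flagged


if __name__ == "__main__":
    track = [200.0, 201.0, 100.5, 102.0, None, 204.0, 205.0]  # hypothetical 10-ms frames
    print(octave_jump_frames(track))  # a halving at index 2, a doubling at index 5
    print(long_enough(2.0, 2.1), long_enough(0.0, 0.25))
```

Flagged frames are candidates for parameter adjustment or trimming, not automatic deletions.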

Measures
Of the numerous output measures provided by VoiceSauce, Table 1 lists those chosen for statistical analysis; we focused specifically on voice quality parameters because they have been the least investigated. VoiceSauce provides four measures of harmonics-to-noise ratio (HNR), with overlapping frequency ranges (0-500, 0-1500, 0-2500, and 0-3500 Hz). Before settling on the final dependent measures, we computed a correlation matrix for all variables. Not surprisingly, the four HNR measures were highly correlated (r = 0.85-0.99); the lowest of these r-values was between the most distant ranges, i.e., HNR05 and HNR35. For the final analysis, we chose HNR35 alone. The other correlations greater than r = 0.70 occurred among the input measures A1, A2, A3, H1, H2, and H2K, i.e., among the amplitudes of the first three formants (A1-A3) and the harmonic amplitudes H1, H2, and H2K (the harmonic closest to 2000 Hz). None of the spectral tilt measures themselves showed such high intercorrelations.
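The screening for redundant measures can be illustrated with a small Pearson correlation check. In this sketch the helper names are hypothetical, the example values in any usage are invented, and the 0.70 cutoff mirrors the threshold mentioned above; the original correlation matrix was not necessarily computed this way.

```python
from itertools import combinations
from math import sqrt


def pearson_r(x, y):
    """Pearson correlation coefficient between two equal-length sequences."""
    n = len(x)
    if n != len(y) or n < 2:
        raise ValueError("need two sequences of equal length >= 2")
    mx, my = sum(x) / n, sum(y) / n
    sxy = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sxx = sum((a - mx) ** 2 for a in x)
    syy = sum((b - my) ** 2 for b in y)
    return sxy / sqrt(sxx * syy)


def highly_correlated(measures, threshold=0.70):
    """measures: dict mapping a measure name (e.g. 'HNR35') to per-token values.
    Returns (name_a, name_b, r) for every pair whose |r| exceeds the threshold."""
    flagged = []
    for a, b in combinations(measures, 2):
        r = pearson_r(measures[a], measures[b])
        if abs(r) > threshold:
            flagged.append((a, b, r))
    return flagged
```

Flagged pairs are candidates for keeping only one member, as was done here by retaining HNR35 among the four HNR measures.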
All spectral tilt measures were corrected to account for differences in vowel quality (Hanson 1997; Iseli et al. 2007). All VoiceSauce outputs are time series (here, one value every 10 ms). For this analysis, we did not explore variation over time but entered the average value of each measure for each token into the statistical analyses.
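Averaging the 10-ms frame values into one value per token might look like this. We assume, for illustration only, that frames are available as rows carrying a token identifier and that undefined frames appear as None or NaN; how VoiceSauce itself encodes undefined values is not addressed here, and the function names are hypothetical.

```python
import math


def token_mean(frames):
    """Average one measure over a token's frames, skipping undefined frames
    (assumed here to be None or NaN). Returns None if no frame is usable."""
    valid = [v for v in frames if v is not None and not math.isnan(v)]
    if not valid:
        return None
    return sum(valid) / len(valid)


def average_by_token(rows, measure):
    """rows: iterable of dicts with a 'token' id and a value for `measure`.
    Returns {token_id: mean of that measure over the token's frames}."""
    grouped = {}
    for row in rows:
        grouped.setdefault(row["token"], []).append(row.get(measure))
    return {tok: token_mean(vals) for tok, vals in grouped.items()}
```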

Statistics and Exclusions
All statistical analyses were carried out using R (2021), version 4.1.0. Several linear mixed-effects models were fitted using the packages lme4 (Bates et al. 2015) and lmerTest (Kuznetsova et al. 2017). The emmeans package (Lenth 2021) was used for post-hoc analyses, with a confidence level of 0.95 (alpha = 0.05), Tukey adjustment of p-values, and the Kenward-Roger method for calculating degrees of freedom.
Separate linear mixed-effects models were run for each measure described in Table 1, treated as the dependent variable. We took TIME (2005 vs. 2015, reference level 2005) and POSITION (ini vs. int vs. end, reference level ini) as independent factors and allowed for their interaction. Because euh occurring between pauses (btwPaus) was rare or absent for some speakers, this context was excluded from the analysis. By-speaker intercepts and by-speaker slopes for TIME and for POSITION served as random effects. When post-hoc analyses were called for, we did not assess comparisons involving different POSITIONS across TIME (e.g., utterance-initial position in 2005 vs. utterance-final position in 2015) because we did not consider them meaningful.
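To make the reference-level coding concrete, the sketch below builds the treatment-coded fixed-effects row for each TIME × POSITION cell (in lme4 formula notation, the fixed part corresponds to `TIME * POSITION`). The column names imitate R's default naming but are illustrative, the function names are hypothetical, and random effects are not modeled.

```python
TIMES = ("2005", "2015")           # reference level: 2005
POSITIONS = ("ini", "int", "end")  # reference level: ini; btwPaus excluded
COLUMNS = ("(Intercept)", "TIME2015", "POSITIONint", "POSITIONend",
           "TIME2015:POSITIONint", "TIME2015:POSITIONend")


def design_row(time, position):
    """Treatment-coded fixed-effects row for one token, in COLUMNS order."""
    if time not in TIMES or position not in POSITIONS:
        raise ValueError(f"unsupported cell: {time}, {position}")
    t = 1 if time == "2015" else 0
    p_int = 1 if position == "int" else 0
    p_end = 1 if position == "end" else 0
    return (1, t, p_int, p_end, t * p_int, t * p_end)


def cell_mean(coefs, time, position):
    """Predicted cell mean: dot product of fixed-effect estimates and the row."""
    return sum(c * x for c, x in zip(coefs, design_row(time, position)))
```

Under this coding, the intercept is the 2005 utterance-initial cell, and each POSITION estimate is a difference from initial position within 2005; the interaction terms carry any change in that difference by 2015.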
Table 2 shows the composition of the final dataset after the exclusion of btwPaus tokens, i.e., the quantity of data for each speaker and time. A more detailed breakdown across speakers is provided in Appendix A, Table A1.

Results
We begin by reporting the results for euh frequency and then proceed to the acoustic measures (formants, f0, and voice quality). A thematic summary of the acoustic results is provided in Table 3. The violin plots for f0, formants, and voice quality (Sections 3.2-3.4) show group data; speaker-specific data are displayed in Appendix A (Figures A1-A8).

Frequency of euh
We did not find evidence for increased use of euh by the older participants. Instead, our corpus showed a tendency toward a decrease in normalized euh frequency: median euh usage was 30 occurrences per 1000 tokens in 2005 vs. 25 per 1000 in 2015. The standard deviation was higher in 2015 (10 euh per 1000 tokens in 2005 vs. 17 in 2015). Since the frequency of euh per 1000 tokens was normally distributed, we performed a paired t-test and, because of the small sample size, additionally a Wilcoxon rank sum exact test. Both tests revealed no significant difference in the frequency of euh between the two time points (t-test: t = 0.24, df = 9, p = 0.81; Wilcoxon: W = 60, p = 0.48).
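The normalization and the paired comparison can be sketched in a few lines of Python. The function names are hypothetical, and the sketch returns only the t statistic and its degrees of freedom; a p-value additionally requires the t distribution (for example via `scipy.stats.ttest_rel`), and the Wilcoxon test is not reproduced here.

```python
from math import sqrt
from statistics import mean, stdev


def per_thousand(euh_count, total_tokens):
    """Normalize a raw euh count to occurrences per 1000 tokens."""
    return 1000.0 * euh_count / total_tokens


def paired_t(first, second):
    """Paired t statistic and degrees of freedom for matched samples
    (e.g. one normalized euh rate per speaker in 2005 and in 2015)."""
    if len(first) != len(second) or len(first) < 2:
        raise ValueError("need at least two matched pairs")
    diffs = [x - y for x, y in zip(first, second)]
    n = len(diffs)
    t = mean(diffs) / (stdev(diffs) / sqrt(n))
    return t, n - 1
```

With 10 speakers, each contributing one rate per time point, the degrees of freedom are 10 − 1 = 9, as reported above.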

Formants
Results for the two formants revealed significant effects only for F2 (Figure 1A; see also Appendix A, Figure A2).

Fundamental Frequency
Fundamental frequency did not show a main effect of TIME, but it did show an effect of POSITION (Figure 1B and Appendix A, Figure A3). F0 was lower when euh occurred at the end of an interpausal unit (IPU) than in initial position (intercept = 145 Hz, estimate = −11.31 Hz, SE = 2.98, t = −3.78, p < 0.001). The post-hoc analysis indicated that this positional effect depended on TIME. In the earlier recordings, euh at the end of an IPU had a lower f0 than in internal position (2005 int vs. 2005 end, estimate = 11.39 Hz, SE = 2.73, t = 4.18, p = 0.0039) and in initial position (2005 ini vs. 2005 end, estimate = 11.31 Hz, SE = 3.06, t = 3.67, p = 0.0105). Ten years later, no difference among prosodic positions was found.
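Under the treatment coding described in the Statistics section (reference levels 2005 and ini), the intercept and the POSITION estimate reported above can be read as predicted cell means. A worked reading, assuming the reported intercept of 145 Hz is rounded:

```latex
\begin{aligned}
\hat{f}_0(2005,\ \text{ini}) &= \beta_0 \approx 145\ \text{Hz}\\
\hat{f}_0(2005,\ \text{end}) &= \beta_0 + \beta_{\text{end}} \approx 145 - 11.31 = 133.69\ \text{Hz}
\end{aligned}
```

The difference between the two cells, 11.31 Hz, is the same magnitude as the post-hoc 2005 ini vs. 2005 end estimate; the sign is positive there because that contrast is computed as initial minus final.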

Voice Quality Parameters
The low-frequency spectral tilt parameters H1H2 and H2H4 were not affected by TIME, POSITION, or their interaction. For H1A1 (Figure 2A and Appendix A, Figure A4
The higher-frequency tilt measures, H1A2 and H1A3, did not differ according to TIME, POSITION, or their interaction. The parameter H42K (Figure 2B and Appendix A, Figure A5) showed an effect of POSITION but not of TIME: values were lower in internal than in initial position (intercept = 7.96, estimate = −1.13, SE = 0.53, t = −2.14, p = 0.0447). In the post-hoc analysis, a positional difference emerged only in 2015, between initial and final position (2015 ini vs. 2015 end, estimate = 1.6, SE = 0.50, t = 3.17, p = 0.0433). The 2015 comparison of initial versus internal position approached significance (p = 0.06).
HNR35 (Figure 3A and Appendix A, Figure A7) was the only parameter that showed a main effect of TIME without an interaction with POSITION (intercept = 37.85, estimate = 5.95, SE = 2.29, t = 2.6, p = 0.0242): HNR values were higher in the later recording sessions.
Table 3 summarizes the acoustic results. Based on p-values, the most robust main effects were those of POSITION on f0 (p < 0.001) and CPP (p = 0.00406).

Discussion
The results of this work can be summarized as follows: (a) euh frequency did not increase with age, and (b) effects of utterance position were more common than effects of aging on acoustic parameters. Aging effects were found only for measures of voice quality.

Frequency of euh
As expected based on prior work using the LangAge corpus (Gerstenberg 2015), we did not detect an age-related increase in euh usage. This is comparable to other work reporting either no change (Keszler and Bóna 2019; Searl et al. 2002) or even a decrease (Gall 2019; Gósy and Silber-Varod 2021; Maxim et al. 1994; Taschenberger et al. 2019) in the use of some FPs with age. Given that FPs have multiple discourse functions and behave differently within speakers (Bortfeld et al. 2001; Horton et al. 2010), we suggest that it is inappropriate to make blanket predictions about how discourse particles, as a group, may change as a function of age. Based on our results for older speakers and on the participants of Searl et al. (2002), who were 70 years and older, the usage patterns of some filler particles could be relatively stable at later ages.

Age
Among all measures, the only two with main effects of TIME reflected voice quality: low-mid frequency spectral tilt (H1A1) and noise in frequencies up to 3500 Hz (HNR35). For H1A1, the interpretation is complicated by interactions with POSITION (see next section). This leaves HNR35 as the only clear acoustic age marker in this dataset. The direction of the difference was unexpected, however: the ratio of harmonic energy to noise increased with age, i.e., there was relatively less noise. This is inconsistent with reports of bowing, glottal gaps, and other forms of laryngeal insufficiency at advanced ages, which give rise to breathier voice qualities. One possible explanation is that edema in older females could increase vocal fold contact (cf. Biever and Bless 1989; Honjo and Isshiki 1980; Pontes et al. 2005; cf. also Higgins and Saxman 1991; Kahane 1987b), leading to reduced noise (Zhang 2016a).
It could also be that older speakers use a more compressed glottal setting to compensate for reduced respiratory drive (cf. Zhang 2016b). Although positional effects did not reach significance, HNR35 was qualitatively lowest in utterance-initial position, where lung volumes should, on average, be higher. Finally, speakers could adopt different voice qualities as a stylistic feature as they age. A traditional finding in the literature is that anatomical differences should, on average, lead to breathier voice qualities in females than in males (e.g., Hanson 1997); it is also clear, however, that sociocultural influences can override such tendencies (e.g., Wagner and Braun 2003; Wolk et al. 2012; Yuasa 2010).
The lack of significant age effects for f0 and formants was surprising. As described in the Introduction, changes in these parameters have been observed in several studies of aging. However, the analyses of Schötz (2006) for speakers 20-90 years of age showed that f0 and F1 in females decrease from young adulthood to middle age, with more stability later on (cf. also Fougeron et al. 2021; Stathopoulos et al. 2011). That is, lifespan studies of f0 in females may arrive at different results depending on which decades are studied. Finally, as discussed earlier (Section 1.1.3), results obtained for FPs may also differ from those obtained for other kinds of speech materials.

Utterance Position: Main Effects and Interactions
The effect of POSITION on f0 was as expected, with lower values in the final compared to the initial position. This decrease could reflect reduced respiratory supply, declination, and/or greater use of creaky voice utterance-finally. On the other hand, formant frequency changes as a function of position were not expected. The greater centralization (lower F2) of euh in later positions could reflect a form of supralaryngeal declination (e.g., Tabain 2003; Vayra and Fowler 1992). In general, centralization may be more typical in connected speech than in other speech tasks such as reading.
For the voice quality measures, Cepstral Peak Prominence (CPP) revealed a positional effect that, in post-hoc tests, emerged only for the 2005 recording sessions. This could indicate (cf. Schultz et al. 2021) that utterance-final creak was more extensive or consistent in earlier recording sessions. This would accord with the patterns found for f0.
Three spectral tilt measures had either main effects of POSITION (H42K, H2KH5K) or an interaction with POSITION (H1A1). In all cases, significant effects emerged in the 2015 recording sessions only. In a post-hoc assessment, we explored the input parameters to these tilt measures to understand these effects. In the case of H1A1, the effect mainly arose from changes in the amplitude of F1, i.e., H1 amplitude did not change much. For H42K and H2KH5K, inspection of the inputs showed in both cases that lower amplitude at 2 kHz changed the ratio. Thus, in 2015, frequencies in the range of F1 and at 2 kHz showed reduced amplitudes in later utterance positions (i.e., int and end compared to ini). Interestingly, summing over POSITION, the H1A1 age difference mainly reflected changes in H1 rather than A1.
The finding of more voice quality changes as a function of POSITION in 2015 could indicate more laryngeal adjustments, possibly as a mechanism of airflow conservation, at the later ages. More generally, these interactions between POSITION and TIME suggest that prosodic functions may mediate the effects of physical aging (cf. Cole 2015 on the observation that prosody interacts with multiple other aspects of speech and language). Lifespan studies of speech characteristics accordingly should recognize possible interplays among physical, linguistic, and pragmatic effects during aging (cf. Gerstenberg 2020).

Limitations and Future Work
Although our study contributes to the relatively small body of longitudinal studies of aging, our sample size was not large. Further work is needed to determine to what extent these results can be generalized, both within the demographic studied here, and to different groups of older speakers across cultures.
The LangAge project, similar to all studies employing volunteer participants, probably tended to draw the interest of people who were at least as healthy as average for their cohort (cf. Eichhorn et al. 2018; Michel et al. 1987). Some data suggest that effects of 'aging' may be considerably more pronounced in persons in poorer physical condition (Ramig and Ringel 1983). Further, our results on French may not hold for speakers of other languages. It is not clear to what extent the nature of vocal aging may vary with social and cultural factors (cf. Michel et al. 1987; Ringel and Chodzko-Zajko 1987).
The extensive variation across prosodic positions and speakers may have masked some possible differences. For example, although CPP did not show a significant age effect, its values were qualitatively higher in the later recording sessions (see Figure 3B), in parallel with HNR35 (Figure 3A); however, the CPP data were quite variable, especially in 2015. Indeed, most of our statistical effects were weak.
Finally, although we included speaker-specific slopes in all analyses, we have not extensively explored speaker differences, which could reveal subtypes of aging. Considerable inter-individual variation has been observed in anatomical and physiological aging (Hammond et al. 2000; Hirano et al. 1983; Kahane 1988; Pressman and Kelemen 1955; Turk and Hogg 1993). The same presumably holds for other aspects of aging.

Conclusions
Our results for the frequency of the filler euh in spoken French are not consistent with the notion that hesitation phenomena are generally increased in older speakers. They do point to the need to assess FPs in terms of prosodic and pragmatic contexts.
Formants, f0, and aperiodicity have been widely reported in the literature on vocal aging, with less attention given to spectral measures of voice quality. Our results suggest that such measures may provide a rich source of information on vocal changes with aging. Given the wide range of voice qualities in female speakers reported by Hanson (1997) and the increasing cross-speaker variability that arises with aging, speaker-specific analyses may be needed to draw clear conclusions from such data.
For understandable reasons, past acoustic studies of vocal aging tended to use restricted speech materials (sustained vowels and/or reading). Results from such controlled and somewhat atypical speech tasks may not carry over well to spontaneous speech. The complex patterns observed in our data, with numerous interactions between TIME and POSITION, suggest that a full lifespan perspective on speech characteristics should consider interactions among physical, sociolinguistic, pragmatic, and cultural factors. We acknowledge the statistical challenges of such work. Case studies and/or large corpus analyses, as well as combining the factors in principal component analyses or similar methods, could contribute to a better understanding of how the multidimensional nature of human aging is reflected in the speech signal.

Acknowledgments: A preliminary analysis was presented at the 179th meeting of the Acoustical Society of America ("Acoustics Virtually Everywhere"), December 2020. We express our thanks to the LangAge participants and the whole team behind the LangAge corpora, to Moriah Rastegar, supported by a graduate student assistantship at Adelphi University, and to Eman El Sherbiny Ismail at the University of Potsdam for help in the editing process.

Conflicts of Interest:
The authors declare no conflict of interest. The funders had no role in the design of the study; in the collection, analyses, or interpretation of data; in the writing of the manuscript, or in the decision to publish the results.
Funding: For the interview series of 2015, an innovation grant was awarded by Freie Universität Berlin. Funding for Open Access was provided by the Leibniz Open Access Publishing Fund ( and Leibniz Zentrum für Allgemeine Sprachwissenschaft (80%).

Institutional Review Board Statement:
The project does not involve vulnerable individuals in the sense of the Helsinki Declaration.

Informed Consent Statement:
The subjects involved in the study signed a written consent for the anonymized use of transcript and audio data.

Data Availability Statement:
Interviews of the first series are available under a Creative Commons Attribution-NonCommercial-ShareAlike 4.0 International License (with one exception where consent does not cover internet access).

Appendix A
Figure A8. Speaker-specific data for CPPm.
Table A1. The data on total euh quantity per speaker (columns 3-5) were drawn from the LangAge corpus. Columns 6-11 show the data analyzed here, including the breakdown across positional contexts. Since the between-pause context was small and not balanced across speakers, the number of analyzed euh includes the other three contexts (ini, int, end) only. The data show extensive speaker differences in euh usage, which can be explored in future work.