Phonation Variation as a Function of Checked Syllables and Prosodic Boundaries

Abstract: The phonation variation in Shanghainese is influenced by both phonemic phonation contrast and global prosodic context. This study investigated the phonetic realization of checked and unchecked syllables at four different prosodic positions (sandhi-medial, sandhi-final, phrase-final, and IP-final). By analyzing both acoustic and articulatory voice measures, we achieved a better understanding of the nature of checkedness contrast and prosodic boundaries: (1) Different phonetic correlates are associated with the two laryngeal functions: The checkedness contrast is mostly distinguished by the relative degree of glottal constriction, but the prosodic boundaries are mostly associated with periodicity and noise measures. (2) The checkedness contrast is well maintained in all prosodic contexts, suggesting that the controls for the local checkedness contrast are rather independent of global prosody.


Language under Study: Shanghainese
Shanghainese is a variety of Wu Chinese spoken in the urban area of Shanghai. As laid out in Table 1, the five-tone system of this language (Chen and Gussenhoven 2015; Xu et al. 1988) can be sorted along two contrastive dimensions. On the one hand, compared to the tones in the upper register (i.e., T1, T2, and T4), the tones in the lower register (i.e., T3 and T5) are produced with a lower f0 range and a breathier phonation (Ren and Mattingly 1989; Tian and Kuang 2019). On the other hand, there is a contrast between checked and unchecked tones, which sets T4 and T5 against T1, T2, and T3.

Table 1. Tones of Shanghainese by register and checkedness (columns: Unchecked [CV], Checked [CV]; first entry: upper-register T1 (high-falling): 53). Tone values (Chao 1968) are according to Xu et al. (1988); checked tones are marked with underscores.

Checked syllables are traditionally transcribed as closed syllables with a coda glottal stop in Shanghainese (Xu et al. 1988). However, the phonetic realization of checked syllables in Shanghainese is not well understood. Based on the qualitative inspection of spectrograms from limited examples, previous studies have suggested several different phonetic realizations. For example, it has been proposed that Shanghainese checked syllables involve at least a shorter duration and a coda glottal stop (Zhu et al. 2008); similar findings have been reported for Longyou Wu, a Wu dialect related to Shanghainese, where checked syllables are realized with an 'abrupt phonatory offset and short rhymes' (Rose 2015). However, studies also show that the checkedness of Shanghainese checked syllables is not only reflected by a coda glottal stop but can also be realized as an irregular creak throughout the vowel portion (Shen 2010). In this paper, we use [CV] to denote the checked syllables in contemporary Shanghainese. The reasons for choosing this transcription are discussed in Sections 3.1.5 and 4.1. Moreover, the traditional transcription (Xu et al. 1988) suggests that checked syllables have a lower f0 than their unchecked counterparts, but no instrumental studies have been conducted to validate this claim. Therefore, in order to better understand the phonetic nature of checked syllables, it is necessary to conduct more systematic acoustic and articulatory analyses of the checkedness contrast.
Moreover, based on our observation, a coda glottal stop or coda creak does not always occur in checked syllables. One potential source of this variation is prosodic context. However, most existing studies on the voice quality of Shanghainese checked syllables are based on the analysis of isolated syllables (Shen 2010; Zhu et al. 2008). As a result, it is unclear whether the creak is driven by prosodic effects or by the phonemic contrast of checkedness, and little is known about how the phonetic features of checked syllables are realized in connected speech. To tease apart the checkedness effect from the prosodic effect, it is necessary to examine the phonetic realization of checked syllables at various prosodic boundaries.
In this paper, we investigate the phonemic effect, the prosodic effect, and their interaction on phonation variation by examining the checked syllables in contemporary Shanghainese, in order to gain a better understanding of the nature of checkedness contrast and prosodic boundaries. In Section 1, we review the language background as well as the phonation variation associated with checked codas and different prosodic boundaries; in Section 2, we present the experimental methods and materials; in Section 3, we model the measurement results to analyze the phonemic effect and the prosodic effect on various phonetic parameters; in Section 4, we discuss the phonetic nature of checked syllables in contemporary Shanghainese, the effect of prosodic boundaries on phonation variation, and the interaction of the phonemic effect and the prosodic effect.

The Tone-Sandhi Pattern and the Prosodic Hierarchy in Shanghainese
Shanghainese has three levels of prosodic phrasing above the syllable level. The lowest level is the tone-sandhi domain. This level is related to the 'left-dominant' tone-sandhi pattern of Shanghainese, in which the tonal pattern of the entire tone-sandhi domain is determined by the leftmost syllable (Duanmu 1999; Xu et al. 1988; Yip 2002, and others). For example, as shown in (1), when the tone of the initial syllable of a disyllabic word is high-rising (34), the tonal pattern of the entire word is always high-rising (33-44), regardless of the underlying tone of the second syllable.
(1) sɤ34 + ɕin53 → sɤ33ɕin44 'palm'
    sɤ34 + ɕin34 → sɤ33ɕin44 'souvenir'
    sɤ34 + ɦin23 → sɤ33ɦin44 'hand-shape'
Non-initial syllables in tone-sandhi domains are also subject to various phonetic reductions (Chen 2008; Kuang et al. 2018; Ling and Liang 2016; Tian and Kuang 2020). The sandhi domain is considered an important prosodic domain in Shanghainese because it is associated with word formation and usually marks the prosodic word boundary, although it can sometimes be smaller than a word (Roberts 2020; Selkirk and Shen 1990; Yip 2002; Zee and Maddieson 1979). For example, in a trisyllabic personal name, it is common for the first two syllables to form a tone-sandhi domain while the third syllable forms a monosyllabic tone-sandhi domain by itself (see Section 2.1 for detailed discussion). An example of such a trisyllabic name is given in (2); the brackets indicate a tone-sandhi domain.
(2) li23 + ɕiɔ34 + min23 → (li22ɕiɔ44)min23
The level above the tone-sandhi domain is the 'Major Phrase' (phrase) level, which can be identified by phrase-final lengthening and pitch reset but a lack of audible pause (Roberts 2020; Selkirk and Shen 1990). For example, trisyllabic personal names can often form a phrase domain; the entire trisyllabic name 'li22ɕiɔ44min23' in example (2) forms a phrase domain.
Finally, the highest level is the 'Intonational Phrase (IP)' domain, which is marked by final lengthening and an 'actual silence' at the boundary (Chen 2008; Roberts 2020; Selkirk and Shen 1990). Usually, the boundary of an IP domain in Shanghainese coincides with the boundary of a sentence.
In the current study, we examine how the phonetic realization of checked syllables is affected by the three levels of prosodic phrasing. For convenience, in the following discussion, from the lowest to the highest, we will refer to the three levels of prosodic phrasing as 'sandhi domain', 'phrase domain', and 'IP domain'.
The phonatory attributes of checked codas may vary from language to language. In some languages, the checked coda is realized as a full glottal stop, which is characterized by a complete closure of the vocal folds and can be identified by a short period of silence during the glottal closure (Davidson 2020; Esling et al. 2005; Ladefoged 1971; Ladefoged and Maddieson 1996, and others).
However, cross-linguistically, it has been well documented that phonemic glottal stops are rarely realized as complete glottal closure, and their occurrence is conditioned by individual differences, contextual differences, and even other contingent factors (Borroff 2007;DiCanio 2012;Pan 2017;Ulrich 1993). More often, glottal stops are realized as incomplete glottal closure and can substantially influence the voice quality of the adjacent vowels and other vocalic segments (Esposito and Khan 2020;Garellek and Esposito 2021;Ladefoged and Maddieson 1996).
Due to the variation of the articulatory configuration in the larynx (and pharynx), languages vary in the extent of glottalization and types of glottalized phonation. In many languages, glottal stops are often realized as a creaky voice, or irregular glottal pulses, on the adjacent vowels. For example, in San Lucas Quiaviní Zapotec, the glottal stop involved in the checked syllables often surfaces as 'a period of strong glottalization' in the middle of the vowel (Chávez-Peón 2008). In Itunyoso Trique, the coda glottal stop is often realized as a short portion of irregular voicing at the end of the vowel (DiCanio 2012). Glottal stops in Mayan languages are also most often realized as creaky phonation on the adjacent vowels (Bennett 2016).
In addition to creaky voice, checked syllables can also be realized as other types of glottalized or laryngealized phonation such as tense voice or harsh voice. Similar to the glottalized/laryngealized voice, the tense voice is also characterized by a high glottal constriction; however, the f0 of the tense voice is neither low nor irregular (Keating et al. 2015). For instance, the phonemic tense phonation in various Yi languages (e.g., Hani and Southern Yi) is the historical reflex of vowels in syllables that have original final stops, and the tense phonation can co-occur with a mid tone (Kuang 2013;Maddieson and Ladefoged 1985); in Daigela Wa, the tense phonation and the lax phonation differ significantly in the degree of glottal constriction, but not in f0 (Wei 2018). Similar to creaky voice, tense voice is also produced with greater glottal constriction, but unlike creaky voice, tense voice is highly periodic and often high-pitched (Keating et al. 2015;Kuang and Keating 2014).
Apart from a tense voice, checked syllables can also be produced with a harsh voice in some languages (Garellek 2020;Rose 2015;Traill 1994). Harsh voice is a rough and noisy phonation that involves both strong laryngeal constriction and pharyngeal constriction (Garellek 2020;Gerratt and Kreiman 2001;Moisik 2012).
Since the phonetic realization of checked codas varies across languages and contexts (see Section 1.5), multidimensional phonetic features are needed to characterize the phonetic nature of checked syllables.

Phonetic Correlates of Checked Syllables
In this study, we are particularly interested in the phonation variation introduced by the checked coda. If Shanghainese checked syllables are realized as glottalized vowels, we would expect the spectral slope, that is, the relative strength of the lower-frequency and higher-frequency harmonics, to reliably distinguish checked syllables from unchecked syllables acoustically. Checked vowels that are produced with greater glottal constriction should be associated with a flatter spectral slope, or less prominent lower-frequency harmonics in the spectrum (Garellek and Keating 2011; Gordon and Ladefoged 2001; Keating et al. 2015; Kuang and Keating 2014, among many others).
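To make the notion of spectral slope concrete, the following minimal sketch computes an uncorrected H1-H2 value (the level difference between the first and second harmonics, in dB) for two synthetic signals. It is an illustration only: the signals are toy two-harmonic tones, the sample rate, amplitudes, and function names are assumptions, and no formant correction of the kind implied by the starred measures (e.g., H1*-A1*) is applied.

```python
import math


def harmonic_amplitude(signal, sample_rate, freq_hz):
    """Estimate the amplitude of one sinusoidal component with a single-bin DFT."""
    n = len(signal)
    re = sum(x * math.cos(2 * math.pi * freq_hz * i / sample_rate)
             for i, x in enumerate(signal))
    im = sum(x * math.sin(2 * math.pi * freq_hz * i / sample_rate)
             for i, x in enumerate(signal))
    return 2.0 * math.hypot(re, im) / n


def h1_minus_h2(signal, sample_rate, f0_hz):
    """Uncorrected H1-H2 in dB: first-harmonic level minus second-harmonic level."""
    h1 = harmonic_amplitude(signal, sample_rate, f0_hz)
    h2 = harmonic_amplitude(signal, sample_rate, 2 * f0_hz)
    return 20.0 * math.log10(h1 / h2)


def two_harmonic_tone(f0_hz, amp1, amp2, sample_rate, n_samples):
    """Synthetic 'vowel' with only two harmonics (toy data, not real speech)."""
    return [amp1 * math.sin(2 * math.pi * f0_hz * i / sample_rate)
            + amp2 * math.sin(2 * math.pi * 2 * f0_hz * i / sample_rate)
            for i in range(n_samples)]


SAMPLE_RATE = 8000   # Hz, assumed for the toy signals
N_SAMPLES = 800      # 100 ms, an integer number of cycles at 100 Hz

# A weak second harmonic gives a steep slope; a strong one gives a flat slope.
steep = two_harmonic_tone(100.0, 1.0, 0.25, SAMPLE_RATE, N_SAMPLES)
flat = two_harmonic_tone(100.0, 1.0, 0.80, SAMPLE_RATE, N_SAMPLES)

steep_db = h1_minus_h2(steep, SAMPLE_RATE, 100.0)
flat_db = h1_minus_h2(flat, SAMPLE_RATE, 100.0)
print(f"steeper slope: {steep_db:.2f} dB, flatter slope: {flat_db:.2f} dB")
```

Under the expectation stated above, a more constricted (checked) vowel would pattern with the second signal, whose smaller H1-H2 value reflects relatively stronger higher harmonics.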
Periodicity and noise ratio are also important acoustic correlates for glottalization. Different types of laryngealized phonation have different periodicity profiles. Creaky voice is usually associated with less periodicity and lower harmonic-to-noise ratio, but tense voice is associated with higher periodicity and higher harmonic-to-noise ratio in addition to a higher pitch (Keating et al. 2015). Moreover, energy damping or energy dips are also observed in checked syllables in some languages (DiCanio 2012;Pan 2017).
Articulatorily, the characteristics of glottal closure and the relative degree of glottal constriction can be measured non-invasively with an electroglottograph (EGG). Contact Quotient (CQ), defined as the ratio of the duration of the contact phase to the period of the vibratory cycle (Rothenberg and Mahshie 1988), is the most important EGG measure. This measure has been found to be a reliable indicator of phonation contrast in various languages, and greater CQ is correlated with greater glottal constriction (DiCanio 2009; Esposito and Khan 2012; Garellek 2020; Guion et al. 2004; Jiang et al. 2017; Kuang and Keating 2014; Li and Zhang 2020; Mazaudon and Michaud 2008; Tian and Kuang 2019, among others).
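The definition of CQ can be written out as a short calculation. The sketch below assumes that the moments of contact and decontact within one vibratory cycle have already been located; the function name and the example timings are hypothetical and serve only to illustrate the ratio.

```python
def contact_quotient(contact_onset_s, contact_offset_s, next_onset_s):
    """CQ = duration of the contact phase / period of the vibratory cycle.

    All arguments are times in seconds for a single glottal cycle:
    the start of contact, the end of contact, and the start of contact
    in the following cycle.
    """
    period = next_onset_s - contact_onset_s
    if period <= 0:
        raise ValueError("the cycle must have a positive period")
    if not contact_onset_s <= contact_offset_s <= next_onset_s:
        raise ValueError("contact offset must fall within the cycle")
    return (contact_offset_s - contact_onset_s) / period


# Two hypothetical 10 ms cycles (f0 = 100 Hz).
more_constricted = contact_quotient(0.0, 0.006, 0.010)  # 6 ms of contact
less_constricted = contact_quotient(0.0, 0.004, 0.010)  # 4 ms of contact
print(more_constricted, less_constricted)
```

A longer contact phase within the same period yields a greater CQ, which is the sense in which greater CQ is taken to index greater glottal constriction.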
Another important EGG measure is Peak Increase in Contact (PIC), also known as Derivative-EGG Closure Peak Amplitude (DECPA) (Esposito and Khan 2012; Keating et al. 2010; Kuang and Keating 2014; Michaud 2004). It is defined as the amplitude of the positive peak in the first derivative of the EGG signal (dEGG). PIC is an important feature of dEGG, as the positive peak of dEGG marks the beginning of the contacting phase (Howard 1995). This measure reflects the manner of vocal fold contact, which is also an important aspect of glottal constriction. However, the specific articulatory implication of PIC is not completely clear. A popular proposal is that PIC is related to the abruptness of contact, so that greater PIC indicates more abrupt contact (Keating et al. 2010). Consistent with this proposal, higher PIC values are associated with the checked syllables in Northern Kam (Jiang et al. 2017). However, the opposite direction has also been reported in several languages. For example, the breathier phonations (e.g., breathy or lax voice) in Gujarati, White Hmong (Esposito and Khan 2012), and Yi languages (Kuang and Keating 2014) were found to have higher PIC values instead. Although inconsistent directions were reported among the aforementioned languages, PIC has been found to be a reliable measure for distinguishing contrastive phonation in these languages (Esposito and Khan 2012; Keating et al. 2010; Kuang and Keating 2014; Michaud 2004). Therefore, in this study, we included both CQ and PIC to assess different aspects of glottal constriction.
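As an illustration of the definition, PIC can be approximated as the maximum of the first difference of a sampled EGG signal. The sketch below uses two toy triangular cycles rather than real EGG data; the function names, the sample rate, and the cycle shapes are assumptions made for the example.

```python
def peak_increase_in_contact(egg, sample_rate):
    """PIC: the largest positive value of the first difference of the EGG
    signal, scaled to amplitude units per second."""
    degg = [(egg[i + 1] - egg[i]) * sample_rate for i in range(len(egg) - 1)]
    return max(degg)


def toy_cycle(rise_samples, fall_samples):
    """One toy EGG cycle: a linear rise to 1.0 followed by a linear fall to 0.0."""
    rising = [i / rise_samples for i in range(rise_samples)]
    falling = [1.0 - i / fall_samples for i in range(fall_samples)]
    return rising + falling + [0.0]


SAMPLE_RATE = 1000  # Hz, assumed for the toy signals

abrupt_contact = toy_cycle(rise_samples=2, fall_samples=8)
gradual_contact = toy_cycle(rise_samples=8, fall_samples=8)

pic_abrupt = peak_increase_in_contact(abrupt_contact, SAMPLE_RATE)
pic_gradual = peak_increase_in_contact(gradual_contact, SAMPLE_RATE)
print(pic_abrupt, pic_gradual)
```

In this toy setting, the cycle with the faster rise has the larger PIC, in line with the abruptness interpretation; as noted above, the mapping between PIC and phonation type in real data is not consistent across languages.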
Moreover, since f0 affiliation is an important aspect of phonation types (e.g., vocal fry vs. tense), and phonation variation can be driven by pitch variation in tone production, as found in the case of Mandarin (Chai 2021;Kuang 2017), it is also necessary to examine f0 values of checked and unchecked vowels and their correlations with phonation cues.
Lastly, vowel duration was measured in this study as well, as it has been reported to be a useful cue for the checkedness contrast in Shanghainese (Chen and Gussenhoven 2015;Xu et al. 1988;Zee and Maddieson 1979;Zhu et al. 2008, among many others).

Phonation Variation Related to Prosodic Boundaries
Our discussion in the previous sections has mainly focused on the phonation variation introduced by local phonemic contrasts. However, phonation also co-varies with global laryngeal functions, such as vocal effort, prosodic boundaries, prominence, and so on (Epstein 2002; Garellek 2015; Klatt and Klatt 1990; Kuang 2018, and others). In particular, a large number of studies have shown that prosodic factors (i.e., prominence and boundary) significantly influence the occurrence of allophonic creak and other types of laryngealization.
On the one hand, prosodic prominence has a great impact on the relative degree of glottal constriction. Prosodically weak positions, such as unstressed syllables, are often (but not always) associated with an irregular creaky voice (Epstein 2002;Klatt and Klatt 1990;Kuang 2018). By contrast, stressed or prosodically prominent syllables are often associated with greater vocal effort or a tenser voice (Bird and Garellek 2019;Epstein 2002;Garellek 2015;Mooshammer 2010).
On the other hand, the occurrence of glottalization is significantly influenced by prosodic boundaries. It is well known that aperiodic creak is more likely to occur at prosodic domain-final positions: Creak is generally more likely to occur at the phrase-final position and much less likely to occur at the phrase-medial position (Davidson and Erker 2014; Dilley et al. 1996; Garellek 2013; Luthern and Clopper 2015; Seyfarth and Garellek 2020). Moreover, the likelihood of creak is strongly correlated with the strength of the prosodic boundary: The larger the boundary, the more frequent the creak. As such, creak is most likely to occur at the end of the utterance domain (Garellek 2013; Klatt and Klatt 1990; Pierrehumbert and Talkin 1992; Redi and Shattuck-Hufnagel 2001). The effect of phrase-final creak appears to be quite universal cross-linguistically, as it has been widely reported among tonal languages as well (Esposito 2003; Garellek 2012; Kalita et al. 2017; Kuang 2018).
Taken together, the creak of checked syllables in Shanghainese can be the consequence of the interaction of two distinct sources: the local phonemic effect from the checked coda, and the global laryngeal effect from the domain-final prosodic boundary. Therefore, to tease these two sources apart and to better understand the interaction between the two laryngeal functions, it is important to investigate the phonation variation of the checkedness contrast at different levels of prosodic boundaries.
Moreover, although the effects of large prosodic boundaries seem to be quite universal cross-linguistically, it remains largely unclear whether the smaller prosodic boundaries, especially those defined by language-specific phonology, such as the tone-sandhi domain in Shanghainese, also follow the general phrase-final effects.

Interaction between Global and Local Laryngeal Functions
This study is also more generally related to the question of the interaction between global vs. local laryngeal functions. As we discussed above, phonation variation can be driven by local functions (e.g., phonemic checkedness contrast at the syllable level), as well as by global functions, such as sentence prosody and other paralinguistic factors (e.g., vocal effort, emotion).
So far, our understanding of this topic remains extremely limited. Some studies have suggested that local laryngeal functions can be relatively independent of global laryngeal functions. For example, in Irish English, an accented syllable is consistently tenser than an unaccented syllable, regardless of the voice quality the speaker uses for the whole utterance (Yanushevskaya et al. 2016); in American English, the phonetic realization of pitch accents differs from that of prosodic boundaries, with the former accompanied by a higher CQ and the latter by a lower CQ (Bird and Garellek 2019). In White Hmong, a tonal language, the phonation differences among tones are not modulated by prosodic position in the utterance (Garellek and Esposito 2021). Similarly, in Shaoxing Wu, a language closely related to Shanghainese, syllables of the lower register always have breathier phonation than those of the upper register, regardless of vocal effort.
In contrast, other studies have shown that the local laryngeal function is highly influenced by the global laryngeal function in some languages. Non-modal phonation reinforced by prosodic effects or other sources of vocal effort can sometimes weaken the phonetic differences between contrastive phonation types. For example, in Santa Ana Del Valle Zapotec, the three-way phonation contrast is minimal when the target is isolated and focused or when the target is in initial position. In Shaoxing Wu, while the phonation contrast is maintained in all vocal effort conditions, the contrast is less well-defined in the loud and soft conditions than in the normal condition. Moreover, the global laryngeal function can sometimes appear to override the local laryngeal function. For example, in Mandarin, larger phrasal boundaries are likely to have creak regardless of tonal category (Kuang 2018).
In addition to the question of the distinctiveness of the phonemic phonation contrast, it is also important to understand whether the creak introduced by global laryngeal functions has the same voice source mechanism as the creak introduced by local laryngeal functions. So far, we still have very little knowledge about this issue. In a study of German, lexical stress and loud speech were found to be produced with similar phonetic cues, except that the extent of variation is greater for vocal effort (the global function) than for word stress (the local function) (Mooshammer 2010). Similar acoustic properties have also been observed for /t/-glottalization and phrase-final creak in English (Garellek 2015). However, in Shaoxing Wu, the phonetic correlates of vocal effort are quite distinct from those of the phonemic register contrast. Since different laryngeal functions are not always subject to the same voice source mechanism, it is necessary to examine more case studies on different languages and different types of prosodic contexts.
Furthermore, it is still unclear whether the phonetic correlates of the phonemic phonation contrast can vary according to prosodic conditions. Limited evidence suggests that they might. For example, in San Lucas Quiaviní Zapotec, stressed checked syllables are more likely to be produced with non-modal phonation (Chávez-Peón 2008); in English, creak occurs more often for word-medial /t/-glottalization at the sentence-final position (Pierrehumbert and Talkin 1992). Therefore, it is quite possible that the phonetic correlates of the checkedness contrast vary across prosodic positions.
Overall, the interaction between global and local laryngeal functions is rather complicated and multifaceted. By looking into phonation variation as a function of the interaction between prosodic boundaries and phonemic checkedness, this study will significantly advance our understanding of how prosodic structure manifests itself phonetically in tone languages.

Research Questions and Hypotheses
To summarize, this study aims to address several research questions: First of all, what is the phonetic nature of checked syllables in Shanghainese? To address this question, we examined all relevant phonetic cues for the checkedness contrast, including f0, duration, phonation and occurrence of creak. Due to the checked coda, it is likely that non-modal phonation is involved in the vowel portions of the checked syllables. To achieve a better understanding of the phonation involved in the checked syllables, both EGG and acoustic measures were collected and analyzed.
Secondly, how does phonation vary as a function of different levels of prosodic boundaries? In particular, does the tone-sandhi domain behave in the same way as other large prosodic domains (e.g., intermediate phrase and intonational phrase)? Furthermore, it is likely that both prosodic boundaries and checkedness contrast involve some sort of glottalization, but are prosodic boundaries and the checkedness contrast produced with the same laryngeal mechanisms?
Lastly, how is the local checkedness contrast influenced by global prosodic boundaries? It is possible for both the checked coda and prosodic boundaries to introduce creak at the end of a syllable. By placing checked syllables at various prosodic boundaries, we are able to tease apart these two functions and show how they interact with each other. In particular, we test whether the same set of phonation cues is involved in the checkedness contrast and the different prosodic boundaries.

Speech Materials
In this experiment, the participants were asked to produce a minimal pair consisting of the checked syllable [za12] and the unchecked syllable [za23]. The target rhymes occur in four different prosodic positions: (3a) sandhi-medial; (3b) sandhi-final but phrase-medial (sandhi-final); (3c) phrase-final but IP-medial (phrase-final); and (3d) IP-final. The four prosodic positions are named according to Section 1.2.
As illustrated in examples (3a-3d), the target syllables (underscored in the carrier sentences) were embedded in pseudonyms (boldfaced in the carrier sentences) and located at different types of prosodic boundaries. The tones in the IPA transcriptions in (3a-3d) are omitted. [za12] and [za23] were chosen as the target syllables because they differ minimally in other phonetic aspects, such as pitch contour, vowel quality, and onset, and because they are common syllables in names. Specifically, the target syllable in (3a) is the initial syllable of a disyllabic pseudonym. Because a Shanghainese disyllabic pseudonym constitutes a single tone-sandhi domain, the codas of the target syllables occur at the sandhi-medial position (Roberts 2020; Xu et al. 1988).
The target syllable in (3b) is the second syllable of a trisyllabic pseudonym, which is a phrase in Shanghainese (Roberts 2020; Selkirk and Shen 1990; Xu et al. 1988). In Shanghainese, a trisyllabic phrase can contain either a single trisyllabic tone-sandhi domain or two tone-sandhi domains (a disyllabic domain + a monosyllabic domain). However, personal names normally use the two-domain pattern unless the name has a special meaning or is extremely frequent (normally because it is associated with an extremely famous person). Because the name phrases in (3b) are pseudonyms that have no special meaning and are not extremely common, we expected them to be realized with two tone-sandhi domains.
We found that this was exactly the case. A diagnostic for determining whether a trisyllabic phrase is realized with one or two tone-sandhi domains is the tone of the third syllable: If the last syllable is realized in the same way as its citation tone, it forms an independent tone-sandhi domain by itself; otherwise, if the last syllable is realized as a weak low tone, it is part of the preceding constituent, and the whole phrase forms a single trisyllabic tone-sandhi domain. We manually checked and confirmed that all the third syllables in the pseudonyms in (3b) were realized as their citation tones in every repetition by each participant; therefore, all the trisyllabic pseudonyms in (3b) were realized with two tone-sandhi domains, and the target coda occurs at the phrase-medial and sandhi-final position.
In (3c) and (3d), the target syllables are the final syllables of disyllabic pseudonyms, which are phrase domains. Moreover, the disyllabic pseudonym in (3d) is also the end of an IP, which is identified by a larger boundary pause between the target syllable and the following constituent. By contrast, no such 'actual silence' is perceived after the target syllable in (3c). Therefore, the target coda is at the phrase-final but IP-medial position in (3c), and at the IP-final position in (3d).
In each prosodic position, each carrier sentence was repeated eleven times. To avoid potential prosodic reduction in the repetitions, the onset consonant of the following syllable was changed for each repetition. The tone and vowel quality remained the same for all repetitions. Even though we attempted to minimize phonetic reduction, the later repetitions of the target syllables may still show some reduction due to the similarity between the carrier sentences (Ling and Liang 2017).

Data Collection
Simultaneous audio and articulatory recordings were collected in July 2019 in a double-walled sound-attenuated booth in Shanghai, using the Komplete Audio 6 Sound Card by Field-Phon (Han et al. 2013). The audio signal was recorded with an AKG C544L Headset Microphone as the first channel, and the EGG signal was recorded with Kay 6103 as the second channel. The sampling rate of the recordings was 44.1 kHz.
Sixteen native urban Shanghainese speakers (9 female and 7 male, aged 19 to 39 at the time of recording) participated in the experiment. The entire experiment took approximately 40 min per participant, with participants taking a 5-min break every 10 min, for a total of two breaks. After completing the recording of all participants and before the data analysis started, we reviewed the obtained recorded signals. Data from two speakers (one young female and one middle-aged male) were excluded from the analysis because of the poor recording quality of the EGG signals. All participants reported having normal hearing and voice, and each participant was reimbursed with 50 RMB.

Measures
Since the efficient/reliable acoustic and articulatory indicators for non-modal phonation contrasts are indeed language-specific or even contrast-specific, we opted to use a multidimensional approach to measure phonation. In the current study, we compare both acoustic and articulatory measures of voice quality in checked and unchecked syllables from audio and EGG signals.
The vowel portions of the target syllables were manually segmented in Praat (Boersma and Weenink 2021) by the first author, who is a native speaker of Shanghainese. Tokens with failed f0 tracking were excluded from the acoustic analysis. Extensive voice measurements from both audio and EGG signals were extracted with VoiceSauce (Shue et al. 2011) and EggWorks (Tehrani 2010), using the default settings (window size = 25 ms). All measures were extracted at every millisecond. We calculated the average value of each parameter over the vowel interval and then performed within-speaker z-score normalization. A total of 1140 tokens were collected and annotated in the experiment; 79 vowel intervals were excluded because they were shorter than 50 ms and therefore could not be reliably measured acoustically or articulatorily; the remaining 1061 tokens were included in the subsequent analyses.
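The within-speaker z-score normalization mentioned above can be sketched as follows. This is an illustration under stated assumptions rather than the script used in the study: the speaker labels and CQ-like values are invented, and the sample standard deviation is assumed because the choice between sample and population variance is not specified.

```python
from collections import defaultdict
from statistics import mean, stdev


def zscore_within_speaker(tokens):
    """Normalize each value against the mean and standard deviation of the
    speaker who produced it.

    tokens: list of (speaker_id, value) pairs, one per token.
    Returns the z-scores in the same order as the input.
    Each speaker needs at least two tokens with non-identical values.
    """
    values_by_speaker = defaultdict(list)
    for speaker, value in tokens:
        values_by_speaker[speaker].append(value)

    stats = {speaker: (mean(values), stdev(values))
             for speaker, values in values_by_speaker.items()}

    return [(value - stats[speaker][0]) / stats[speaker][1]
            for speaker, value in tokens]


# Hypothetical per-token mean values for two speakers with different baselines.
tokens = [("S1", 0.40), ("S1", 0.50), ("S1", 0.60),
          ("S2", 0.55), ("S2", 0.65), ("S2", 0.75)]
z_scores = zscore_within_speaker(tokens)
print([round(z, 2) for z in z_scores])
```

After normalization, the two hypothetical speakers occupy the same scale even though their raw baselines differ, which is what allows tokens to be pooled across speakers.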
The articulatory measures from EGG include Contact Quotient (CQ) and Peak Increase in Contact (PIC). CQ was estimated with the 'CQ HT' method in EggWorks, which uses the positive dEGG peak to define the moment of contact and takes the intersection of the DC contour of the EGG signal with the downward slope of the signal as the moment of decontact. The 'CQ HT' method was found to be the most accurate method for distinguishing the onset phonation types in Shanghainese. PIC was measured as the amplitude of the positive peak in the first derivative of the EGG signal.

Occurrence of Creak
Most of the spectral measures rely on successful pitch tracking, which, unfortunately, is likely to fail in the segments with strong irregularity or creak. To remedy this problem, we also manually marked the occurrence of creak in the target vowels and distinguished the occurrence of different types of creak by looking at the spectrogram. The occurrence of creak in both checked and unchecked tokens was annotated by the first author.
Based on the extent and magnitude of irregular voicing, three major categories were annotated. 'Coda glottal stop' was coded when a full glottal stop was present at the syllable coda; as illustrated in Figure 1A, the presence of a glottal stop is evidenced by a strong glottal pulse following a brief silent period. The second type, 'coda creak', was coded when irregular voicing was present in the last third of the vowel portion (Figure 1B). Finally, 'broader creak' was coded when the vowel portion began to show significant irregular voicing earlier than the last third of the vowel portion (Figure 1C).
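The three categories above were annotated manually from spectrograms. Purely to make the decision criteria explicit, they can be written as a small rule, assuming that an annotator has already marked whether a full glottal stop is present and when irregular voicing begins. The function name, its arguments, and the 'no creak' label for tokens without irregular voicing are hypothetical additions for this sketch.

```python
def classify_creak(vowel_start_s, vowel_end_s,
                   irregular_onset_s=None, full_glottal_stop=False):
    """Return the creak category for one vowel token.

    vowel_start_s, vowel_end_s: boundaries of the vowel portion in seconds.
    irregular_onset_s: time at which irregular voicing begins, or None.
    full_glottal_stop: True if a full glottal stop is present at the coda.
    """
    if full_glottal_stop:
        return "coda glottal stop"
    if irregular_onset_s is None:
        return "no creak"  # assumed label, not one of the three categories
    duration = vowel_end_s - vowel_start_s
    last_third_start = vowel_start_s + 2.0 * duration / 3.0
    if irregular_onset_s >= last_third_start:
        return "coda creak"
    return "broader creak"


# A hypothetical 150 ms vowel: the last third begins at 100 ms.
print(classify_creak(0.0, 0.150, irregular_onset_s=0.120))
print(classify_creak(0.0, 0.150, irregular_onset_s=0.060))
print(classify_creak(0.0, 0.150, full_glottal_stop=True))
```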

Acoustic Measures

Principal Component Analysis for Acoustic Measures
Since voice quality involves multidimensional acoustic cues, and many of the spectral or noise measures are correlated with each other, instead of exploring individual cues, we took a more integrative approach by fitting the high-dimensional acoustic measures into a Principal Component Analysis (PCA) model. The first principal component (PC1) and the second principal component (PC2) together account for over 60% of the variance (PC1 = 40.9%, PC2 = 20.8%). The multidimensional acoustic space is plotted in Figure 2. The same acoustic space is color-coded twice in order to illustrate the effects of phonemic type (Figure 2a) and prosodic position (Figure 2b). As shown in Figure 2a,b, the phonemic types (checked vs. unchecked syllable) mostly contrast along PC1, while the prosodic positions (from small to large boundaries: sandhi-medial, sandhi-final but phrase-medial, phrase-final, and IP-final) of the targets mostly vary along PC2. In particular, the sandhi-medial condition is generally more distinct from all three sandhi-final conditions (sandhi-final but phrase-medial, phrase-final, and IP-final). This result suggests that tone sandhi is a critical prosodic condition for the phonation variation in Shanghainese.
In order to better understand the acoustic correlates of these principal components, the factor correlation loadings for PC1 and PC2 are plotted in Figure 3. A stronger correlation between an acoustic cue and a PC is indicated when the angle between the feature vector and the PC axis is close to 0° (or 180°) and the vector is relatively long. The direction of the vector indicates the direction of the correlation. As shown in Figure 3, PC1 is mostly correlated with spectral slope and the strength of individual harmonics in the higher-frequency range, and the most important cues are A2*, H1*-A2*, H1*-A1*, A3*, and H1*-A3*. More specifically, as the direction of the vectors suggests, PC1 is negatively correlated with spectral tilt measures (H1*-An*) and positively correlated with the strength of the individual harmonics (An*). Therefore, greater PC1 generally means a more constricted glottis. PC2, on the other hand, is mostly correlated with periodicity/noise measures, such as HNR15, HNR25, HNR35, HNR05, and CPP; greater PC2 is correlated with greater periodicity. Therefore, PC1 and PC2 generally represent different aspects of glottal articulation. Taking Figures 2 and 3 together, it appears that the phonemic contrast of checked vs. unchecked syllables is mostly distinguished by spectral slope measures, while the prosodic boundaries mostly differ in periodicity and noise measures.
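The geometric reading of a loading plot described above can be made concrete. The following is a minimal sketch, assuming each acoustic cue is represented by its pair of correlation loadings on PC1 and PC2. The function names are hypothetical, and any loading values passed in are illustrative rather than values from Figure 3.

```python
import math


def loading_geometry(pc1_loading: float, pc2_loading: float):
    """Return (length, angle in degrees) of a loading vector.

    The angle is measured from the positive PC1 axis, so values near 0 or
    180 degrees mean the cue aligns with PC1, and values near +90 or -90
    degrees mean it aligns with PC2.
    """
    length = math.hypot(pc1_loading, pc2_loading)
    angle = math.degrees(math.atan2(pc2_loading, pc1_loading))
    return length, angle


def dominant_axis(pc1_loading: float, pc2_loading: float) -> str:
    """Name the PC axis that the loading vector lies closer to."""
    return "PC1" if abs(pc1_loading) >= abs(pc2_loading) else "PC2"
```

A long vector pointing toward 180° would correspond to a cue that is strongly and negatively correlated with PC1, which is the pattern described for the spectral tilt measures.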

Linear Mixed-Effect Regression Models for Acoustic Measures
To further examine the interaction between local phonemic types and global prosodic positions, linear mixed-effect regression models were fitted for each of the first two principal components in R with the 'lmerTest' package (Kuznetsova et al. 2017; R Core Team 2021). For each model, phonemic type, prosodic position, and their interaction were set as the fixed factors; a random by-speaker intercept and a random by-following-consonant intercept were included as random effects. Categorical variables in these models were simple-coded; with such coding, the intercept of the model is the grand mean (Sonderegger 2020). If the interaction between phonemic type and prosodic position was not significant in the model, we refitted the linear mixed-effect regression model without interaction effects and reported the output of the simpler model. To obtain pairwise comparisons among all prosodic positions, the same model was rerun multiple times with different reference levels.

PC1
As discussed in Principal Component Analysis for Acoustic Measures and shown in Figure 3, PC1 is predominantly negatively correlated with spectral slope measures (e.g., H1*-An*); therefore, greater PC1 indicates a more constricted glottis.
We fitted a linear mixed-effects regression model for PC1 with the parameters described above. Because there was no significant interaction between phonemic type and prosodic position, the model was refitted without the interaction effects.
As indicated by Table 2, both phonemic type and prosodic position have significant main effects on PC1. This result confirms our observation based on Figure 2. The pairwise comparisons also suggest that PC1 is significantly different between every two prosodic positions, except for sandhi-medial vs. IP-final. As illustrated in Figure 4, there is a general trend that larger prosodic boundaries have smaller PC1 values compared to smaller prosodic boundaries. Therefore, we see a trend of sandhi-final > phrase-final > IP-final. However, sandhi-medial is not part of this trend.
More importantly, there is no significant interaction between phonemic type and prosodic position, suggesting that checked vs. unchecked syllables are well distinguished by PC1 (i.e., spectral slope measures, cf. Figure 3) in all prosodic conditions. This is further confirmed by a post-hoc Games-Howell test (significant p-values are indicated in Figure 4).
Figure 4. The variation of PC1 influenced by phonemic type and prosodic position. Greater PC1 indicates a more constricted glottis. Significant p-values (p ⩽ 0.05) are marked in red, which indicates that the PC1 difference between checked and unchecked syllables is significant in that prosodic position.
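The post-hoc Games-Howell comparisons reported for each prosodic position rest on a Welch-type statistic that does not assume equal variances or equal group sizes. The following is a minimal sketch of that statistic, assuming the standard formulation of the test; the function name is hypothetical, and the study does not describe its own computational details. The p-value additionally requires the studentized range distribution, which is omitted here.

```python
import math
from statistics import mean, variance


def games_howell_stat(a, b):
    """Return (t, df, q) for one pairwise Games-Howell comparison.

    a, b: sequences of measurements for the two groups being compared,
          e.g. PC1 scores of checked and unchecked tokens in one position.
    """
    n1, n2 = len(a), len(b)
    # Squared standard errors of the two group means (sample variances).
    v1 = variance(a) / n1
    v2 = variance(b) / n2
    # Welch t statistic with unpooled variances.
    t = (mean(a) - mean(b)) / math.sqrt(v1 + v2)
    # Welch-Satterthwaite approximation of the degrees of freedom.
    df = (v1 + v2) ** 2 / (v1 ** 2 / (n1 - 1) + v2 ** 2 / (n2 - 1))
    # Studentized-range statistic that is referred to the q distribution.
    q = abs(t) * math.sqrt(2)
    return t, df, q
```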

PC2
Similar to PC1, since there was also no significant interaction effect for PC2, the linear mixed-effect regression model was refitted without the interaction effects. The outputs of the regression models for PC2 are summarized in Table 3, and the result is visualized in Figure 5.
As can be seen here, there is a significant effect of prosodic position on PC2, but phonemic type has no statistically significant main effect. This result further validates our observation from Figure 2: PC2 mostly reflects the effects of the prosodic positions. As illustrated in Figure 5, the post-hoc Games-Howell test confirms that there is no checked vs. unchecked distinction in any of the prosodic positions.
As discussed in Principal Component Analysis for Acoustic Measures and shown in Figure 3, PC2 is mostly related to the periodicity and noise ratio. Greater PC2 is correlated with greater periodicity. IP-non-final positions (both sandhi-final and phrase-final) are generally more periodic than IP-final. Again, sandhi-medial is not part of the trend and is significantly less periodic than all domain-final positions.

Articulatory Measures
Similarly, linear mixed-effect regression models were fitted for CQ and PIC, respectively. Again, phonemic type and prosodic position, as well as their interaction, were entered as the fixed factors. The by-speaker and by-following-consonant random intercepts were included as random effects. Pairwise comparisons among all prosodic positions were obtained by changing the reference levels of the model. Categorical variables were simple-coded. The modeling included all 1061 data points used in the acoustic analysis.
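The simple coding applied to the categorical predictors can be illustrated with a small contrast matrix. The following is a minimal sketch, assuming the usual definition of simple coding for a factor with k levels: each contrast column takes the value (k-1)/k for its own level and -1/k for every other level, including the reference. The function name is hypothetical. Because every column sums to zero, the intercept corresponds to the mean of the level means rather than to the reference level.

```python
from fractions import Fraction


def simple_coding(levels, reference):
    """Build a simple-coding contrast matrix.

    Returns a dict mapping each level to its row of k-1 contrast values,
    with one column per non-reference level, in the order given.
    """
    k = len(levels)
    contrasts = [lv for lv in levels if lv != reference]
    return {
        lv: [Fraction(k - 1, k) if lv == c else Fraction(-1, k)
             for c in contrasts]
        for lv in levels
    }


positions = ["sandhi-medial", "sandhi-final", "phrase-final", "IP-final"]
matrix = simple_coding(positions, reference="sandhi-medial")
```

Calling the function again with a different `reference` mirrors the re-leveling step used to obtain all pairwise comparisons: the contrasts change, but the intercept keeps its grand-mean interpretation.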

CQ
The outputs of the regression models for CQ are summarized in Table 4. As indicated by the table, there is a significant main effect of phonemic type. As expected, checked syllables have higher CQ values than unchecked syllables. Therefore, the phonation involved in the checked syllables has a relatively longer glottal closure duration. However, CQ is not a reliable measure of prosodic position. Essentially, only the IP-final position is significantly different from the other smaller prosodic boundaries. Moreover, there is a weak interaction between phonemic type and prosodic position. The post-hoc Games-Howell test suggests that CQ does not reliably distinguish checked vs. unchecked syllables in the sandhi-final and phrase-final positions (significant p-values are indicated in Figure 6).
Figure 6. The variation of CQ influenced by phonemic type and prosodic position. Significant p-values (p ⩽ 0.05) are marked in red, which indicates that the CQ difference between checked and unchecked syllables is significant in that prosodic position.

PIC
The fixed effects on PIC are also analyzed in the linear mixed-effect regression model, and the output is in Table 5.
As indicated in the table, significant main effects were found for both phonemic type and prosodic position. In addition, PIC significantly differs between every two prosodic positions, except for sandhi-medial vs. IP-final. Moreover, there is generally no significant interaction between phonemic type and prosodic position, except that checked and unchecked syllables are more contrasting in phrase-medial position than in sandhi-final position. This result is similar to that of PC1. As illustrated in Figure 7, similar to PC1, PIC also exhibits a trend that larger prosodic boundaries have smaller PIC values than the smaller prosodic boundaries; but again, the sandhi-medial position is the exception to this trend. The post-hoc test further confirms that PIC is a reliable measure for the checked vs. unchecked contrast across all prosodic positions.
Figure 7. The variation of PIC influenced by phonemic type and prosodic position. Significant p-values (p ⩽ 0.05) are marked in red, which indicates that the PIC difference between checked and unchecked syllables is significant in that prosodic position.

F0
As described in Section 2.3, f0 was extracted from both checked and unchecked vowels and was also fitted into linear mixed-effect regression models. Phonemic type, prosodic position, and their interaction were set as the fixed factors of the model; we also included a by-speaker random intercept and a by-following-consonant random intercept. Table 6 shows that there is a significant main effect of phonemic type. Checked syllables have generally higher f0 than unchecked syllables. Traditionally, checked tones were transcribed with a lower pitch than the corresponding unchecked tones (Xu et al. 1988, cf. Table 1). However, our current study indicates the opposite: for lower-register tones, checked tones are produced with a higher f0 than unchecked tones by the younger generation of Shanghainese.
In addition, there is a significant main effect of prosodic position on f0. As can be seen in Figure 8, except for the non-final sandhi-medial position, f0 at the smaller boundaries, such as the sandhi-final position, is higher than that at the larger boundaries, such as the phrase-final and IP-final positions, exhibiting a declination trend (e.g., Roberts 2020). The effect of prosodic position is similar to the patterns observed in PC1 and PIC. Moreover, there is a significant interaction effect between phonemic type and prosodic position for f0. The difference in f0 between checked and unchecked syllables is smaller in the IP-final position than in sandhi-final and phrase-final positions.
Figure 8. The variation of f0 influenced by phonemic type and prosodic position. Significant p-values (p ⩽ 0.05) are marked in red, which indicates that the f0 difference between checked and unchecked syllables is significant in that prosodic position.

Duration
The same linear mixed-effect regression model was fitted for vowel duration as well. The modeling outputs are summarized in Table 7. Significant main effects are found for phonemic type and prosodic position. As expected, checked vowels are significantly shorter than unchecked vowels. Furthermore, as demonstrated in Figure 9, vowel duration varies hierarchically with the size of the prosodic domains. The vowels located at the large prosodic boundaries (e.g., IP-final and phrase-final positions) are the longest, followed by those at the sandhi-final position, and the vowels at the sandhi-medial position are the shortest. This is consistent with the final lengthening effect widely reported among languages (Byrd and Saltzman 2003; Pan 2007; Turk and Shattuck-Hufnagel 2007, among others). Furthermore, although there is a significant interaction between phonemic type and prosodic position, significant duration differences between checked and unchecked vowels are maintained in all prosodic positions (p-values from the post-hoc Games-Howell tests are indicated in Figure 9).
Figure 9. The variation of duration influenced by phonemic type and prosodic position. Significant p-values (p ⩽ 0.05) are marked in red, which indicates that the duration difference between checked and unchecked syllables is significant in that prosodic position.

Creak Occurrence
Finally, the occurrence of creak was manually coded for all tokens. As reviewed in Section 2.4, we specifically coded three types of creak: coda glottal stops, coda creak, and broader creak. To evaluate the effects of phonemic type and prosodic position on the likelihood of creak occurrence, a logistic mixed-effect regression model was fitted (three types combined), with phonemic type, prosodic position, and their interaction as the fixed factors; the by-speaker and by-following-consonant random intercepts were also included in the model. The fixed factors were again simple-coded. As in the previous phonetic analysis, we included 1061 tokens in the analysis. Because there is no significant interaction effect on creak occurrence, we refit the model without the interaction effect.
As indicated in Table 8, only the prosodic position has a significant main effect on creak occurrence. All prosodic positions have significantly different rates of creak from one another. This result can be better understood in Figure 10. Creak occurs more at the two larger prosodic boundaries (IP-final and phrase-final positions) than at the two smaller prosodic boundaries (sandhi-medial and sandhi-final). The breakdown of the three types of creak is also visualized in Figure 10. Overall, 'coda creak' appears to be the most frequent type, especially for syllables at the larger prosodic boundaries. 'Coda glottal stops' are also quite frequently present at the larger prosodic boundaries. However, 'broader creak' is more likely to occur at the sandhi-medial position.
Overall, aperiodic creak is primarily driven by prosodic boundaries. This result resonates with the PC2 variation pattern we have seen in Principal Component Analysis for Acoustic Measures. PC2, the principal component mostly correlated with noise-ratio measures, also exhibits strong effects of prosodic positions, and larger prosodic boundaries are less periodic than smaller prosodic boundaries.
Figure 10. The distribution of tokens with three different types of creak (coded in non-gray colors) and tokens without visible creak (coded in gray) among checked and unchecked tones at various prosodic positions.
Since there is no significant difference in the occurrence of creak in contemporary Shanghainese between the checked and unchecked syllables, the coda glottal stop, based on the transcription of checked syllables in the older generation of Shanghainese (Xu et al. 1988), is no longer the most appropriate representation of checked syllables in the younger generation of Shanghainese.

Correlation between F0 and Phonation Measures
One remaining question is whether the phonation variation observed in the previous sections is pitch-driven due to the co-variation between pitch and phonation (Chai 2021; Kuang 2017). To explore this question, Pearson product-moment correlation coefficients were computed between f0 and PC1, PC2, CQ, PIC, and creak occurrence (Benesty et al. 2009). Because creak occurrence is a binary variable, we computed the point-biserial correlation, a special case of Pearson's product-moment correlation, between f0 and creak occurrence. The correlation coefficients are summarized in Table 9. As shown in Table 9, significant correlations are found between f0 and most phonation measures, including PC1 (spectral slope), PC2 (periodicity/noise ratio), PIC, as well as the creak occurrence. However, there is no significant correlation between f0 and CQ. Therefore, the phonation variation remains relatively independent of f0.
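The relation between the point-biserial coefficient and Pearson's r can be checked numerically. The following is a minimal sketch, assuming creak occurrence is coded as 0/1; the function names are hypothetical, and any input values are illustrative rather than data from this study. The point-biserial formula, written in terms of the two group means, returns the same value as Pearson's r computed directly on the binary-coded variable.

```python
import math
from statistics import mean, pstdev


def pearson_r(x, y):
    """Pearson product-moment correlation between two sequences."""
    mx, my = mean(x), mean(y)
    cov = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sx = math.sqrt(sum((a - mx) ** 2 for a in x))
    sy = math.sqrt(sum((b - my) ** 2 for b in y))
    return cov / (sx * sy)


def point_biserial(values, flags):
    """Point-biserial correlation between a continuous and a 0/1 variable."""
    group1 = [v for v, f in zip(values, flags) if f == 1]
    group0 = [v for v, f in zip(values, flags) if f == 0]
    p = len(group1) / len(values)  # proportion of tokens coded 1
    # Difference of group means, scaled by the population SD of all values.
    return (mean(group1) - mean(group0)) / pstdev(values) * math.sqrt(p * (1 - p))
```

A negative coefficient under this coding would mean that tokens coded as creaky tend to have lower f0 than tokens without creak.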

Discussion
This study aimed to examine the phonetic realization of the phonemic contrast between checked vs. unchecked syllables in Shanghainese and how this local coda laryngeal contrast interacts with the global prosodic contexts. To address these questions, comprehensive acoustic and articulatory voice measures were analyzed. A series of regression models were fitted to evaluate the main effects and the interaction of phonemic type and prosodic position. Table 10 summarizes the results from the previous sections. Significant effects are indicated with plus signs.

Phonetic Nature of Shanghainese Checked Syllables
In general, as summarized in Table 10, checked syllables in Shanghainese are distinguished from unchecked syllables by having a non-modal phonation, a shorter duration, and a higher f0. However, the occurrence of creak does not differ significantly between checked and unchecked syllables. These results indicate that checked syllables are no longer phonetically distinguished by coda glottal stops; instead, the checkedness contrast is mostly realized as phonation, duration and f0 differences on the vowel portions.
One primary goal of our study is to better understand the phonation variation involved in the checkedness contrast. Both acoustic and articulatory measurements indicate a reliable phonation contrast between checked and unchecked syllables. There are several important aspects of the non-modal phonation involved in the checked syllables. First of all, checked syllables are associated with greater glottal constriction, as indicated by the greater CQ and flatter spectral slope (PC1). However, unlike the prototypical creaky voice (Keating et al. 2015), the non-modal phonation in the checked syllables is highly periodic, as noise-ratio and periodicity measures (e.g., PC2) do not significantly contribute to the contrast. Moreover, the checked syllables are produced with a slightly higher f0. Altogether, the checked syllables in Shanghainese are produced with a tenser phonation. Therefore, we propose to transcribe the checked syllables in contemporary Shanghainese with [CV] to indicate the tense phonation. Moreover, contradictory to the traditional transcription (Xu et al. 1988), we found that the checked syllables have a higher f0 than the corresponding unchecked syllables. This is another major change from the older generation of Shanghainese. The elevation in f0 values is possibly driven by the tense phonation on the vowels (Keating et al. 2015;Kuang 2013;Maddieson and Ladefoged 1985).
Furthermore, there are some unique articulatory details reflected in the EGG measurements. In addition to CQ, PIC also reliably distinguishes the types of syllables in our study. As reviewed earlier in this paper, CQ and PIC reflect different aspects of glottal constriction. Greater CQ indicates a relatively longer contacting phase in each vibratory cycle. PIC is generally related to the status of the vocal folds at the moment of initiating contact. In our case, checked syllables with tenser phonation are associated with larger PIC and positively correlated with PC1 (r = 0.16, p < 0.05). This association is consistent with the notion that abrupt glottal closure boosts the high-frequency energy in the spectrum (Hanson et al. 2001; Stevens 1977). It is useful to note that such a significant association was not reported for some other phonation contrasts, such as the tense vs. lax contrast in Yi consonants (Kuang and Keating 2014), the breathy-nasalized vs. non-breathy-nasalized contrast in Yi vowels (Garellek et al. 2016), and the breathy vs. modal syllable-onset contrast in Shanghainese. Therefore, the actual articulatory implication of PIC could be rather language-specific and contrast-specific.

Prosodic Effects on the Phonation Variation
In general, the laryngeal functions involved in the prosodic boundaries are quite different from the laryngeal functions involved in the phonemic type. As shown in Figure 2 and summarized in Table 10, unlike the phonemic contrast, the prosodic boundaries are primarily differentiated by the measures associated with periodicity (e.g., PC2 and the occurrence of creak), although the measures related to glottal constriction (e.g., PC1 and CQ) are also relevant.
Importantly, the prosodic positions investigated in our study do not simply co-vary uniformly with the phonetic features; instead, each prosodic position exhibits some unique phonetic properties. Our finding suggests that multiple mechanisms are contributing to the prosodic variation.
On the one hand, there is a rather gradient effect of the phrasal boundaries: larger prosodic boundaries are associated with less vocal constriction (e.g., PC1), less periodicity (e.g., PC2), and more creak occurrence. The IP-final position is the extreme end of this trend. This effect is consistent with the 'non-constricted creak' at the larger prosodic boundaries attested in many languages (Keating et al. 2015; Slifka 2006).
However, on the other hand, tone-sandhi is a distinct prosodic domain from the other levels of prosodic phrases. As shown in Figure 2b, the sandhi-medial position is generally distinguished from all three sandhi-final positions. Furthermore, as demonstrated by PC1 and PC2, the sandhi-medial position does not participate in the general trend of the gradient creak effect of phrasal boundaries. In addition, the sandhi-final (but phrase-medial) position differs from the phrase-final position in that it has a much lower rate of creak. Overall, these findings suggest that tone-sandhi domains and prosodic phrases influence the phonation variation differently. This may be because the tone-sandhi domain in Shanghainese is specified by morphology and phonology, which is planned differently from the higher levels of prosodic domains (Roberts 2020; Selkirk and Shen 1990).

Interaction between the Global vs. Local Laryngeal Functions
Based on the discussions from the previous sections, it is clear that the global prosodic effects and the local phonemic contrast are subject to different voice mechanisms, as they involve distinct phonation cues. In particular, periodicity is only related to the global prosodic boundaries but not the local checkedness contrast.
Moreover, as summarized in Table 10, there is no interaction between the global vs. local functions for acoustic features. Interactions were only found for individual EGG measures. For the sandhi-medial and IP-final positions, the phonemic contrast is well distinguished by both CQ and PIC, but for the sandhi-final and phrase-final positions, the phonemic contrast is only distinguished by PIC. Taking all the phonetic measures into consideration, the phonemic contrast between checked and unchecked syllables is essentially well-maintained in all prosodic contexts. Therefore, speakers of Shanghainese manage to control the global vs. local levels of phonation variation rather independently.

Conclusions
This study investigated the phonetic realization of the checked vs. unchecked contrast at various prosodic positions in Shanghainese. By extensively analyzing both acoustic and articulatory measures, we found that the Shanghainese checked codas are realized with a tenser phonation, a higher f0, and a shorter duration in the vowel segment. This study also provides us with a better understanding of the prosodic hierarchies of Shanghainese. The tone-sandhi domain, which is specified by phonology and morphology, is not subject to the general mechanism of prosodic phrases. Moreover, this study achieved a comprehensive understanding of the range of phonation variation as a function of the interaction between the global laryngeal function (the prosodic boundaries) and the local phonemic contrast (the checked vs. unchecked contrast) in Shanghainese. Overall, the laryngeal functions of the global prosodic contexts can be rather independent of those of the local phonemic contrast, and the local phonemic contrast can be well maintained in all prosodic contexts.
Funding: This research received no external funding.
Institutional Review Board Statement: Ethical review and approval were waived for this study, as it was not related to bio-medical and health research.
Informed Consent Statement: Informed consent was obtained from all subjects involved in the study.

Data Availability Statement: The raw data are available from the first author and can be made available for non-profit, academic, and research purposes.