Fundamental Frequency and Phonation Differences in the Production of Stop Laryngeal Contrasts of Endangered Shina

Shina is an endangered Indo-Aryan (Dardic) language spoken in Gilgit, Northern Pakistan. The present study investigates the acoustic correlates of Shina’s three-way stop laryngeal contrast across five places of articulation. A wide range of acoustic correlates were measured including fundamental frequency (F0), spectral tilt (H1*-H2*, H1*-A1*, H1*-A2*, and H1*-A3*), and cepstral peak prominence (CPP). Voiceless aspirated stops were characterized by higher fundamental frequency, spectral tilt, and cepstral peak prominence, compared to voiceless unaspirated and voiced unaspirated stops. These results suggest that Shina is among those languages which have a raising effect of aspiration on the pitch and spectral tilt onsets of the following vowels. Positive correlations among fundamental frequency, spectral tilt, and cepstral peak prominence were observed. The findings of this study will contribute to the phonetic documentation of endangered Dardic languages.


Introduction
Shina is an endangered Indo-Aryan (Dardic) language spoken in Gilgit, Northern Pakistan. There is a three-way laryngeal contrast in Shina stops (e.g., voiceless unaspirated /p/, voiceless aspirated /pʰ/, and voiced unaspirated /b/) at five places of articulation (bilabial, dental, retroflex, palatal, and velar; Hussain 2018; Radloff 1999). From a typological perspective, voiceless aspirated stops have been described to have either a raising effect on fundamental frequency (F0) and spectral tilt or a lowering effect (see below). A wide range of studies have been conducted on the grammatical, phonological, and sociolinguistic aspects of Dardic languages (Backstrom and Radloff 1992; Cacopardo and Cacopardo 2001; Liljegren 2017). However, Shina still lacks a detailed phonetic description. The current study investigates F0, spectral tilt (H1*-H2*, H1*-A1*, H1*-A2*, and H1*-A3*), and cepstral peak prominence (CPP) of the rich laryngeal and place contrasts of Shina. In particular, I investigate whether aspirated stops have a raising effect on F0 and spectral tilt onsets. Moreover, relationships among all the acoustic measures are explored using correlations.
The paper is organized as follows. Section 1.1 presents a brief typological survey of the laryngeal contrasts found in Dardic languages. Section 1.2 discusses the acoustic correlates of laryngeal and place contrasts. Section 2 outlines the methods and procedures used in the current study. The main findings are presented in Section 3. In the end, Section 4 summarizes the results and highlights the implications for the general literature on laryngeal typology, tonogenesis, and phonationgenesis.

Typology of Laryngeal Contrasts in Dardic Languages
The term Dardic is generally used to refer to a group of Indo-Aryan languages spoken in the Northwestern regions of South Asia (spanning the border areas of Afghanistan, Pakistan, and India, including both the Pakistani- and Indian-administered areas of the

Acoustic Correlates of Laryngeal Contrasts
In the current phonetic literature, a number of acoustic measures have been used to investigate laryngeal contrasts. Fundamental frequency (F0) at the onset of the following vowel has been extensively used for this purpose (Chen 2011; Hombert et al. 1979; House and Fairbanks 1953; Kirby and Ladd 2016; Ladd and Schmid 2018; Lee et al. 2019). Voiced stops generally exhibit lower F0 onsets than voiceless stops (Hombert et al. 1979; Schertz and Khan 2020). Shimizu (1989) conducted a cross-linguistic study and noted that voiced stops of Japanese, Burmese, Thai, and Hindi are characterized by lower F0 onsets than voiceless stops. These patterns can be observed in Figure 2, which presents the F0 onsets of the three laryngeal categories of Burmese and Thai (see note 3). Some studies have also observed an effect of aspiration on F0 onsets, but the reported results are inconsistent. Voiceless unaspirated stops of Burushaski (Hussain 2021), Cantonese, Mandarin (Luo et al. 2016), Kalasha (Hussain and Mielke 2020), Marathi (Dmitrieva and Dutta 2020), and Shanghai Chinese (Chen 2011) showed higher F0 onsets than voiceless aspirated stops. In contrast, voiceless unaspirated stops of Chru (Brunelle et al. 2020), Madurese (Misnadin 2016), and Thai (Shimizu 1989) showed lower F0 onsets than voiceless aspirated stops. In Dzongkha, voiceless unaspirated and voiceless aspirated stops both showed high F0 onsets, with no clear difference between them (Lee and Kawahara 2018). Ohde (1984) also showed that voiceless unaspirated and voiceless aspirated stops of English are not distinguished by F0 onsets.
If we look at the typology of aspiration and its effects on F0 onsets, three general patterns emerge: (a) languages in which voiceless aspirated stops raise F0 onsets (Chru, Madurese, and Thai); (b) languages in which voiceless aspirated stops lower F0 onsets (Burushaski, Cantonese, Hindi, Kalasha, Mandarin, Marathi, and Shanghai Chinese); and (c) languages which show no clear or large differences in the F0 onsets of voiceless unaspirated and aspirated stops (Burmese, Dzongkha, and English; see also Kirby and Hyslop 2019, which showed raising effects on the F0 onsets of voiceless aspirated stops in male speakers of Dzongkha, whereas female speakers showed the opposite pattern). F0 onsets are generally used to investigate laryngeal contrasts, but place effects have also been reported: Lai et al. (2009) showed minor but significant effects of place on F0 onsets in Taiwanese. It is not clear which articulatory forces are involved in these place effects. It is also worth noting that the apparent differences in F0 may depend on the number of laryngeal and place categories found in a language, on whether the language has undergone tonogenesis or registrogenesis, and on the type of pitch and/or focus contrasts it has (Tan et al. 2019; Wayland and Jongman 2002).
Spectral tilt/balance has also been used to investigate voicing and aspiration contrasts across diverse languages (Garellek 2019; Iseli et al. 2007). There are four spectral tilt measures: (a) H1*-H2* (the amplitude difference between the first [H1] and second [H2] harmonic); (b) H1*-A1* (the difference between the amplitude of the first harmonic [H1] and the amplitude of F1); (c) H1*-A2* (the difference between the amplitude of the first harmonic [H1] and the amplitude of F2); and (d) H1*-A3* (the difference between the amplitude of the first harmonic [H1] and the amplitude of F3) (Garellek 2019). Aspirated stops have higher spectral tilt at the onset of following vowels than non-aspirated stops (Esposito and Khan 2012; Gao et al. 2020; Seyfarth and Garellek 2018). For example, in Dzongkha (Kirby and Hyslop 2019), Hindi/Urdu (Schertz and Khan 2020), Gujarati, and Marathi (Berkson 2019; Dmitrieva and Dutta 2020), aspirated stops have been reported to show higher H1*-H2* and H1*-A1* at the onset of following vowels than voiceless unaspirated stops. In terms of voicing, there were no clear differences in H1*-H2* onsets of voiceless unaspirated and voiced unaspirated stops in Hindi/Urdu (Schertz and Khan 2020). Voiced unaspirated and voiceless aspirated stops of Yerevan Armenian were characterized by identical H1*-H2* (Seyfarth and Garellek 2018). In addition to spectral tilt, cepstral peak prominence (CPP), which captures any form of noise in an acoustic signal (breathiness and/or aspiration), has been reported to distinguish modal and breathy vowels (Garellek and Keating 2011). Since breathiness results in less distinct harmonics, the CPP values are expected to be much lower in vowels following aspirated stops (Garellek 2019; Misnadin 2016).
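To make the H1-H2 definition above concrete, the following is a minimal Python sketch that measures the amplitude difference between the first two harmonics of a synthetic periodic signal. It is an illustration only, not the procedure used in this study: it computes the uncorrected H1-H2 (no formant correction, hence no asterisks), it assumes F0 is already known, and the function names, sample rate, and harmonic amplitudes are all hypothetical choices.

```python
import math

def harmonic_amplitude(signal, sample_rate, freq):
    """Amplitude of the sinusoidal component at `freq`, by projecting the
    signal onto a cosine and a sine at that frequency (a single DFT bin)."""
    n = len(signal)
    re = sum(s * math.cos(2 * math.pi * freq * i / sample_rate)
             for i, s in enumerate(signal))
    im = sum(s * math.sin(2 * math.pi * freq * i / sample_rate)
             for i, s in enumerate(signal))
    return 2 * math.hypot(re, im) / n

def h1_minus_h2_db(signal, sample_rate, f0):
    """Uncorrected H1-H2 in dB: level of the first harmonic minus the second."""
    h1 = harmonic_amplitude(signal, sample_rate, f0)
    h2 = harmonic_amplitude(signal, sample_rate, 2 * f0)
    return 20 * math.log10(h1 / h2)

def synth_harmonics(f0, harmonic_amps, sample_rate=8000, n_samples=800):
    """Toy 'vowel': a sum of sine harmonics with the given linear amplitudes.
    The window holds a whole number of periods so the projection is exact."""
    return [sum(a * math.sin(2 * math.pi * (k + 1) * f0 * i / sample_rate)
                for k, a in enumerate(harmonic_amps))
            for i in range(n_samples)]

# Hypothetical amplitudes: a steeper drop from H1 to H2 gives a larger H1-H2.
shallow = synth_harmonics(100, [1.0, 0.5, 0.25])
steep = synth_harmonics(100, [1.0, 0.25, 0.1])
print(round(h1_minus_h2_db(shallow, 8000, 100), 2))
print(round(h1_minus_h2_db(steep, 8000, 100), 2))
```

In this toy case the second signal, whose energy falls off faster above the first harmonic, yields the larger H1-H2 value, which is the direction associated with breathier phonation in the studies cited above.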
A number of studies have reported correlations among F0 and spectral tilt. Iseli et al. (2007) observed positive correlations between H1*-H2* and F0 for one group of talkers (older males). Koreman (1996) noted that an increase in the tension of the cricothyroid muscle in the larynx results in an increase of F0 and H1*-H2* (see also Kreiman et al. 2012 and Esposito 2007 for further discussion). The results of these studies suggest that there may be correlations among F0 and spectral tilt across laryngeal categories which differ on the dimensions of aspiration and/or voicing.

Speakers
Five Shina speakers (range: 18-32 years; mean age: 24.6 years) were recruited from Gilgit, Northern Pakistan. All the speakers were healthy adults and had no speech and/or hearing disabilities. The sample was homogeneous. The speakers learned Shina as their first language and could also speak Urdu, a national language and lingua franca of Pakistan. All the speakers were screened before inclusion in the study. Only those speakers who regularly spoke Shina with their family and friends were selected. Speakers who were predominantly Urdu speakers (i.e., who used Urdu most of the time) were not included in the study.

Speech Materials and Recording Procedure
A wordlist of Shina was created, where word-initial C represented a stop consonant (/p pʰ b t̪ t̪ʰ d̪ ʈ ʈʰ ɖ tʃ tʃʰ dʒ k kʰ g/), followed by the vowel /a/ (each word had a CV syllable structure; see Table A1 in Appendix A). As the quality of vowels can affect fundamental frequency and other acoustic measures, nonsense words were created to better control for the phonological environment and vowel contexts. The nonsense wordlist was created in consultation with native Shina speakers. The speakers were asked whether the selected words occur in their language and whether they had used any of these words in their everyday conversations. All the target words were balanced across consonant types and places of articulation. The target stops were investigated in word-initial position as this is the only position where the full set of laryngeal and place contrasts is found. Before the start of the recording, speakers were familiarized with the wordlist and were asked to produce the words as if they were speaking Shina, using their natural speech tempo. A portable Zoom digital voice recorder with a built-in microphone was placed near the speakers to make audio recordings (44.1 kHz, encoded in 16-bit; the recorder was placed around 1.5 feet from the speakers' mouth). Care was taken to ensure that there was no background noise during the recordings. All the words were produced in citation form. Each word was repeated five times (5 speakers × 15 words × 5 repetitions of each word = 375 tokens).
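The token count above follows from fully crossing the design factors. As a rough sketch of that arithmetic, the Python snippet below enumerates the planned tokens; the speaker and repetition indices and the variable names are hypothetical labels introduced here for illustration, not identifiers from the study.

```python
from itertools import product

# Factor levels as described in the text.
LARYNGEAL = ["voiceless unaspirated", "voiceless aspirated", "voiced unaspirated"]
PLACE = ["bilabial", "dental", "retroflex", "palatal", "velar"]
N_SPEAKERS = 5      # hypothetical speaker indices 1..5
N_REPETITIONS = 5   # hypothetical repetition indices 1..5

# One target word per laryngeal-by-place combination (3 x 5 = 15 stops).
target_stops = list(product(PLACE, LARYNGEAL))

# Every speaker produces every target word five times.
planned_tokens = [
    (speaker, place, laryngeal, repetition)
    for speaker in range(1, N_SPEAKERS + 1)
    for place, laryngeal in target_stops
    for repetition in range(1, N_REPETITIONS + 1)
]

print(len(target_stops))    # 15
print(len(planned_tokens))  # 375
```

Enumerating the design this way also makes it easy to check that each laryngeal category and each place of articulation contributes the same number of planned tokens.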

Acoustic and Statistical Analyses
A total of 375 tokens were segmented in Praat (Boersma and Weenink 2007; 11 tokens were excluded due to mispronunciations and missing repetitions; 364 tokens were used for the final analysis). Fundamental frequency, spectral tilt (H1*-H2*, H1*-A1*, H1*-A2*, and H1*-A3*; asterisks indicate that the measurements were corrected for the boosting effects of formants; Iseli et al. 2007), and CPP were measured from the onset of following vowels, using PraatSauce (Kirby 2018b). The most important information about a laryngeal category and place of articulation is found right after the release of a consonant (i.e., near the onset or beginning of the following vowel). Therefore, all the measurements are based on the onset which refers to a single time point at the beginning of the vowel /a/.
The statistical analyses of the data were conducted in R (R Core Team 2013). Separate Linear Mixed Effects Regression (LMER) models were fitted for each acoustic measure, using the lme4 package (Bates et al. 2015). The lmerTest package was used to obtain p values for all the LMER models with Satterthwaite approximations for degrees of freedom (Kuznetsova et al. 2017). In all the LMER models, Laryngeal (voiceless unaspirated, voiceless aspirated, and voiced unaspirated) and Place (bilabial, dental, retroflex, palatal, and velar) were included as fixed factors, and Speaker as a random factor (alpha level: 0.05). By-speaker random slopes for Laryngeal and Place were included in all the models. Pairwise comparisons were performed using the emmeans package (Lenth 2016). The correlation networks were plotted using the Gaussian Graphical Model (Bhushan et al. 2019; Epskamp et al. 2018).
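For readers less familiar with mixed models, the structure described above can be written out as follows. This is one plausible reading rather than the exact specification used: the text does not say whether a Laryngeal × Place interaction was included, so the sketch assumes additive fixed effects, and the subscripts and symbols are notation introduced here. Under the same assumption, the corresponding lme4-style formula would be something like `measure ~ Laryngeal + Place + (1 + Laryngeal + Place | Speaker)`.

```latex
% Assumed additive model for token i produced by speaker j.
% \ell(i) and p(i) denote the laryngeal category and place of token i.
y_{ij} = \beta_0 + \beta^{L}_{\ell(i)} + \beta^{P}_{p(i)}
       + u_{0j} + u^{L}_{\ell(i)\,j} + u^{P}_{p(i)\,j}
       + \varepsilon_{ij},
\qquad
\mathbf{u}_j \sim \mathcal{N}(\mathbf{0}, \Sigma),
\quad
\varepsilon_{ij} \sim \mathcal{N}(0, \sigma^2)
```

Here $y_{ij}$ is one acoustic measure (e.g., F0 onset), the $\beta$ terms are the fixed effects of Laryngeal and Place relative to their reference levels, $u_{0j}$ is the by-speaker random intercept, and $u^{L}$ and $u^{P}$ are the by-speaker random slopes.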

Correlations
Correlations among F0, H1*-H2*, H1*-A1*, H1*-A2*, H1*-A3*, and CPP were examined using the Gaussian Graphical Model (GGM) (Bhushan et al. 2019; Epskamp et al. 2018). The advantage of GGM correlation networks is their ease of interpretation and visualization. In Figure 5, the thickness and colour shading of the lines indicate the strength of the relationships among acoustic variables. It can be observed that the four spectral tilt measures were highly correlated with F0. Table 3 shows results of the correlation models.
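In a Gaussian Graphical Model, the edges of the network are usually partial correlations: the association between two variables after conditioning on all the others, obtained from the inverse of the correlation matrix. The Python sketch below illustrates that computation on a made-up 3 × 3 correlation matrix. The matrix values are toy numbers, not results from this study, the helper names are hypothetical, and no regularization is applied, although estimation procedures for such networks often add it.

```python
import math

def invert(matrix):
    """Invert a small square matrix by Gauss-Jordan elimination
    with partial pivoting."""
    n = len(matrix)
    aug = [list(row) + [1.0 if i == j else 0.0 for j in range(n)]
           for i, row in enumerate(matrix)]
    for col in range(n):
        pivot = max(range(col, n), key=lambda r: abs(aug[r][col]))
        if abs(aug[pivot][col]) < 1e-12:
            raise ValueError("matrix is singular")
        aug[col], aug[pivot] = aug[pivot], aug[col]
        scale = aug[col][col]
        aug[col] = [v / scale for v in aug[col]]
        for r in range(n):
            if r != col:
                factor = aug[r][col]
                aug[r] = [v - factor * w for v, w in zip(aug[r], aug[col])]
    return [row[n:] for row in aug]

def partial_correlations(corr):
    """Partial correlation of each pair given all other variables:
    -P[i][j] / sqrt(P[i][i] * P[j][j]), where P is the inverse of corr."""
    precision = invert(corr)
    n = len(corr)
    return [[1.0 if i == j
             else -precision[i][j] / math.sqrt(precision[i][i] * precision[j][j])
             for j in range(n)]
            for i in range(n)]

# Toy correlation matrix for three hypothetical variables (A, B, C).
# A and C are correlated (0.25) only through their shared link with B.
toy_corr = [
    [1.00, 0.50, 0.25],
    [0.50, 1.00, 0.50],
    [0.25, 0.50, 1.00],
]
edges = partial_correlations(toy_corr)
for row in edges:
    print([round(value, 3) for value in row])
```

In this toy example the A-C edge vanishes once B is conditioned on, even though A and C have a nonzero simple correlation. This is why a GGM network can look sparser than a plain correlation matrix.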

Discussion
The current study investigated the F0, spectral tilt, and CPP of the three-way laryngeal contrast across five places of articulation in Shina, an endangered Indo-Aryan (Dardic) language spoken in Gilgit, Northern Pakistan. The findings indicated that Shina speakers may rely on a number of acoustic correlates to tease apart the laryngeal and place contrasts. F0 onsets were higher in the voiceless aspirated stops than in the voiceless unaspirated and voiced unaspirated stops. This suggests that Shina is among those languages where aspiration has a raising effect on the F0 onsets of the following vowels (Chru: Brunelle et al. 2020; Khmer, Central Thai, and Northern Vietnamese: Kirby 2018a; Madurese: Misnadin 2016; Thai: Shimizu 1989), but it differs from other languages where voiceless unaspirated stops raise the F0 onsets (Burushaski: Hussain 2021; Burmese: Shimizu 1989; Cantonese: Luo et al. 2016; Hindi: Shimizu 1989; Kalasha: Hussain and Mielke 2020; Mandarin: Luo et al. 2016; Marathi: Dmitrieva and Dutta 2020; Shanghai Chinese: Luo et al. 2016). Voiceless aspirated stops are produced with a wider opening of the vocal cords so that the air can pass through without any obstruction (Kim et al. 2010). Thus, a wider opening during the aspiration/frication intervals of voiceless aspirated stops leads to the raising of F0 onsets on the following vowels. Significant effects of place on F0 onsets were also noted in Shina. It needs to be further investigated how different vocal tract configurations affect F0 onsets (see discussion below).
Voiceless aspirated stops were characterized by the highest H1*-H2*, H1*-A1*, H1*-A2*, and H1*-A3* onsets compared to the other two laryngeal categories. These results are in line with other languages of the world where aspirated stops have been shown to have a higher spectral tilt at the onset of the following vowels than unaspirated stops (Shanghai  Dmitrieva and Dutta 2020). Raising of the spectral tilt due to aspiration appears to be a universal correlate: there are no known languages in which aspiration lowers the spectral tilt. Voiced stops of Shina were consistently characterized by the lowest spectral tilt, which also appears to be consistent across languages. The results of CPP were surprising. Breathiness/aspiration causes less distinct harmonics in the acoustic signal, which lowers the CPP values (Garellek 2019). However, in Shina, CPP values were higher for voiceless aspirated stops than for voiceless unaspirated and voiced unaspirated stops. These results differ from Madurese (Misnadin 2016) and Yerevan Armenian (Seyfarth and Garellek 2018), which showed low CPP in aspirated stops.
F0, spectral tilt, and CPP measures have been primarily used for investigating the laryngeal contrasts (Kirby 2018a; Kirby and Hyslop 2019; Lee and Kawahara 2018; Seyfarth and Garellek 2018). However, their usage for the characterization of rich place contrasts is not widely explored (Hussain 2021; Misnadin and Kirby 2020). Lai et al. (2009) observed minor but significant effects of place on F0 onsets in Taiwanese. The laryngeal distinctions are mainly achieved by vibrating (opening/closing) of the vocal cords. Thus, all the spectral tilt measures are sensitive to these subtle movements. In contrast, different places of articulation are achieved by making constrictions in the vocal tract (there are some places of articulation that involve constrictions in the pharyngeal cavity, e.g., pharyngeal consonants, glottal stops). It still needs to be explored how different vocal tract configurations (which make the front and back cavities either long or short, e.g., larger sublingual cavity in retroflexes) affect the vocal cord vibrations. In the current study, there appeared to be minor place differences in F0 and spectral tilt. However, at most of the places of articulation, voiceless aspirated stops were characterized by the highest F0 and spectral tilt. This suggests that the F0 and spectral tilt rankings are consistent across laryngeal categories and places of articulation.
Significant correlations were observed among F0 and all the other acoustic measures. These findings corroborate other studies which reported correlations among F0 and spectral tilt (Iseli et al. 2007). Koreman (1996) found that higher F0 and H1*-H2* may occur due to the increasing tension in the cricothyroid muscle in the larynx (see also Kreiman et al. 2012). These correlations may also help inform the emergence of tones (tonogenesis) and phonation (phonationgenesis) contrasts in the world's languages. A wide range of Sino-Tibetan and Austronesian languages contrast tones on the dimension of phonation (checked, creaky, high, and low tones in Burmese; Gruber 2011). The development of phonation and tonal contrasts in different varieties of Kammu has been explored using comparative lexical data. For example, one Western Kammu variety has low tone (/pù:c/ 'rice wine') where another has breathiness (/pṳ:c/), and both stem from the voicing still seen in Eastern Kammu (/bu:c/) (Kingston 2011). This indicates that the development of tonal contrasts may occur together with the development of phonation contrasts (see discussion in Garellek and Esposito 2021; Gruber 2011; Ratliff 2015).
In a typological review,  grouped languages into four categories: (a) languages which do not use phonation as a cue for tone (Japanese, Navajo, Punjabi, Manange, most West African languages, Swedish, and Central Thai); (b) languages which use phonation as a cue for tone (Cantonese Yue, Khmu' Rawk, Mandarin, Pakphanang Thai, Ph. Penh Khmer, and Yueyang Xiang); (c) languages where phonation and tone are orthogonal (Dinka, Mazatec languages, Mpi, Yalálag Zapotec, and Yi languages); and (d) languages which fuse the phonation and tone (Black Miao, Burmese, Green Mong, most Zapotec languages, Tamang, Vietnamese, White Hmong, and Wu). Thus, in some languages, there may be a strong relationship among F0 and the phonation measures. Shina uses both F0 and phonation to distinguish the three-way laryngeal contrast which, at some point, may also contribute to the development of phonemic tones.
The emergence of tones in a language can be linked to the type of laryngeal categories. Onset-based tonogenesis more often arises from the voiced unaspirated category going its own way, apart from all voiceless categories, or from the voiced aspirated category going its own way. The effect of voiced (un)aspirated stops has been widely acknowledged in the tonogenesis literature (Hombert et al. 1979; Ratliff 2015), but frication can also play a key role. For example, the glottal fricative /h/ in Punjabi led to the emergence of high tone (Urdu /kehna/ vs. Punjabi /kéna/ 'say'; Hussain 2020; Hussain et al. 2019). Voiceless aspirated stops involve two acoustic/articulatory events: stop closure + aspiration/frication of /h/. Fricatives, in general, have a raising effect on F0 (Michaud and Sands 2020; Pearce 2007). Moreover, the fricative /h/ and the aspiration interval of voiceless aspirated stops share the articulatory feature associated with the vocal cords (e.g., both have a wider opening of the vocal cords). Thus, we can assume that at some point, voiceless aspirated stops will have the same effect on F0 as found with the glottal fricative /h/ (i.e., raise the F0 and then lead to the development of high tone). In other words, if frication of /h/ can result in high tone in Punjabi, then the aspiration/frication in voiceless aspirated stops can also clear a path for the development of tones. It appears that breathiness caused by the aspiration of the word-initial consonants (as noted in Shina) may be the first step in the development of contrastive tones and/or phonation. Languages may transfer the aspiration feature to the following vowels and may develop phonemic tones (Punjabi: Hussain et al. 2019), breathy vowels (Gujarati: Esposito and Khan 2012), or both (Western Kammu varieties: Kingston 2011). Dardic languages perhaps belong to the type (a) languages discussed by .
As phonetic data of Dardic languages are still scarce, a cross-linguistic acoustic/articulatory study is needed to further investigate the relationships among F0, phonation, and width of the vocal cords during the production of various laryngeal categories and places of articulation. Acoustic-phonetic data would also be needed to investigate tonogenesis and phonationgenesis in Shina and neighbouring Dardic languages.
The findings of this study will contribute to the phonetic documentation of endangered Dardic languages. The study also has some limitations. Acoustic data were collected from five male speakers of Shina. Future studies with a larger sample of speakers (both males and females), different age groups (younger, middle-aged, and older), and dialects of Shina (Astor, Gilgit, and Gurez) can complement the findings of the current study. Articulatory experiments using electroglottography (EGG) can also help inform the correlations among various acoustic and articulatory measures.

Informed Consent Statement:
Informed consent was obtained from all subjects involved in the study.

Data Availability Statement:
The data presented in this study are available on request from the corresponding author.

Conflicts of Interest:
The author declares no conflict of interest.

Kashmiri also contrasts palatalized stops in addition to plain stops (Koul 2003; Wali and Koul 1997).
3 These two languages are shown here because of their relevance for the three laryngeal categories of Shina.