Checked Syllables, Checked Tones, and Tone Sandhi in Xiapu Min

: A “checked” syllable usually refers to one with a short vowel and an oral or glottal coda, which results impressionistically in a “short” and “abrupt” quality. Although common in languages of the world, it is unclear how to characterize checked syllables phonetically. In this study, we investigated the acoustic features of checked syllables in citation and sandhi forms in Xiapu Min, an under-documented language from China. We conducted a production experiment and analyzed the F0, phonatory quality, vowel duration, and vowel quality in checked syllables. The results show that, in citation tones, checked syllables are realized with distinct F0 contours from unchecked syllables, along with glottalization in the end and a shorter duration overall. In sandhi tones, checked syllables lose their distinct F0 contours and become less glottalized. However, the shorter duration of checked syllables is retained in sandhi forms. This study lays out the acoustic properties that tend to be associated with checked syllables and can be used when testing checked syllables in other language varieties. The fact that in Xiapu Min sandhi checked tones become less glottalized but preserve their shorter duration suggests that, when checked syllables become unchecked diachronically, glottalization might be lost prior to duration lengthening. T2, sandhi forms, T23 T44 neutralized, T2 and T23, and T2 and T44, that T23 and T44 one minimal pair, T2 vs. T23 T2 T44 and pairs, Min differ from unchecked tones, in terms of their F0 height and contour, phonatory quality, duration, and vowel quality. These parameters are the ones most often associated with checkedness across languages. The results show that in citation forms, checked tones in Xiapu Min differ from unchecked tones in three out of four dimensions. We conﬁrm that the two checked


Introduction
Xiapu (霞浦) Min is a variety of Eastern Min spoken in Xiapu County (Ningde, Fujian), China (see Figure 1 for the map). There have been 475,936 residents living in Xiapu County in 2021 (Xiapu County Bureau of Statistics 2021). Checked syllables in Xiapu Min are syllables that are closed by a glottal stop and carry specific tones. Xiapu Min has seven lexical tones, two of which are associated with checked syllables and will be referred to as "checked tones". They are high-falling-checked T54 (in Chao numerals, Chao 1930) and low-falling-checked T21. The other five tones are associated with unchecked syllables and will be referred to as "unchecked tones". They are high-level T44, low-level T11, mid-and high-rising T23, 35, and falling T42 (Wen 2015). Figure 2 shows the F0 contour of /θi/ in seven tones in Xiapu Min produced by a female native speaker. We will henceforth refer to the high-falling-checked and low-falling-checked tones using one numeral, as T5 and T2, to distinguish them from unchecked tones. The goals of the paper are to summarize the acoustic characteristics of checked and unchecked tones cross-linguistically and test whether those characteristics apply to the checked tones of Xiapu Min; and to determine whether the contrast between checked vs. unchecked tones is neutralized in sandhi forms. The term "checked" has heterogeneous definitions among different languages of different families. We focus on checked syllables that are closed with an oral or a glottal stop and that also form a prosodic opposition against unchecked syllables. In Xiapu Min, two lexical tones are restricted to Vʔ syllables, whereas the other five lexical tones are restricted to V and VN syllables. This is the evidence that Vʔ syllables are in prosodic opposition to V and VN syllables. Note that we consider that Xiapu Min has Vʔ checked syllables associated with checked tones, rather than having glottalized tones realized on V syllables, because there is phonological evidence that the glottal stop is a segment in the language. In Xiapu Min, Vʔ syllables contrast with V and VN syllables in onset change pattern in disyllabic compounds. The phonological rules and examples are presented in Table 1. In a disyllabic compound, the /t/ onset of the second syllable becomes [ ] when it follows a ɾ V syllable (a), [n] following a VN syllable (b), and remains [t] when it follows a Vʔ syllable (c). This phenomenon has also been reported in Wen (2015). Table 1 Examples (a)-(c) The term "checked" has heterogeneous definitions among different languages of different families. We focus on checked syllables that are closed with an oral or a glottal stop and that also form a prosodic opposition against unchecked syllables. In Xiapu Min, two lexical tones are restricted to Vʔ syllables, whereas the other five lexical tones are restricted to V and VN syllables. This is the evidence that Vʔ syllables are in prosodic opposition to V and VN syllables. Note that we consider that Xiapu Min has Vʔ checked syllables associated with checked tones, rather than having glottalized tones realized on V syllables, because there is phonological evidence that the glottal stop is a segment in the language. In Xiapu Min, Vʔ syllables contrast with V and VN syllables in onset change pattern in disyllabic compounds. The phonological rules and examples are presented in Table 1. In a disyllabic compound, the /t/ onset of the second syllable becomes [ ] when it follows a ɾ V syllable (a), [n] following a VN syllable (b), and remains [t] when it follows a Vʔ syllable (c). This phenomenon has also been reported in Wen (2015). Table 1 Examples (a)-(c) The term "checked" has heterogeneous definitions among different languages of different families. We focus on checked syllables that are closed with an oral or a glottal stop and that also form a prosodic opposition against unchecked syllables. In Xiapu Min, two lexical tones are restricted to VP syllables, whereas the other five lexical tones are restricted to V and VN syllables. This is the evidence that VP syllables are in prosodic opposition to V and VN syllables. Note that we consider that Xiapu Min has VP checked syllables associated with checked tones, rather than having glottalized tones realized on V syllables, because there is phonological evidence that the glottal stop is a segment in the language. In Xiapu Min, VP syllables contrast with V and VN syllables in onset change pattern in disyllabic compounds. The phonological rules and examples are presented in Table 1. In a disyllabic compound, the /t/ onset of the second syllable becomes [R] when it follows a V syllable (a), [n] following a VN syllable (b), and remains [t] when it follows a VP syllable (c). This phenomenon has also been reported in Wen (2015). Table 1 Examples (a)-(c) demonstrate a contrast among V, VN, and VP in phonological transformations, indicating Languages 2022, 7, 47 3 of 27 that VP is a contrastive syllable type in Xiapu Min. With the evidence of tone restriction and onset change pattern, VP in Xiapu Min fits our definition of "checked syllable" that will be used in this paper. Such checked syllables are widely reported in Chinese languages: Taiwanese Min (Kuo 2013;Pan 2017); Yun'ao Min (Zhang 2020); Xiamen Min (Lai 2016); Shanghai Wu (Zee and Maddieson 1979); Nanjing Jianghuai Mandarin (Sun 2003;Yang and Chen 2018); and Meixian Hakka (Shao 2012 * Note that the phenomenon in Table 1 is undergoing change and loss. For example, Wen (2015) found that these onset change rules apply to high-frequency colloquial words, but not to the words used in a formal register.
In other languages, "checkedness" is not associated with syllables exclusively. It can also be a type of phonation associated with vowels. The status of the glottal stop determines whether checkedness is a phonation or a syllable type. For example, the glottal stop in VP syllable of Texmelucan Zapotec is not a phoneme, but a phonation type. The evidence is that when adding possessive person marker to a noun (Table 2), nouns with a VP syllable (d) behave the same as nouns with a V syllable (e), but differently from nouns with a VC syllable (f) (Speck 1978 (Speck 1978, p. 14).
In order to capture the phonetic features of checkedness to the largest extent, this paper reviews the acoustic correlates of all kinds of purported checked syllables and vowels in the literature. Whether checked syllables and checked vowels can differ in their phonetic realization is beyond the scope of the current paper and will not be further discussed. Then, we test how the phonetic correlates found in the literature behave in Xiapu Min checked syllables and tones, which provides the first acoustic analysis of Xiapu Min tone system. The data from Xiapu Min, an under-documented and under-studied language, will in return help clarify the phonetic nature of the checked syllables and vowels in general. Four parameters have been found to distinguish checked syllables and vowels from unchecked ones: F0, duration, voice quality, and vowel quality. In Section 2, we review how checked and unchecked syllables and vowels differ in each of those four parameters in the literature.

F0
Many Chinese languages have checked tones associated with checked syllables, which have a distinct F0 from unchecked tones. We surveyed eleven varieties of Chinese languages that have distinct F0 for checked tones from unchecked ones. The tonal values were supported by the F0 measurement of tones in the references. The languages include: Nanjing (Chen and Wiltshire 2013;Yang and Chen 2018), Anqing, Wuhu, and East Hefei (Tang 2014) varieties of Jianghuai Mandarin; Meixian Hakka (Shao 2012); Chishan (Liu 2013) and Changsha (Li and Liu 2006;Shao 2012)  Taiyuan Jin (Jia 2013). A summary of the tone system of each variety is presented in Table S1 in Supplementary Material S3.
Apart from Chinese languages, there are other languages that were reported to have checked tones. In Burmese, the checked tone is associated with checked syllables closed by a glottal stop and has a high-sharp-falling F0 contour that is distinct from the three other tones (Gruber 2011). White Hmong is reported to have a checked tone that is closed by a glottal stop (Huffman 1987;Ratliff 2010). Esposito (2012), Garellek et al. (2013), and Garellek and Esposito (2021) called the checked tone a "creaky" tone and analyzed the glottalization at the end of the vowel as a suprasegmental creaky phonation (Ṽ ). Nonetheless, the checked/creaky tone in White Hmong is reported to have a distinct mid/low-falling F0 contour from the other six tones in the language. A perception study has shown that the low-falling pitch contour is an essential cue for a tone to be identified as checked/creaky in White Hmong (Garellek et al. 2013).
There are also languages in which checked syllables do not carry a distinct pitch contour from unchecked syllables. Cantonese checked tones have the same tonal value as the unchecked level tones (55,33,22) (Chan 1987;Qin and Mok 2014). Two out of five tones in Nantong Chinese can be carried by both checked and unchecked syllables (Ao 1993). The checked phonation in San Melchor Betaza Zapotec (Teodocio Olivares 2009) and Isthmus Zapotec (Pickett et al. 2010) is not tone-dependent. Checked and unchecked vowels can be associated with the same set of tones in these languages.

Duration
Checked syllables and vowels have been found to be shorter than unchecked ones in a number of languages: Nanjing (Yang and Chen 2018;Wu 2018), Hefei, Nantong (Ao 1993;Wu 2018), Anqing, and Wuhu (Tang 2014) Jianghuai Mandarin; Meixian Hakka (Shao 2012); Fuzhou (Shao 2012), Xiamen (Lai 2016), and Taiwanese Min (Lin and Repp 1989;Kuo 2013); South Taiyuan Jin (Jia 2013); Burmese (Gruber 2011). The White Hmong checked/creaky tone has a shorter duration than other tones in general (Esposito 2012), but the duration is dependent on vowel quality (Garellek and Esposito 2021). In some languages, there is a length difference between their two checked tones. In Meixian Hakka, checked T5 is shorter than checked T32. In Fuzhou Min, checked T5 is shorter than checked T24 (Shao 2012). Short duration has also been found to be a salient cue for checked tone identification. Controlling for other parameters, tokens with a shorter duration elicit more checked tone responses in Burmese (Gruber 2011) and White Hmong (Garellek et al. 2013).
However, checkedness is not always associated with shorter duration. Checked tones in Changsha Xiang (Shao 2012), Chishan Xiang (Liu 2013), and Anqing Jianghuai Mandarin (Tang 2014) have a longer duration than unchecked tones. Hong Kong Cantonese has both short checked and long checked tones. The long checked tone has a similar duration as the unchecked tones in the language (269 vs. 284 ms) (Zhu et al. 2008). Checked vowels in San Melchor Betaza Zapotec are usually longer than short modal vowels in open syllables (Teodocio Olivares 2009).

Quality of Phonation
Several studies reported that checked tones were realized with non-modal phonations. Differences in phonatory quality have been represented by the articulatory measurements of glottal opening (e.g., Open Quotient measured from electroglottography : Esposito 2012;Gruber 2011;Pan 2017;Pan et al. 2016;Shao 2012;Tang 2014) or the acoustic correlates of glottal constriction and voicing periodicity (e.g., spectral tilt, HNR, CPP, and jitter and shimmer) (Esposito 2012;Garellek and Esposito 2021;Gruber 2011;Kuo 2013;Pan 2017;Pan et al. 2016;Shao 2012). Shao (2012) found that checked T5 in Meixian Hakka and Fuzhou Min was more glottalized than unchecked tones, whereas checked T32 in Meixian Hakka and checked T24 in Fuzhou Min were realized with a breathy voice. Checked T5 in East Hefei Jianghuai Mandarin was more glottalized than unchecked ones, especially at the end of vowels (Tang 2014). Burmese checked tone (Gruber 2011) and White Hmong checked/creaky tone (Esposito 2012;Garellek and Esposito 2021) were reported to have a more glottalized quality than unchecked tones based on acoustic measurements.
There are also checked syllables and vowels that are not associated with a glottalized quality. Checked syllables in Nanjing Jianghuai Mandarin are found to be frequently produced without a glottal coda. Vowels in checked syllables have lower jitter values than those in unchecked syllables, suggesting that checked syllables are more modal than unchecked ones (Oakden 2017). Checked tones in Taiwanese Min are not consistently glottalized (Pan 2005(Pan , 2017. The /p, t, k, P/ codas in checked syllables are frequently deleted in natural production, resulting in vowel lengthening and an increase in voicing periodicity (Pan et al. 2016;Pan and Lyu 2021). Checked tones in Changsha Xiang (Shao 2012) and Anqing and Wuhu Jianghuai Mandarin (Tang 2014) have a similar degree of glottal opening as unchecked tones, indicating a modal quality. Finally, for Vietnamese, vowels with checked tones (D1: rising checked; D2: low checked) have less glottalization than vowels in unchecked syllables carrying unchecked tones (B1: rising unchecked; B2: low glottalized unchecked) (Michaud 2004).

Vowel Quality
Yang and Chen (2018) report that for Nanjing Jianghuai Mandarin, compared with vowels in unchecked syllables, vowels in checked syllables have a higher F1 for /e, o, i, u, y/, higher F2 for /e, o, u/ in older generation, and higher F3 for /o, u, y/. They observe that the vowel quality difference is due to the glottal constriction gesture at the end of the vowel. The glottal constriction causes jaw lowering and consequently a lower and fronter tongue position, resulting in formant value raising. Wu (2018) measured the vowel space in Nanjing, Nantong, and Hefei Jianghuai Mandarin, and found that the F1 of vowels in checked syllables is higher than that of unchecked ones. Front and back vowels are more concentrated in the middle position on the F2 scale. In Taiyuan Jin, the number of vowel contrasts is reduced in checked syllables compared with unchecked syllables (Xia and Hu 2016). While V and VN syllables allow six and five vowel contrasts, respectively, VP syllables only allow two central vowels /5, @/.
In Table 3, we summarize the phonetic features of the checked constituents in all the languages mentioned in Sections 2.1-2.4. Their phonetic nature is represented by four dimensions: whether the checked constituents have a distinct pitch contour, a shorter duration, a non-modal voice quality, and/or a different vowel quality from the unchecked constituents. Table 3. Phonetic features of selected languages with checked constituents (Y for "Yes"; N for "No"; NA if a feature was not described in the reference.).

Acoustic Correlates of Checked Tones in Sandhi Forms
Compared to the phonetics of checked tones in citation forms, the phonetic properties of checked tones in sandhi forms are less studied. Pan (2017) studied the voice quality of checked tones in sandhi in Taiwanese Min. In Taiwanese Min, a tone undergoes sandhi when it is in the middle of a phrase but preserves its citation form in phrase-final position. The low checked tone /3/ is realized as [5] after sandhi, while the high checked tone /5/ is realized as [3]. Pan (2017) found that 80% of /P/ codas in checked tones are deleted in both sandhi and citation forms. Codas of /p, t, k/ are more frequently lenited to [P] in sandhi forms than in citation forms. The high checked tone [5] is produced with less vocal fold contact than the unchecked high-falling tone [51] in both citation and sandhi forms. The low checked tone [3] is produced with less vocal fold contact but a noisier quality than the unchecked low-falling tone [31] in sandhi forms (Pan 2017 Checked tones in Xiapu Min also undergo tone sandhi in specific environments. While Pan (2017) focused on the voice quality of sandhi checked tones, and Chien and Jongman (2019) focused on the F0 of sandhi checked tones, we will investigate three features that have been found to be related to checked syllables and vowels-F0, duration, and phonatory quality. Another significance of studying the phonetic feature of checked tones in sandhi forms in Xiapu Min is that it can demonstrate the phonetic consequences when checked tones change into unchecked ones. Taiwanese Min checked tones remain phonologically checked after sandhi. In contrast, Xiapu Min checked tones acquire the same F0 target as unchecked tones after sandhi, indicating that checked and unchecked tones and syllables are possibly neutralized. Table 4 lists the relevant tone sandhi rules. There are two checked tones in Xiapu Min: high-falling T5 and low-falling T2. Tone sandhi occurs when two tones are juxtaposed. Low-falling checked T2 becomes mid-level unchecked T44 (Rule g), whereas high-falling checked T5 becomes high-level unchecked T55 (Rule h) when they are followed by another tone in compounds. Table 4. Sandhi rules in Xiapu Min. "X" = any of the seven lexical tones in Xiapu Min (The rules are based on Wen (2015) and modified based on fieldwork data collected by the authors).  Table 4 shows that, after tone sandhi, checked T2 and unchecked T23 and T44 become phonologically neutralized as T44. Checked T5 and unchecked T35 and T42 become phonologically neutralized as T55. However, it is unclear whether the neutralization is phonetically complete. Previous studies have found that tonal sandhi can either be phonetically incomplete (e.g., Mandarin T213-T35 neutralization: Kuang 2018; Mizo rising-low tone neutralization: Lalhminghlui and Sarmah 2018; Fuzhou Min T232-T44 neutralization: Li 2016) or complete (e.g., Taiwanese Min: Chien and Jongman 2019; Fuzhou Min T53-T44 neutralization: Li 2016). Given the possible large acoustic differences between checked and unchecked citation tones in Xiapu Min, we hypothesize that the checked tones, which become unchecked after sandhi, will retain some of their attributes of being checked as in citation forms. For example, following Rule (g) in Table 4, the citation checked T2 should be realized as unchecked T44 before another tone. However, if neutralization is incomplete, it is possible that the sandhi form retains some characteristics of being checked, e.g., it may have a shorter duration or be more glottalized than the unchecked T44 derived from other unchecked tones, such as T23. In summary, our research questions are: in Xiapu Min, (1) how do checked syllables differ from unchecked ones in terms of their F0 height and contour, phonation type, and duration; (2) how do checked syllables differ from unchecked syllables after tone sandhi? Are phonologically neutralized checked and unchecked syllables also phonetically completely neutralized in terms of their F0, phonation type and duration? To answer the first question, we will comprehensively measure the acoustic properties of the vowels in the checked and unchecked syllables in Xiapu Min and perform statistical analyses to determine whether they are systematically different from each other. To answer the second question, we will perform Linear Discriminant Analysis on phonologically neutralized sandhi tones using the aforementioned acoustic measures to determine which acoustic parameters can effectively differentiate those neutralized tones.

Stimuli
Ten native speakers of Xiapu Min (five women, five men) with an average age of 53.5 participated in the production experiment conducted in Xiapu, Fujian, China. The study has been approved by the Institutional Review Board of the University of California San Diego. All the participants signed a consent form and an audio recording release consent form before participating in the experiment.
The reading material for the production experiment consisted of two parts. In Part 1, we asked the participants to produce minimal pairs of citation tones. The stimuli of Part 1 had 90 target syllables in total. Every target syllable was embedded in a carrier phrase of /wa42 e11 kaN42 TARGET tCja42 ka44 tCi35/ ("I know how to say the segment TARGET") and was produced once by each participant.
In Part 2, we asked participants to produce compound words that contained citation tone minimal pairs that are neutralized after sandhi. For every pair under comparison, the target syllables had the same segments but different underlying tones. The tone of the adjacent syllable remained constant, so that the tonal environment of the target syllable was controlled. We controlled for whether the onset of the adjacent syllable was a sonorant or obstruent, so as to minimize the effects of the onset on the preceding vowel. Table 5 shows sample stimuli that display the contrasts of all phonological neutralized sandhi tones. The stimuli consisted of 21 minimal pairs, containing 41 compounds in total. Every target syllable was embedded in a carrier phrase of /wa42 e11 kaN42 TARGET tCja42 la44 θø11/ ("I know how to say the word TARGET") and was repeated twice. The citation forms of all the target syllables in Part 2 were covered in Part 1. The complete list of stimuli is presented in Supplementary Material S1. During the experiment, we elicited the compound stimuli in Part 2 first and the one-syllable stimuli in Part 1 next because we did not want to prime the participants with the underlying tone of the target syllables in the compound. All stimuli were presented in Chinese characters on a computer screen in random order using PsychoPy (Peirce et al. 2019). The participants were instructed to produce the sentences as naturally as possible. Their productions were recorded in a quiet room in Xiapu using a Shure SM10 headset microphone, amplified by a USB-powered Focusrite Scarlett 2i2 3rd Gen preamp and using a Dell laptop with a Realtek ALC236 soundcard.

Acoustic Measures in Use and Criteria for Detecting Tracking Errors and Outliers
We segmented the vowel for each target syllable. All target syllables are CV or CVP. In syllables with a stop or fricative onset, the vowel started at the first repetitive pulse after the release of the stop or the frication noise. In syllables with a sonorant onset, the vowel started when the amplitude increased significantly. When the following word of the target syllable had a stop onset, the target vowel ended when the voicing stopped or when the formant amplitude dropped significantly, whichever came first. When the following word had a fricative onset, the target vowel ended when the frication noise started. When the following word had a sonorant onset, the target vowel ended when the amplitude decreased significantly or when the formant started to change, whichever came first.
We then calculated the following acoustic parameters: F0, F1, F2, H1*-H2*, and Harmonic-to-Noise Ratio (HNR) between 0 and 500 Hz using VoiceSauce (Shue et al. 2011). VoiceSauce calculated a value for each measurement every millisecond. F0 correlates with the pitch of the tone and was calculated using the STRAIGHT algorithm in VoiceSauce. F1 and F2 represent the height and frontness of vowels, and were calculated using PRAAT (Boersma and Weenink 2021). The formant setting was to find 5 formants in the 0-5000 Hz range. H1*-H2* and HNR together represent the voice quality of the target vowels. H1*-H2* is the difference in amplitude between the first and second harmonics (corrected for formant frequencies and bandwidths to allow for cross-vowel comparisons). Compared to modal voice, lower H1*-H2* values are correlated with more laryngeal constriction. In contrast, compared to modal voice, higher H1*-H2* values are correlated with glottal spreading and breathiness (Klatt and Klatt 1990;Zhang 2016; see overview in Garellek 2019). HNR measures spectral noise, with lower values indicating more noise, as found for both glottalized and breathy voice qualities. HNR is lower in creaky voice, due to increased aperiodicity, and in breathy voice due to aspiration (Garellek 2019). We use HNR measured between 0 and 500 Hz because this particular noise measure is especially sensitive to aperiodicity, in addition to being sensitive to aspiration. Viewed together, H1*-H2* and HNR provide a means of distinguishing modal voice from breathy and glottalized voice (Garellek 2019). We predict that in Xiapu Min checked syllables, the glottal coda will trigger glottalization at the end of the vowel. Consequently, we predict that this creaky voice will be reflected by lower H1*-H2* and lower HNR, relative to a tone with modal voice (Garellek 2019;Seyfarth and Garellek 2018).
The tracking errors and outliers in the output by VoiceSauce were detected by visual inspection and statistical analysis. First, tokens whose energy value was either zero or failed to be calculated by VoiceSauce were excluded from the analyses of all acoustic measures. Next, we performed visual inspections of the F0 values in the output. Pitch tracking errors are more likely to occur when there is a non-modal voice. Thus, we manually checked the F0 output from VoiceSauce for every VP token. We drew an F0 track for every VP token and inspected whether there was pitch halving or doubling in the F0 track. When the pitch tracking failed, we excluded their F0 values from F0 analysis. These files are also excluded from H1*-H2* analysis because the correct estimation of H1*-H2* depends on the correct estimation of F0. The pitch track plots and the excluded tokens are presented in Figure S1 and Table S2 in Supplementary Material S3.
We also performed visual and statistical inspection of the formant outputs to exclude tracking errors from formant analysis. H1*-H2* was calculated based on vowel formant. Thus, tokens with formant tracking errors were also excluded from H1*-H2* analysis. Within each vowel category, we calculated the Mahalanobis distance (De Maesschalck et al. 2000) on the F1-F2 panel between every individual token to the mean of the category. The larger the Mahalanobis distance, the more deviant the vowel is from the center of the category and the more likely there is a tracking error for that vowel. We followed the criterion in Garellek and Esposito (2021) and Seyfarth and Garellek (2018), and regarded tokens with a Mahalanobis distance larger than 6 as an outlier and excluded them from the analysis of vowel quality and H1*-H2*. We also plotted the mean F1 and F2 of the mid-third portion of each vowel and excluded outliers detected by visual inspection. In addition, we manually checked the spectrogram of /u/s with an F2 greater than 1500 Hz. If the F2 tracking was wrong, the /u/ token was discarded from formant and H1*-H2* analysis. The vowel formant plots and the excluded tokens are presented in Figures S2 and S3 and Table S3 in Supplementary Material S3.
After excluding the tracking errors of F0, F1, and F2, we transformed the values of F0, H1*-H2*, and HNR into z-score to reduce between-speaker variation and increase the power of the statistical analyses. We calculated the log z-score of F0 by first log transforming the F0 in Hertz, then z-scoring it by speaker. We conducted a log transformation of F0 values first because the distribution of F0 was right skewed. The log transformation resulted in a normal distribution of the F0 and increased the validity of the statistical analyses (Keene 1995). Tokens with a z-score exceeding 3 were considered outliers (perhaps from tracking errors) and discarded from the analysis of that measure. Tokens with F0 outliers were excluded from H1*-H2* analysis. We used the log z-score of F0 and the z-score of HNR and duration in the statistical analyses in the following sections. We used the raw value, rather than the z-score, for H1*-H2*, F1, and F2. H1*-H2* went through more steps of outlier exclusion than other parameters, such that the data were no longer balanced by participant or tone. There were more tracking errors for the vowel /u/ than for other vowels. After excluding tracking errors for formants, F1 and F2 became unbalanced by participant and vowel. Using a z-score by participant for H1*-H2*, F1, and F2 could therefore distort the data and obscure the effects. The R code for detecting tracking errors and outliers and all the statistical analyses in the following sections are presented in Supplementary Material S4.

The Acoustic Features of Checked Tones in Citation Forms
Note that in Xiapu Min, checked syllables and checked tones are always associated with each other. Therefore, the results measured from the vowels in checked syllables are equivalent to the results for checked tones. For simplicity, we will refer to the results for checked syllables and checked tones together as the results for checked tones. We chose "checked tone" as the reference because there are two checked tones that need to be differentiated.
The recording of one participant's production of the citation forms was corrupted and thus discarded. One participant added an epenthetic vowel at the end of all the target syllables, so that their recording was discarded. Eight participants produced 720 tokens in total (90 segments * eight participants). A total of 91 tokens were excluded because of either corrupted recording or mispronunciations, leaving 629 tokens valid for analysis. VoiceSauce (Shue et al. 2011) yielded 137,226 data points in total. Based in part on feedback from the participants, we believe there are three reasons for the large number of mispronunciations. First, some of the tokens were uttered in Mandarin due to the influence of Chinese characters presented in the elicitation materials. Second, some target syllables seldom occur in isolation in the language, but co-occur with other syllables as a compound. The participants sometimes pronounced the target syllable with its sandhi tone in a compound instead of with a citation tone in isolation. Third, each target syllable was only elicited once for each participant, and the unfamiliarity with the elicitation material might have led to some pronunciation errors. The tracking error and outlier detection and exclusion procedures were the same as described in Section 3.2. After data exclusion, there were 133,246, 125,700, 125,700, 119,838, and 136,469 data points for F0, F1, F2, H1*-H2*, and HNR, respectively. In order to normalize for duration differences when analyzing F0, H1*-H2*, and HNR, the data points were divided into nine equally timed intervals, and the mean of each interval was calculated. The descriptive statistics of the dataset can be found in Tables S4-S7 in Supplementary Material S3. 4.1.1. F0 Figure 3 shows the average F0 value of each tone over nine equally timed intervals for female and male participants, respectively. Checked T2 and 5 are represented by dotted lines. For both female and male speakers, checked T5 has the highest F0 among all tones. Checked T2 has a similar onset as the rising T35. In general, the tones produced by female speakers have higher F0 values than those produced by male speakers. T44, 23, 2, 11 are in a more compressed F0 range for male speakers than female speakers.
Languages 2022, 7, x FOR PEER REVIEW 11 of 29 4.1.1. F0 Figure 3 shows the average F0 value of each tone over nine equally timed intervals for female and male participants, respectively. Checked T2 and 5 are represented by dotted lines. For both female and male speakers, checked T5 has the highest F0 among all tones. Checked T2 has a similar onset as the rising T35. In general, the tones produced by female speakers have higher F0 values than those produced by male speakers. T44, 23, 2, 11 are in a more compressed F0 range for male speakers than female speakers.  shows that the F0 values in Hertz have large variation between females and males. We need to transform the F0 values in Hertz to a less varied scale in order to reduce between-speaker variation and establish a more uniformed representation of the tonal values. Log z-score has been found to be the most effective measure for reducing betweenspeaker variation among other F0 normalization methods (Zhu 2004), and it has been used in several studies (Duan and Jia 2015;Hu et al. 2012;Jia and Li 2012). The calculation of the log z-score was described in Section 3.2. We thus use the log z-score of F0 to represent the relative pitch height and contour of the seven tones in Xiapu Min in Figure 4a.
Tonal values in Chinese languages are usually represented by Chao numerals. Shi et al. (2010) proposed a T-score to transform the log-transformed F0 value of tone into a 0-5 scale. The T-score is calculated using Formula (1). F0 represents the F0 value of the current time point. F0min and F0max are the minimum and maximum values of F0 among all time points. The correspondence between T-score and Chao numeral is presented in Table  6. Liu (2008) proposed that each category should overlap ±0.1 with the neighboring categories to allow flexibility. T-scores at the borderline can be assigned either the lower or the higher Chao numeral.
T-Score 0-1.1 0.9-2.1 1.9-3.1 2.9-4.1 3.9-5 Chao numeral 1 2 3 4 5 Several studies on Chinese languages have adopted this T-score to transform F0 values to Chao numerals (e.g., Shao 2012;Tang 2014;Su 2016). This study uses the same method to calculate the Chao numerals for Xiapu Min tones. We modified Formula (1) into Formula (2) for calculating the T-score. Rather than using the logged value of F0, we used the log z-score (LZ) of F0 because it can further reduce between-speaker variation. This modified T-score does not change the relative position of tones represented by the  Figure 3 shows that the F0 values in Hertz have large variation between females and males. We need to transform the F0 values in Hertz to a less varied scale in order to reduce between-speaker variation and establish a more uniformed representation of the tonal values. Log z-score has been found to be the most effective measure for reducing between-speaker variation among other F0 normalization methods (Zhu 2004), and it has been used in several studies (Duan and Jia 2015;Hu et al. 2012;Jia and Li 2012). The calculation of the log z-score was described in Section 3.2. We thus use the log z-score of F0 to represent the relative pitch height and contour of the seven tones in Xiapu Min in Figure 4a.
Languages 2022, 7, x FOR PEER REVIEW 12 of 29 log z-score in Figure 4a, but it converts the log z-score to a 0-5 scale, so that it is more convenient to assign Chao numerals to the tones. The T-scores of each tone over nine equally timed intervals are shown in Figure 4b. The T-score at Time Point 1 and 9 of each tone, and their corresponding Chao numeral based on Table 6, are listed in Table 7. Note that, based on the rules in Table 6, T11 should be referred to as 21. However, checked T2 also falls in the 21 range and has a higher onset and a steeper fall than T11. Thus, for the purpose of differentiating the low unchecked tone from the low-falling checked tone, we assign Chao numerals 11 to T11. Our results provide an acoustic basis for the numerical value of Xiapu Min tones. We found The T-score is calculated using Formula (1). F0 represents the F0 value of the current time point. F0 min and F0 max are the minimum and maximum values of F0 among all time points. The correspondence between T-score and Chao numeral is presented in Table 6. Liu (2008) proposed that each category should overlap ±0.1 with the neighboring categories to allow flexibility. T-scores at the borderline can be assigned either the lower or the higher Chao numeral.
T-Score 0-1.1 0.9-2.1 1.9-3.1 2.9-4.1 3.9-5 Chao numeral 1 2 3 4 5 Several studies on Chinese languages have adopted this T-score to transform F0 values to Chao numerals (e.g., Shao 2012;Tang 2014;Su 2016). This study uses the same method to calculate the Chao numerals for Xiapu Min tones. We modified Formula (1) into Formula (2) for calculating the T-score. Rather than using the logged value of F0, we used the log z-score (LZ) of F0 because it can further reduce between-speaker variation. This modified T-score does not change the relative position of tones represented by the log z-score in Figure 4a, but it converts the log z-score to a 0-5 scale, so that it is more convenient to assign Chao numerals to the tones. The T-scores of each tone over nine equally timed intervals are shown in Figure 4b.
The T-score at Time Point 1 and 9 of each tone, and their corresponding Chao numeral based on Table 6, are listed in Table 7. Note that, based on the rules in Table 6, T11 should be referred to as 21. However, checked T2 also falls in the 21 range and has a higher onset and a steeper fall than T11. Thus, for the purpose of differentiating the low unchecked tone from the low-falling checked tone, we assign Chao numerals 11 to T11. Our results provide an acoustic basis for the numerical value of Xiapu Min tones. We found that the reported mid-rising tone T35 does not rise as high as T5 and should be noted as 24. The reported mid-level T44 has a lower onset than T42 and should be noted as 33. The Chao numeral we assigned to the tone is closer to the acoustic nature of the pitch height and shape in production. We suggest, then, that future studies on Xiapu Min use the Chao numerals proposed in this study. Additionally, note that the syllables in this section were elicited in a carrier phrase. Future study should also elicit syllables in isolation and see whether the tonal value changes. For the sake of consistency, we will continue to use the original tone number throughout the rest of the study. Table 7. T-score and Chao numerals of the seven tones in Xiapu Min (for each tone, the value for the onset is on the left and the value for the offset is on the right).  T53  T21  T42  T33  T24  T13  T11 * The onset of T11 should be assigned a Chao numeral of 2, but it was assigned 1 to be differentiated from T2. • H1*-H2*; Figure 5 shows the raw H1*-H2* values of each tone over nine equally timed intervals after averaging across all tokens and all speakers. The checked T5 and T2 are represented with dotted lines. The graph shows that the two checked tones have a clear falling H1*-H2* contour as time proceeds, whereas the unchecked tones have a flatter H1*-H2* contour over time. The checked tones also end in a lower H1*-H2* value than the unchecked tones. We conducted a linear mixed-effects analysis to test whether checked T5 and T2 have a more negative slope and end in a significantly lower H1*-H2* value than the unchecked tones. The model was implemented with the lmer() function in the lme4 package in R (Bates et al. 2015) (same for all other linear mixed-effects models in this paper). The R code for the H1*-H2* model is in (3). Model (3) was run twice, once with T5 and once with T2 as the reference level of Tone. The alpha level was adjusted to 0.025 (0.05/2). lmer(H1*-H2*~Time + Tone + Time * Tone + (1|Participant)) (3) Languages 2022, 7, x FOR PEER REVIEW 13 of 29 H1*-H2* contour as time proceeds, whereas the unchecked tones have a flatter H1*-H2* contour over time. The checked tones also end in a lower H1*-H2* value than the unchecked tones. We conducted a linear mixed-effects analysis to test whether checked T5 and T2 have a more negative slope and end in a significantly lower H1*-H2* value than the unchecked tones. The model was implemented with the lmer() function in the lme4 package in R (Bates et al. 2015) (same for all other linear mixed-effects models in this paper). The R code for the H1*-H2* model is in (3). Model (3) was run twice, once with T5 and once with T2 as the reference level of Tone. The alpha level was adjusted to 0.025 (0.05/2).
lmer(H1*-H2*~Time + Tone + Time * Tone + (1|Participant)) The statistics of Model (3)    HNR; Figure 6 shows the raw HNR values of each tone over nine equally timed intervals after averaging across all tokens and all speakers. The HNR contour of checked T2 is below all other tones at every time point. We used linear mixed-effects models to test whether, on average, T2 and T5 have a significantly lower HNR value than unchecked tones (4). Model (4) was run twice, once with T5 and once with T2 as the reference level of Tone. The alpha level was adjusted to 0.025 (0.05/2). lmer(HNR (z-score)~Tone + (1|Participant)) The statistics of Model (4) are presented in Tables S10 and S11 in Supplementary Material 3. The results of Model (4) show that, on average, checked T2 has a significantly lower HNR value than every other tone. Checked T5 has an HNR value lower than T44, similar to 42, but higher than T2, 23, 35, and 11. However, Figure 6 shows that there is a sudden drop in the HNR contour of T5 between Time Point 3 and 4. The drop of HNR from Time Point 4 (P4) to 9 (P9) is larger for T5 than any other tone. Table 8 shows the HNR values at P4 and P9 for all seven tones. We fitted a smooth spline for each contour using the sm.spline() function in R package "pspline" (Ramsey and Ripley 2017) and calculated the first derivative of the contour at each time point. A positive derivative means the contour is rising. A negative derivative • HNR; Figure 6 shows the raw HNR values of each tone over nine equally timed intervals after averaged across all tokens and all speakers. The HNR contour of checked T2 is below all other tones at every time point except for Point 8. We used linear mixed-effects models to test whether, on average, T2 and T5 have a significantly lower HNR value than unchecked tones (4). Model (4) was run twice, once with T5 and once with T2 as the reference level of Tone. The alpha level was adjusted to 0.025 (0.05/2). lmer(HNR (z-score)~Tone + (1|Participant)) The statistics of Model (4) are presented in Tables S10 and S11 in Supplementary Material S3. The results of Model (4) show that, on average, checked T2 has a significantly lower HNR value than every other tone. Checked T5 has an HNR value lower than T44, similar to 42, but higher than T2, 23, 35, and 11. However, Figure 6 shows that there is a sudden drop in the HNR contour of T5 between Time Point 3 and 4. The drop of HNR from Time Point 4 (P4) to 9 (P9) is larger for T5 than any other tone. Table 8 shows the HNR values at P4 and P9 for all seven tones. We fitted a smooth spline for each contour using the sm.spline() function in R package "pspline" (Ramsey and Ripley 2017) and calculated the first derivative of the contour at each time point. A positive derivative means the contour is rising. A negative derivative means the contour is dropping. A large absolute value means the rising/dropping slope is steep. The fitted spline for the HNR contour of each tone is plotted in Figure S4 in Supplementary Material S3. The HNR value predicted by the fitted spline and the first derivative at each time point for each tone are presented in Table S12 in Supplementary Material S3. Table 8 summarizes the HNR value at P4 and P9 and the difference between P4 and P9 for each tone. The last row is the first derivative at P4 for each tone. We see that T5 has the largest HNR fall from P4 to P9, and its first derivative at P4 has the largest negative value. Combining the evidence from visual inspection, raw HNR value difference between P4 and P9, and the negative derivative of the HNR contour, we argue that T5 has the largest HNR drop in the last two-thirds of the vowel among the seven tones. This indicates the production of T5 targets at a noisy quality toward the end of the vowel.
Languages 2022, 7, x FOR PEER REVIEW 14 of 29 means the contour is dropping. A large absolute value means the rising/dropping slope is steep. The fitted spline for the HNR contour of each tone is plotted in Figure S4 in Supplementary Material 3. The HNR value predicted by the fitted spline and the first derivative at each time point for each tone are presented in Table S12 in Supplementary Material 3. Table 8 summarizes the HNR value at P4 and P9 and the difference between P4 and P9 for each tone. The last row is the first derivative at P4 for each tone. We see that T5 has the largest HNR fall from P4 to P9, and its first derivative at P4 has the largest negative value. Combining the evidence from visual inspection, raw HNR value difference between P4 and P9, and the negative derivative of the HNR contour, we argue that T5 has the largest HNR drop in the last two-thirds of the vowel among the seven tones. This indicates the production of T5 targets at a noisy quality toward the end of the vowel.  In summary, checked tones T2 and T5 have more glottal constriction than unchecked tones, as indicated by the lower H1*-H2* values. Checked T2 has a noisier quality than unchecked tones, whereas checked T5 becomes noisier abruptly in the last two-thirds of the vowel. Together, the glottal constriction and noisy quality indicate a more glottalized phonatory quality of checked tones compared to unchecked tones. The HNR values do not differentiate checked from unchecked tones as consistently as the H1*-H2* values. Thus, we hypothesize that the listeners are less likely to use aperiodic voicing as a salient cue in differentiating checked tones from unchecked ones in perception. Future studies can manipulate spectral tilt and pulse periodicity separately to test the perceptual saliency of glottal constriction and noise for checked tone identification in Xiapu Min.
In addition to measuring the acoustic parameters, we also inspected the spectrograms of tokens with checked tones to describe the phonatory quality of checked tones comprehensively. We find that the realization of the glottal stop in Xiapu Min checked tones varies from strong realization with sustained (voiceless) glottal closure, to moderate realization with voiced glottalization on vowels, to deletion of a glottal stop. Figure 7 show a spectrogram of tokens of different degree of glottal stop realizations. Out of 236 tokens attested, there are 223 tokens realized with only voiced glottalization or reduced amplitude, 4 with full glottal closure and 9 with deletion of the glottal stop. We propose that  In summary, checked tones T2 and T5 have more glottal constriction than unchecked tones, as indicated by the lower H1*-H2* values. Checked T2 has a noisier quality than unchecked tones, whereas checked T5 becomes noisier abruptly in the last two-thirds of the vowel. Together, the glottal constriction and noisy quality indicate a more glottalized phonatory quality of checked tones compared to unchecked tones. The HNR values do not differentiate checked from unchecked tones as consistently as the H1*-H2* values. Thus, we hypothesize that the listeners are less likely to use aperiodic voicing as a salient cue in differentiating checked tones from unchecked ones in perception. Future studies can manipulate spectral tilt and pulse periodicity separately to test the perceptual saliency of glottal constriction and noise for checked tone identification in Xiapu Min.
In addition to measuring the acoustic parameters, we also inspected the spectrograms of tokens with checked tones to describe the phonatory quality of checked tones compre-hensively. We find that the realization of the glottal stop in Xiapu Min checked tones varies from strong realization with sustained (voiceless) glottal closure, to moderate realization with voiced glottalization on vowels, to deletion of a glottal stop. Figure 7 shows a spectrogram of tokens of different degree of glottal stop realizations. Out of 236 tokens attested, there are 223 tokens realized with only voiced glottalization or reduced amplitude, 4 with full glottal closure, and 9 with deletion of the glottal stop. We propose that the predominance of glottalized phonation over full glottal closure in checked tones is due to the phrasal position of the target syllables. All the target syllables are elicited in phrase-medial positions. Glottal stops are more likely to be realized as fully voiced in phrase-medial position .  The underlying form of (1), (2), and (3) is /tsaʔ 2/ 杂 "mixed", /paʔ 5/ 百 "hundred" and /tuʔ 5/ 督 "supervise". Figure 8 shows the duration of each tone after averaging across all tokens and all speakers. We ran a linear regression model to compare the duration of checked tones with unchecked tones. The R code for the model is in (5). Model (5) was run twice, once with T5 and once with T2 as the reference level of Tone. The alpha level was adjusted to 0.025 (0.05/2). The statistics of Model (5) are presented in Tables S13 and S14 in Supplementary Material 3. The results show that both checked T5 and T2 have a significantly shorter duration than every unchecked tone. lm(Duration (z-score)~Tone) (5)  Figure 9 shows the distribution in the F1-F2 vowel formant space of the five monophthongs for checked (T5, 2) and unchecked tones (T44, 11, 23, 35, 42). We did not include diphthongs for comparison to avoid the influence of formant transition in diphthongs. For each token, we calculated the mean F1 and F2 of mid-third of the vowel to ensure that the vowel formant is at a stable stage. Figure 9 shows that the checked and unchecked vowel  (3) is /tsaP 2/ 杂 "mixed", /paP 5/ 百 "hundred" and /tuP 5/ 督 "supervise". Figure 8 shows the duration of each tone after averaged across all tokens and all speakers. We ran a linear regression model to compare the duration of checked tones with unchecked tones. The R code for the model is in (5). Model (5) was run twice, once with T5 and once with T2 as the reference level of Tone. The alpha level was adjusted to 0.025 (0.05/2). The statistics of Model (5) are presented in Tables S13 and S14 in Supplementary Material S3. The results show that both checked T5 and T2 have a significantly shorter duration than every unchecked tone. lm(Duration (z-score)~Tone)

Duration
Languages 2022, 7, x FOR PEER REVIEW 15 of 29 the predominance of glottalized phonation over full glottal closure in checked tones is due to the phrasal position of the target syllables. All the target syllables are elicited in phrasemedial positions. Glottal stops are more likely to be realized as fully voiced in phrasemedial position ).

Figure 7.
Spectrograms of glottal stops realized as glottalized vowel, full glottal stop and deletion. The underlying form of (1), (2), and (3) is /tsaʔ 2/ 杂 "mixed", /paʔ 5/ 百 "hundred" and /tuʔ 5/ 督 "supervise". Figure 8 shows the duration of each tone after averaging across all tokens and all speakers. We ran a linear regression model to compare the duration of checked tones with unchecked tones. The R code for the model is in (5). Model (5) was run twice, once with T5 and once with T2 as the reference level of Tone. The alpha level was adjusted to 0.025 (0.05/2). The statistics of Model (5) are presented in Tables S13 and S14 in Supplementary Material 3. The results show that both checked T5 and T2 have a significantly shorter duration than every unchecked tone. lm(Duration (z-score)~Tone) (5)  Figure 9 shows the distribution in the F1-F2 vowel formant space of the five monophthongs for checked (T5, 2) and unchecked tones (T44, 11, 23, 35, 42). We did not include diphthongs for comparison to avoid the influence of formant transition in diphthongs. For each token, we calculated the mean F1 and F2 of mid-third of the vowel to ensure that the vowel formant is at a stable stage. Figure 9 shows that the checked and unchecked vowel ellipses have large overlaps. To determine whether checked and unchecked vowels differ  Figure 9 shows the distribution in the F1-F2 vowel formant space of the five monophthongs for checked (T5, 2) and unchecked tones (T44, 11, 23, 35, 42). We did not include diphthongs for comparison to avoid the influence of formant transition in diphthongs. For each token, we calculated the mean F1 and F2 of mid-third of the vowel to ensure that the vowel formant is at a stable stage. Figure 9 shows that the checked and unchecked vowel ellipses have large overlaps. To determine whether checked and unchecked vowels differ in F1 and F2, we conducted linear mixed-effect analyses using models in (6) and (7). The statistics of Model (6) and (7) are presented in Tables S15 and S16 in Supplementary Material S3. The result shows that, for both F1 and F2, the checked tone does not differ significantly from the unchecked tone. This indicates that checked and unchecked tones do not differ in vowel quality in Xiapu Min. lmer(F1~Tone + Checkedness + (1|Participant)) (6) lmer(F2~Tone + Checkedness + (1|Participant)) (7)

Vowel Quality
Languages 2022, 7, x FOR PEER REVIEW 16 of 29 statistics of Model (6) and (7) are presented in Tables S15 and S16 in Supplementary Material 3. The result shows that, for both F1 and F2, the checked tone does not differ significantly from the unchecked tone. This indicates that checked and unchecked tones do not differ in vowel quality in Xiapu Min.
In summary, we confirm that T5 and 2 have distinct pitch values from unchecked tones, and we propose a modification to the tonal values of T44, 35 and 23 based on the results from eight speakers and careful F0 normalization. We find that the checked tones are produced with more glottal constrictions and aperiodicity, indicating that the vowels in checked syllables are glottalized. The glottalization becomes stronger when the production proceeds toward the end of the vowel. The checked tones are shorter than unchecked tones. Checked and unchecked tones are found to be different in three out of four dimensions attested: they are shorter, they end in a glottalization, and they have distinct F0 values compared to unchecked tones. No significant differences in vowel quality have been found between checked and unchecked tones.

The Acoustic Features of Checked Tones in Sandhi Forms
The syllables that underwent sandhi in the compound words were the target syllables for this section. Ten participants produced 820 target syllables in total (41 compounds * two repetitions * ten participants). A total of 66 syllables were excluded because of either corrupted recording or mispronunciation, leaving 754 syllables valid for analyses. Voic-eSauce (Shue et al. 2011) yielded 79,100 data points in total. The tracking error and outlier detection and exclusion procedures were the same as described in Section 3.2. After data exclusion, there were 78,960, 74,299, 74,299, 71,638, and 78,617 data points for F0, F1, F2, H1*-H2*, and HNR, respectively. The data points were divided into nine (for plotting the results) and three equal time intervals (for the linear discriminant analysis). The descriptive statistics of the dataset can be found in Tables S4-S7 in Supplementary Material 3. In summary, we confirm that T5 and 2 have distinct pitch values from unchecked tones, and we propose a modification to the tonal values of T44, 35, and 23 based on the results from eight speakers and careful F0 normalization. We find that the checked tones are produced with more glottal constrictions and aperiodicity, indicating that the vowels in checked syllables are glottalized. The glottalization becomes stronger when the production proceeds toward the end of the vowel. The checked tones are shorter than unchecked tones. Checked and unchecked tones are found to be different in three out of four dimensions attested: they are shorter, they end in a glottalization, and they have distinct F0 values compared to unchecked tones. No significant differences in vowel quality have been found between checked and unchecked tones.

The Acoustic Features of Checked Tones in Sandhi Forms
The syllables that underwent sandhi in the compound words were the target syllables for this section. Ten participants produced 820 target syllables in total (41 compounds * two repetitions * ten participants). A total of 66 syllables were excluded because of either corrupted recording or mispronunciation, leaving 754 syllables valid for analyses. VoiceSauce (Shue et al. 2011) yielded 79,100 data points in total. The tracking error and outlier detection and exclusion procedures were the same as described in Section 3.2. After data exclusion, there were 78,960, 74,299, 74,299, 71,638, and 78,617 data points for F0, F1, F2, H1*-H2*, and HNR, respectively. The data points were divided into nine (for plotting the results) and three equally timed intervals (for the linear discriminant analysis). The descriptive statistics of the dataset can be found in Tables S4-S7 in Supplementary Material S3.

Neutralization among T2, T44, and T23
The first sandhi rule of Xiapu Min is {T2, T23, T44} → T44 / ___ X (Table 4, Example g). It results in a neutralization between T2, T23, and T44. We conducted Linear Discriminant Analysis (LDA) (Izenman 2013) to investigate whether the neutralized tones can be categorized by the acoustic features before and after neutralization. LDA models use a categorical variable as the dependent variable, and they use multiple parameters that can potentially differentiate the categories in the dependent variable as the independent variables. By assigning different coefficients to different parameters, the model outputs at least one composite linear discriminant score for each token, and it uses that score to classify the categories. The number of linear discriminant scores equals the number of categories in the dependent variable minus 1. For example, when there are three categories to classify, the model outputs two linear discriminant scores, which are named first and second linear discriminant scores (LD1 and LD2). The purpose of using LDA models is to compare the classification results of the model with the true categories of the data and calculate the classification accuracy. If the classification accuracy is high, the parameters have effectively differentiated the categories in the input. The parameters that have a higher correlation with the linear discriminant scores are more effective for the classification. If the classification accuracy is at or below chance, the parameters have failed to differentiate the categories in the input. In this study, we used the percentage of the majority class as the chance level, because in random guessing, predicting all the tokens as the majority class results in the highest chance (Bosch and Paquette 2018). The results of the LDA models can help determine whether the neutralization among the three underlyingly different tones is complete or not. The LDA models were implemented by the lda() function from the MASS package in R (Venables and Ripley 2002).
The R code for the LDA models is in (8). The dependent variable is the citation tone of the target syllables. The independent variables are the average F0, H1*-H2*, HNR of three equally timed intervals of the vowels (F0_1, F0_2, F0_3, H1*-H2*_1, H1*-H2*_2, H1*-H2*_3, HNR_1, HNR_2, HNR_3) and the Duration of the vowel. We did not include vowel formants in the model because no difference in vowel formants was found in the citation forms of the target syllables. lda(Tone~F0_1 + F0_2 + F0_3 + H1*-H2*_1 + H1*-H2*_2 + H1*-H2*_3 + HNR_1 + HNR_2 + HNR_3 + Duration) We compared the three tones in citation forms (T2 vs. T44 vs. T23) in the same model. Since the acoustic differences among tones in sandhi forms are likely to be largely neutralized, comparing all three tones in sandhi forms in the same model could potentially obscure the fine-grained differences. Thus, we compared every two tones in sandhi forms (T23 vs. T44, T2 vs. T23, T2 vs. T44) in three separate models. The citation tones are distinguished by two LD scores. Each pair of sandhi tones is distinguished by one LD score. For every LDA model, we calculated its classification accuracy based on a leave-one-out cross-validation. Figure 10a shows the LD1 and LD2 distribution of T2, T44, and T23 in citation forms. The classification accuracy of the citation forms is 94.81%, which is significantly higher than the 38.96% chance level (p < 0.001). We applied the LDA models on each pair of contrasts between T2, T44, and T23 in sandhi forms to test the degree of neutralization between every two tones. Figure 10b shows the LD1 distribution of each tone in each contrast. The classification accuracies of T23 vs. T44, T2 vs. T23, and T2 vs. T44 in sandhi forms are: 59.46% (p = 0.31; chance = 54.05%), 68.69% (p < 0.001; chance = 51.52%), and 76.61% (p < 0.001, chance = 52.63%). The results indicate that the citation forms of T2, T44, and T23 are differentiated at near-ceiling accuracy. In sandhi forms, however, T23 and T44 are completely neutralized, whereas T2 and T23, and T2 and T44, can still be differentiated significantly above chance. Note that T23 and T44 are tested by only one minimal pair, whereas T2 vs. T23 and T2 vs. T44 are tested by three and five minimal pairs, respectively.
The results for T23 vs. T44 may not be as representative as for the other two pairs. Future studies should aim for more balanced stimuli. The results for T23 vs. T44 may not be as representative as for the other two pairs. Future studies should aim for more balanced stimuli. Figure 10. (a) is the first and second linear discriminant score (LD1 and LD2) distribution of T2 vs. T44 vs. T23 in citation forms. The ellipses represent 50% confidence intervals around the mean of each group. (b) is the LD1 distribution of T23 vs. T44, T2 vs. T23, and T2 vs. T44 in sandhi forms, respectively. LD1 and LD2 are from the models without cross-validation.
Next, we ask which acoustic parameters contribute most to the above-chance discriminations. We calculated the Pearson correlation between each acoustic parameter and the linear discriminant scores. For citation tones, LD1 explains 63.3% of the variance. The top three parameters that have the highest absolute correlation with LD1 are duration, mid HNR, and final F0. For the discrimination between T23 and T2, the top three parameters that have the highest absolute correlation with LD1 are duration, final F0, and initial H1*-H2*. For the discrimination between T44 and T2, the top three parameters that have the highest absolute correlation with LD1 are duration, and initial and final HNR. The statistics of Model (8) and the correlations between the parameters and the linear discriminant scores are presented in Tables S17-S21 in Supplementary Material 3. Figure 11 shows the values of F0, H1*-H2*, HNR, and the duration of T44, T23, and T2 in citation and sandhi forms. In terms of F0, the contours of the three tones are well dispersed in citation forms. In sandhi forms, all tones have a flat F0 contour. The F0 height of T44 is slightly lower than that of T23 and T2. In terms of H1*-H2*, checked T2 is produced with lower H1*-H2* than unchecked T44 and T23 in citation forms. T2 has a falling H1*-H2* contour. In sandhi forms, the H1*-H2* value of T2 increases and is between T44 and T23. The H1*-H2* contour of T2 is flat. In terms of HNR, the HNR of T2 is lower than T44, but similar to T23 in citation forms. In sandhi forms, the difference in HNR among those three tones remains, but it becomes much smaller. The HNR of T2 and T23 increases. We compared the H1*-H2* and HNR of T2 between citation and sandhi forms using mixed-effects models and confirmed that the increases in both parameters after sandhi are significant. The statistics are in Tables S22 and S23 in Supplementary Material 3. In summary, checked T2 has a constricted and noisy quality in citation forms. In sandhi forms, T2 becomes less constricted and less noisy, indicating a reduction of glottalization. The duration of T2 is shorter than that of T44 and T23 in both citation and sandhi forms. The duration of T2 is shorter in sandhi forms than in citation forms, possibly because a sandhi form is at the position of the initial syllable in a disyllabic compound word, whereas a citation form is a monosyllabic word itself. Next, we ask which acoustic parameters contribute most to the above-chance discriminations. We calculated the Pearson correlation between each acoustic parameter and the linear discriminant scores. For citation tones, LD1 explains 63.3% of the variance. The top three parameters that have the highest absolute correlation with LD1 are duration, mid HNR, and final F0. For the discrimination between T23 and T2, the top three parameters that have the highest absolute correlation with LD1 are duration, final F0, and initial H1*-H2*. For the discrimination between T44 and T2, the top three parameters that have the highest absolute correlation with LD1 are duration, and initial and final HNR. The statistics of Model (8) and the correlations between the parameters and the linear discriminant scores are presented in Tables S17-S21 in Supplementary Material S3. Figure 11 shows the values of F0, H1*-H2*, HNR, and the duration of T44, T23, and T2 in citation and sandhi forms. In terms of F0, the contours of the three tones are well dispersed in citation forms. In sandhi forms, all tones have a flat F0 contour. The F0 height of T44 is slightly lower than that of T23 and T2. In terms of H1*-H2*, checked T2 is produced with lower H1*-H2* than unchecked T44 and T23 in citation forms. T2 has a falling H1*-H2* contour. In sandhi forms, the H1*-H2* value of T2 increases and is between T44 and T23. The H1*-H2* contour of T2 is flat. In terms of HNR, the HNR of T2 is lower than T44, but similar to T23 in citation forms. In sandhi forms, the difference in HNR among those three tones remains, but it becomes much smaller. The HNR of T2 and T23 increases. We compared the H1*-H2* and HNR of T2 between citation and sandhi forms using mixed-effects models and confirmed that the increases in both parameters after sandhi are significant. The statistics are in Tables S22 and S23 in Supplementary Material S3. In summary, checked T2 has a constricted and noisy quality in citation forms. In sandhi forms, T2 becomes less constricted and less noisy, indicating a reduction of glottalization. The duration of T2 is shorter than that of T44 and T23 in both citation and sandhi forms. The duration of T2 is shorter in sandhi forms than in citation forms, possibly because a sandhi form is at the position of the initial syllable in a disyllabic compound word, whereas a citation form is a monosyllabic word itself. Languages 2022, 7, x FOR PEER REVIEW 19 of 29 Figure 11. Acoustic parameter values of T44, T23, and T2 in citation and sandhi forms.

Neutralization among T5, T42, and T35
The other sandhi rule of Xiapu Min is {T5, T42, T35} → T55 / ___ X (Table 4, Example h). It results in the neutralization of T5, T42, and T35. Similar to Section 4.2.1, we performed LDA in this section to determine whether the neutralization between those three tones was complete or not. The R code was the same as in Formula (8). For every LDA model, we calculated its classification accuracy based on a leave-one-out cross-validation.
We compared the three tones in citation forms (T5 vs. T42 vs. T35) in the same model. Figure 12a shows the LD1 and LD2 distribution of T5, T42, and T35 in citation forms. The classification accuracy of citation forms is 100%, which is significantly higher than the 45.57% chance level (p < 0.001). We applied the LDA models on every two contrasts of T5, T42 and T35 in their sandhi forms. Figure 12b shows the LD1 distribution of each tone in each contrast. The classification accuracies of T35 vs. T42, T5 vs. T35, and T5 vs. T42 in sandhi forms are: 47.41% (p = 0.83; chance = 51.11%), 80.35% (p < 0.001; chance = 51.45%) and 86.79% (p < 0.001, chance = 53.77%). The results indicate that, before sandhi, the citation forms of T5, T42, and T35 are differentiated at ceiling accuracy. After sandhi, T35 and T42 are completely neutralized along these measures, whereas T5 and T35, and T5 and T42, can still be differentiated significantly above chance.

Neutralization among T5, T42, and T35
The other sandhi rule of Xiapu Min is {T5, T42, T35} → T55 / ___ X (Table 4, Example h). It results in the neutralization of T5, T42, and T35. Similar to Section 4.2.1, we performed LDA in this section to determine whether the neutralization between those three tones was complete or not. The R code was the same as in Formula (8). For every LDA model, we calculated its classification accuracy based on a leave-one-out cross-validation.
We compared the three tones in citation forms (T5 vs. T42 vs. T35) in the same model. Figure 12a shows the LD1 and LD2 distribution of T5, T42, and T35 in citation forms. The classification accuracy of citation forms is 100%, which is significantly higher than the 45.57% chance level (p < 0.001). We applied the LDA models on every two contrasts of T5, T42, and T35 in their sandhi forms. Figure 12b shows the LD1 distribution of each tone in each contrast. The classification accuracies of T35 vs. T42, T5 vs. T35, and T5 vs. T42 in sandhi forms are: 47.41% (p = 0.83; chance = 51.11%), 80.35% (p < 0.001; chance = 51.45%), and 86.79% (p < 0.001, chance = 53.77%). The results indicate that, before sandhi, the citation forms of T5, T42, and T35 are differentiated at ceiling accuracy. After sandhi, T35 and T42 are completely neutralized along these measures, whereas T5 and T35, and T5 and T42, can still be differentiated significantly above chance. Languages 2022, 7, x FOR PEER REVIEW 20 of 29 Figure 12. (a) is the LD1-LD2 distribution of T5 vs. T42 vs. T35 in citation forms. The ellipses represent 50% confidence intervals around the mean of each group. (b) is the LD1 distribution of T35 vs. T42, T5 vs. T35, and T5 vs. T42 in sandhi forms, respectively. LD1 and LD2 are from the models without cross-validation.
We calculated the Pearson correlation between each acoustic parameter and the linear discriminant scores to determine which parameters contribute most to the abovechance discriminations. LD1 explains 91.9% of the variance of the citation tones. The top three parameters that have the highest absolute correlation with LD1 are initial and mid F0, and duration. In both discriminations between T5 and T35 and between T5 and T42 after sandhi, the top three parameters that have the highest absolute correlation with LD1 are duration, and initial and mid HNR. The statistics of Model (16) and the correlation between all the parameters and the linear discriminant scores are presented in Tables S24-S28 in Supplementary Material 3. Figure 13 shows values of F0, H1*-H2*, and HNR of T42, T35, and T5 in citation and sandhi forms. In terms of F0, the three tones have well-dispersed contours in citation forms. In sandhi forms, their F0 contours become flat and are largely overlapping. In terms of H1*-H2*, in citation forms, checked T5 overlaps with T42 and T35 in the first two-thirds of the vowel, and it has lower values than T42 and T35 in the last third. In sandhi forms, checked T5 has overall higher H1*-H2* than T42 and T35, and it ends in a similar value as T42 and T35. On average, the H1*-H2* value of checked T5 has increased after sandhi. In terms of HNR, in citation forms, T5 overlaps with T42 and is higher than T35 in the first two-thirds of the vowel, and it has lower values than T42 and T35 in the last third. In sandhi forms, T5 has lower HNR than T42 and T35 in general. However, on average, the HNR value of T5 has increased after sandhi. In addition, in citation forms, the HNR of T5 has an abrupt fall after Point 4. In sandhi forms, the HNR of T5 has an overall rising contour, and there is a slight fall after Point 8. The final HNR value of T5 is higher in sandhi than in citation forms. We compared the H1*-H2* and HNR of T5 between citation and sandhi forms using mixed-effects models and confirmed both parameters have significantly higher values in sandhi forms than in citation forms. The statistics are presented in Tables S29 and S30 in Supplementary Material 3. In summary, checked T5 has a constricted quality and a noisy ending in citation forms. In sandhi forms, T5 becomes less constricted and less noisy, indicating a reduction of glottalization. The duration of T5 is shorter than T42 and T35 in citation and sandhi forms. The duration of T5 is shorter in sandhi forms than in citation forms. We calculated the Pearson correlation between each acoustic parameter and the linear discriminant scores to determine which parameters contribute most to the above-chance discriminations. LD1 explains 91.9% of the variance of the citation tones. The top three parameters that have the highest absolute correlation with LD1 are initial and mid F0, and duration. In both discriminations between T5 and T35 and between T5 and T42 after sandhi, the top three parameters that have the highest absolute correlation with LD1 are duration, and initial and mid HNR. The statistics of Model (16) and the correlation between all the parameters and the linear discriminant scores are presented in Tables S24-S28 in Supplementary Material S3. Figure 13 shows values of F0, H1*-H2*, and HNR of T42, T35, and T5 in citation and sandhi forms. In terms of F0, the three tones have well-dispersed contours in citation forms. In sandhi forms, their F0 contours become flat and are largely overlapping. In terms of H1*-H2*, in citation forms, checked T5 overlaps with T42 and T35 in the first two-thirds of the vowel, and it has lower values than T42 and T35 in the last third. In sandhi forms, checked T5 has overall higher H1*-H2* than T42 and T35, and it ends in a similar value as T42 and T35. On average, the H1*-H2* value of checked T5 has increased after sandhi. In terms of HNR, in citation forms, T5 overlaps with T42 and is higher than T35 in the first two-thirds of the vowel, and it has lower values than T42 and T35 in the last third. In sandhi forms, T5 has lower HNR than T42 and T35 in general. However, on average, the HNR value of T5 has increased after sandhi. In addition, in citation forms, the HNR of T5 has an abrupt fall after Point 4. In sandhi forms, the HNR of T5 has an overall rising contour, and there is a slight fall after Point 7. The final HNR value of T5 is higher in sandhi than in citation forms. We compared the H1*-H2* and HNR of T5 between citation and sandhi forms using mixed-effects models and confirmed both parameters have significantly higher values in sandhi forms than in citation forms. The statistics are presented in Tables S29 and S30 in Supplementary Material S3. In summary, checked T5 has a constricted quality and a noisy ending in citation forms. In sandhi forms, T5 becomes less constricted and less noisy, indicating a reduction of glottalization. The duration of T5 is shorter than T42 and T35 in citation and sandhi forms. The duration of T5 is shorter in sandhi forms than in citation forms.  Table 9 summarizes the classification accuracy of each neutralized contrast and the top three acoustic parameters that have the highest correlation with the linear discriminant scores. Among the six neutralized pairs T23-T44, T2-T23, T2-T44, T35-T42, T5-T35,  T5-T42, four of them are not completely neutralized phonetically: T2-T23, T2-T44, T5-T35, and T5-T42. All four of those pairs involve a checked and an unchecked tone. The neutralizations between unchecked tones are all complete. According to the LDA results, duration is the primary acoustic correlate that distinguishes checked tones from unchecked tones in sandhi forms. Table 10 presents the average duration of each tone in citation and sandhi forms. Checked tones remain shorter than unchecked tones in sandhi forms, though the percentage of checked tone duration to unchecked tone duration increases slightly compared with the citation forms (citation: 66%; sandhi: 70%).

Summary of Tonal Neutralization in Sandhi Forms
Among all the sandhi forms, F0 (of the last third of vowels) surfaces as an important correlate only in distinguishing T2 from T23 (Table 9). However, the correlation between F0 and LD score is rather weak (Pearson r = −0.31). The absolute difference in final F0 between T23 and 2 is 8 Hz, which is rather small. Thus, we conclude that the F0 difference among checked and unchecked tones in citation forms is largely neutralized in sandhi forms.
Moreover, H1*-H2* (of the first third of vowels) surfaces as an important correlate only when distinguishing T2 from T23 in sandhi forms (Table 9). However, the correlation between H1*-H2* and the LD score is also rather weak (Pearson r = 0.25). In citation forms, the H1*-H2* contour of checked tones is in a falling trend (Figures 11 and 13). Checked T2 and 5 end in significantly lower H1*-H2* than unchecked tones. In sandhi forms, however, the H1*-H2* contour of T2 and 5 becomes flatter. The H1*-H2* of T2 is higher than  Table 9 summarizes the classification accuracy of each neutralized contrast and the top three acoustic parameters that have the highest correlation with the linear discriminant scores. Among the six neutralized pairs T23-T44, T2-T23, T2-T44, T35-T42, T5-T35, T5-T42,  four of them are not completely neutralized phonetically: T2-T23, T2-T44, T5-T35, and T5-T42. All four of those pairs involve a checked and an unchecked tone. The neutralizations between unchecked tones are all complete. According to the LDA results, duration is the primary acoustic correlate that distinguishes checked tones from unchecked tones in sandhi forms. Table 10 presents the average duration of each tone in citation and sandhi forms. Checked tones remain shorter than unchecked tones in sandhi forms, though the percentage of checked tone duration to unchecked tone duration increases slightly compared with the citation forms (citation: 66%; sandhi: 70%).

Summary of Tonal Neutralization in Sandhi Forms
Among all the sandhi forms, F0 (of the last third of vowels) surfaces as an important correlate only in distinguishing T2 from T23 (Table 9). However, the correlation between F0 and LD score is rather weak (Pearson r = −0.31). The absolute difference in final F0 between T23 and 2 is 8 Hz, which is rather small. Thus, we conclude that the F0 difference among checked and unchecked tones in citation forms is largely neutralized in sandhi forms.
Moreover, H1*-H2* (of the first third of vowels) surfaces as an important correlate only when distinguishing T2 from T23 in sandhi forms (Table 9). However, the correlation between H1*-H2* and the LD score is also rather weak (Pearson r = 0.25). In citation forms, the H1*-H2* contour of checked tones is in a falling trend ( Figures 5, 11 and 13). Checked T2 and 5 end in significantly lower H1*-H2* than unchecked tones. In sandhi forms, however, the H1*-H2* contour of T2 and 5 becomes flatter. The H1*-H2* of T2 is higher than T44. The H1*-H2* of T5 is higher than T35 and 42. Given that a higher H1*-H2* value is correlated with less glottal constriction, we argue that checked tones in sandhi forms become less constricted, especially at the end of vowels, compared with the citation forms. The difference in glottal constriction between checked and unchecked tones is largely neutralized in sandhi form.
Finally, HNR appears to be an effective parameter when distinguishing T2 from T44, T5 from T35, and T5 from T42 (Table 9). However, for T2 vs. T44, the difference in average HNR becomes smaller in sandhi forms than in citation forms (citation: 15.12 dB; sandhi: 4.88 dB). For T5 vs. T35 and T5 vs. T42, their difference in HNR is in the initial and middle third of the vowel, but not at the end. In contrast, in citation forms, checked T5 is characterized by its steeper falling HNR contour during the latter half of the vowel.
One possible explanation is that HNR differences between T5 and T35 and 42 in the initial two-thirds of the vowel are a by-product of the short duration and the influence of the onset in the checked tones. Three-quarters of the target syllables in the stimuli have a voiceless aspirated stop (/t h /), voiceless affricate (/ts/), or voiceless fricative (/x, θ/) as the onset. Thus, it is possible that the aspirated and fricated onsets introduce noise into the vowels. Additionally, it may also be because vowels bearing checked tones in sandhi forms are extra-short compared to those with unchecked tones (both in citation and sandhi) and to those with checked tones in citation forms. It is possible, then, that checked tones in sandhi forms might be more affected by the onset noise than other tokens because their vowel duration is too short to gain periodicity after the noisy onset. Considering the artifact brought on by the onset, and the fact that average H1*-H2* and HNR values of checked tones increase after sandhi, we suggest that the vowel-final glottalized quality of the checked tones is largely reduced in sandhi forms. In summary, in citation forms, checked tones are differentiated from unchecked tones by having distinct F0 contour, shorter duration, and glottalized quality at the end of the vowels. In sandhi forms, checked tones acquire similar F0 values and phonatory quality to unchecked tones. However, the duration difference between checked and unchecked tones persists.

Discussion and Conclusions
In this study, we have determined how checked tones in Xiapu Min differ from unchecked tones, in terms of their F0 height and contour, phonatory quality, duration, and vowel quality. These parameters are the ones most often associated with checkedness across languages. The results show that in citation forms, checked tones in Xiapu Min differ from unchecked tones in three out of four dimensions. We confirm that the two checked tones-T5 and T2-have distinct falling contours in comparison with the unchecked tones in Xiapu Min. They are also produced with more glottal constriction and noisier voice quality at the end of the vowel. Such evidence suggests that the vowels in checked syllables in Xiapu Min are glottalized in the end. Checked tones also have a shorter duration than unchecked tones. However, checked and unchecked tones do not differ in vowel quality. Thus, three out of four primary phonetic features of checked syllables and vowels that are found in other languages apply to Xiapu Min checked tones. We recommend that future studies on checked syllables and vowels in other languages focus on these four prototypical phonetic properties as well.
We further show how checked tones change when they are phonologically neutralized with unchecked tones. This study finds that incomplete neutralization only occurs between unchecked and checked tones. When neutralization occurs between two unchecked tones, it is complete, at least according to the measures investigated here. A possible explanation for the different degrees of neutralization is that the speakers' production of the sandhi forms is influenced by their knowledge of the citation forms. In sandhi forms, the acoustic parameter that most effectively differentiates checked tones from unchecked tones is duration; the F0 and voice quality differences between them in citation forms are largely neutralized.
Compared with Taiwanese Min checked tones in sandhi forms (Pan 2017), Xiapu Min behaves in a similar way. Taiwanese Min high and low checked tones [5] and [3] were both produced with more glottal opening than unchecked falling tones in sandhi forms. In Xiapu Min, the spectral tilt of checked tones increases significantly in sandhi forms compared with the citation forms, and it is mostly indistinguishable from that of unchecked tones. Checked tones' sandhi forms are not characterized by a glottalized quality in either Taiwanese or Xiapu Min.
With the acoustic results of checked syllables in citation and sandhi forms, we can also make inferences on the relation between glottalization and short duration in Xiapu Min. In citation forms, we find short duration and glottalization co-occur in CVP syllables. Cross-linguistically, vowels tend to be shorter in closed syllables than in open syllables (Farnetani and Kori 1986;Rietveld and Frauenfelder 1987;van Santen 1992;Maddieson 1985). In general, the realization of glottal stops and voiced glottalization, including in checked syllables, varies across languages . Glottal stops are frequently realized as voiced glottalization on adjacent vowels. The co-occurrence of short duration and glottalization in CVP syllables in Xiapu Min thus leads to the following question: are glottalization and short duration due to the coarticulation of closed syllable ending in glottal stop, or do they together comprise a distinct "checked" phonation as opposed to "unchecked" phonation? We argue that neither hypothesis captures the nature of checked syllables in Xiapu Min. First, we argue against the hypothesis that "checked" is a phonation type in Xiapu Min. Table 1 in Section 1 showed that CVP behaves differently from open syllables in phonological transformations, supporting the idea that the glottal stop in CVP syllables in Xiapu Min is a coda segment rather than a suprasegment. Second, we argue that the short duration of checked syllables is independent of their syllable structure. Our evidence comes from the acoustic properties of checked tones in sandhi forms. In sandhi forms, the voice quality of checked tones is similar to that of unchecked tones, indicating that the glottal gesture is reduced or even lost in the surface form. Despite the weakening or loss of glottal gestures, the short duration of checked tones is still preserved in sandhi forms. Based on the above evidence, then, we propose that "checked" is not a phonation type in Xiapu Min. The short duration of checked syllables is not due to the cross-linguistical nature of closed syllables, but it is instead an inherent nature of "being checked" in Xiapu Min.
Based on the acoustic properties of the checked tones in citation and sandhi forms, it is also possible to make predictions about possible sound change patterns of checked syllables in Xiapu Min. Across Chinese languages, there is a tendency for checked syllables to become unchecked (Cao 2002;Feng 2011;Gu 2015;Shen 2007;Song 2009;Xing and Meng 2006;Yang 1982). This phenomenon is referred as "rùshēng shūhuà" (入声舒化 "checked tone opening") in Chinese literature. However, it has been observed that the multiple features of checked syllables do not disappear at the same time but are lost in sequence. Zhu et al. (2008) listed three major paths leading to the loss of checkedness in Chinese languages. We schematize the different stages of each path proposed by Zhu et al. (2008) in Figure 14.
Based on the acoustic properties of the checked tones in citation and sandhi forms, it is also possible to make predictions about possible sound change patterns of checked syllables in Xiapu Min. Across Chinese languages, there is a tendency for checked syllables to become unchecked (Cao 2002;Feng 2011;Gu 2015;Shen 2007;Song 2009;Xing and Meng 2006;Yang 1982). This phenomenon is referred as "rùsheng shuhuà" (入声舒化 "checked tone opening") in Chinese literature. However, it has been observed that the multiple features of checked syllables do not disappear at the same time but are lost in sequence. Zhu et al. (2008) listed three major paths leading to the loss of checkedness in Chinese languages. We schematize the different stages of each path proposed by Zhu et al. (2008) in Figure 14. Figure 14. Three paths involving the loss of checkedness (schematized based on Zhu et al. (2008)).
The differences in the three paths depend on the sequencing of vowel lengthening, oral coda lenition, and glottal coda loss. Given how common it is for Chinese languages to lose their checked tones, one might assume that the checked syllables in Xiapu Min are also losing their checkedness. At what stage of the sound change might Xiapu Min currently be? Since Xiapu Min no longer has /p, t, k/ as coda in checked syllables, we can rule out Stage 1 and Stage 2 of Path I. The phonetic features of checked tones in citation forms suggest that Xiapu Min might currently be at Stage 2 of either Path II or Path III because its checked syllables are short and closed by a glottal stop. What might be the next stage of the possible sound change for checked syllables in Xiapu Min? We have observed that glottalization is weakened, whereas the short duration is retained in sandhi forms. This suggests that, in Xiapu Min, duration may be a more stable feature than glottalization for checked syllables. Thus, the next stage of Xiapu Min checked syllable sound change is more likely to be losing the glottal stop coda than vowel lengthening. Assuming that Xiapu Min checked syllables would go through sound changes in the future, their path is most likely to be Type II proposed by Zhu et al. (2008): the glottal stop is lost first, then syllable lengthening takes place.
To conclude: this study has provided the first quantitative acoustic analysis of Xiapu Min tones, revealing the phonetic features of Xiapu Min checked syllables in both citation and sandhi forms. The results provide inference to the diachronic change of Xiapu Min checked syllables and tones. In future work, we plan to conduct perception studies that manipulate F0, phonatory quality, and duration separately in sound signals for both citation and sandhi checked tones. Such studies will allow us to directly test whether short duration, glottalization and f0 are independent cues of checked tones in Xiapu Min. Given that duration appears to be a more stable feature than glottalization in Xiapu Min, we will test whether listeners are more sensitive to duration than glottalization when identifying The differences in the three paths depend on the sequencing of vowel lengthening, oral coda lenition, and glottal coda loss. Given how common it is for Chinese languages to lose their checked tones, one might assume that the checked syllables in Xiapu Min are also losing their checkedness. At what stage of the sound change might Xiapu Min currently be? Since Xiapu Min no longer has /p, t, k/ as coda in checked syllables, we can rule out Stage 1 and Stage 2 of Path I. The phonetic features of checked tones in citation forms suggest that Xiapu Min might currently be at Stage 2 of either Path II or Path III because its checked syllables are short and closed by a glottal stop. What might be the next stage of the possible sound change for checked syllables in Xiapu Min? We have observed that glottalization is weakened, whereas the short duration is retained in sandhi forms. This suggests that, in Xiapu Min, duration may be a more stable feature than glottalization for checked syllables. Thus, the next stage of Xiapu Min checked syllable sound change is more likely to be losing the glottal stop coda than vowel lengthening. Assuming that Xiapu Min checked syllables would go through sound changes in the future, their path is most likely to be Type II proposed by Zhu et al. (2008): the glottal stop is lost first, then syllable lengthening takes place.
To conclude: this study has provided the first quantitative acoustic analysis of Xiapu Min tones, revealing the phonetic features of Xiapu Min checked syllables in both citation and sandhi forms. The results provide inference to the diachronic change of Xiapu Min checked syllables and tones. In future work, we plan to conduct perception studies that manipulate F0, phonatory quality, and duration separately in sound signals for both citation and sandhi checked tones. Such studies will allow us to directly test whether short duration, glottalization, and F0 are independent cues of checked tones in Xiapu Min. Given that duration appears to be a more stable feature than glottalization in Xiapu Min, we will test whether listeners are more sensitive to duration than glottalization when identifying a tone as checked. Additionally, because the duration of checked syllables is significantly shorter than unchecked ones after sandhi neutralization, we will test whether listeners are able to discriminate between checked and unchecked syllables after sandhi based on duration cue.