Quantitative Acoustic versus Deep Learning Metrics of Lenition

Spanish voiced stops /b, d, g/ surface as fricatives [β, ð, ɣ] in intervocalic position due to a phonological process known as spirantization or, more broadly, lenition. However, phonetic studies reveal a great deal of variation and gradience in these surface forms, ranging from fricative-like to approximant-like realizations and conditioned by various factors such as stress, place of articulation, flanking vowel quality, and speaking rate.


Introduction
Consonants are produced differently in different phonetic environments. For example, in most if not all Spanish dialects, /b, d, g/ are produced as voiced stops [b, d, g] after a pause, after a homorganic nasal, and, in the case of /d/, after a lateral /l/ (Navarro Tomás 1977; Hammond 2001; Hualde 2005). In other contexts they are produced as voiced fricatives (i.e., voiced continuants) [β, ð, ɣ]. These contexts include intervocalic and postvocalic syllable-onset positions, both within and across word boundaries. Compare, for example, [bas] 'you go', [das] 'you give', and [gas] 'gas' with [aβas] 'beans', [aðas] 'fairies', and [aɣas] 'you do (subjunctive)' (González 2002). The weakening of an underlying stop to a voiced continuant illustrated by these examples is commonly referred to as spirantization, one of the most widely studied phonological phenomena of Spanish. In turn, spirantization belongs to a broader class of phonological processes known as lenition, which also includes processes such as degemination [tt → t] and deaspiration [tʰ → t] (Gurevich 2011).
Spanish voiced stops have traditionally been described as having a continuant and a non-continuant realization in complementary distribution. Phonetic studies, however, have revealed a more varied and gradient distribution of their surface, lenited forms. For instance, continuant realizations were previously characterized as fricatives (i.e., produced with turbulent airflow) (e.g., Navarro Tomás 1977; Harris 1969; Lozano 1978; Mascaró and Aronoff 1984). They have since been shown to be phonetically closer to approximants (i.e., produced without turbulent airflow) (e.g., Martínez-Celdrán 1984; Romero 1995). Phonetic investigations have also revealed large variability and gradience among continuant realizations, conditioned by factors including surrounding vowel quality, stress, and speaking rate (e.g., Cole et al. 1999; Ortega-Llebaria 2004). This suggests a continuum rather than a fixed degree of constriction across environments. Furthermore, the degree of constriction varies with place of articulation and surrounding vowel height (Ortega-Llebaria 2004; Carrasco et al. 2012; Simonet et al. 2012). Besides voiced stops, voiceless stops in Spanish also undergo lenition (e.g., Broś et al. 2021). The goal of this study is to compare metrics for quantifying the lenition of intervocalic stop consonants in Argentinian Spanish.

Acoustic Correlates of Lenition
The degree of lenition has been quantified along several acoustic dimensions. These include intensity, duration, spectral measures (e.g., spectral peak, mean, standard deviation, and kurtosis), and periodicity measures (e.g., harmonics-to-noise ratio). Of these, intensity, calculated as a difference or as a ratio, is the most prevalent (Cole et al. 1999; Ortega-Llebaria 2004; Soler and Romero 1999; Hualde et al. 2011). For example, Martínez-Celdrán and Regueira (2008), Figueroa and Evans (2015), and Broś et al. (2021) used the intensity difference as a lenition marker: the maximum intensity of the preceding segment minus the minimum intensity of the target consonant. Similarly, Hualde et al. (2011) quantified the degree of lenition as the difference between the maximum intensity of the vowel following the target consonant and the minimum intensity of the target consonant. The more open the constriction of the target consonant (i.e., the more lenited it is), the smaller this difference is expected to be.
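The following is a minimal sketch of the intensity-difference marker. It assumes an intensity contour already sampled in dB (for example, exported from Praat) and segment boundaries taken from a forced alignment. The function name and argument layout are illustrative and are not taken from any of the cited studies. Passing the preceding vowel's interval gives the preceding-segment measure; passing the following vowel's interval gives the Hualde et al. (2011) variant.

```python
import numpy as np

def intensity_difference(times, intensity_db, flank_interval, target_interval):
    """Max intensity in a flanking vowel minus min intensity in the target consonant (dB).

    times, intensity_db: an intensity contour sampled at `times` (seconds).
    flank_interval, target_interval: (start, end) boundaries in seconds.
    """
    times = np.asarray(times, dtype=float)
    intensity_db = np.asarray(intensity_db, dtype=float)

    def in_interval(interval):
        start, end = interval
        return (times >= start) & (times <= end)

    vowel_max = intensity_db[in_interval(flank_interval)].max()
    consonant_min = intensity_db[in_interval(target_interval)].min()
    return float(vowel_max - consonant_min)  # smaller value = more open, more lenited
```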
The second intensity measure previously used to quantify the degree of lenition is the maximum rising velocity of intensity from the midpoint of the target consonant to the midpoint of the following vowel (Hualde et al. 2011, 2012; Kingston 2008). The more lenited the consonant, the less abrupt the transition in intensity and, thus, the smaller the maximum rising velocity. Lastly, the mean intensity of the target sound could also indicate its degree of lenition: the higher the mean intensity, the more lenited the sound. However, since this measure can vary greatly with speaking volume, it may be less reliable than relative intensity measures.
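The maximum rising velocity can be sketched as the largest first-difference slope of the intensity contour between the consonant midpoint and the following vowel midpoint. This is a simplified illustration under assumed inputs: a sampled dB contour and interval boundaries in seconds. The cited studies may smooth the contour or estimate the derivative differently.

```python
import numpy as np

def max_rising_velocity(times, intensity_db, consonant_interval, vowel_interval):
    """Largest positive intensity slope (dB/s) from consonant midpoint to vowel midpoint."""
    times = np.asarray(times, dtype=float)
    intensity_db = np.asarray(intensity_db, dtype=float)
    start = 0.5 * (consonant_interval[0] + consonant_interval[1])
    end = 0.5 * (vowel_interval[0] + vowel_interval[1])
    mask = (times >= start) & (times <= end)
    t, i = times[mask], intensity_db[mask]
    if t.size < 2:
        raise ValueError("need at least two intensity samples in the window")
    velocity = np.diff(i) / np.diff(t)  # first-difference approximation of dI/dt
    return float(velocity.max())        # smaller value = smoother rise = more lenited
```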
Besides relative intensity, the relative duration of the target consonant (target sound duration divided by the total duration of the preceding sound + target sound + following sound) correlates negatively with the degree of consonant weakening and has been used as a reliable lenition marker (e.g., Dalcher 2008).This measurement is usually used when the target consonant occurs intervocalically but can be adapted to other contexts.For example, an alternative relative duration ratio was calculated by dividing the target-sound duration by the total duration of the preceding sound + target sound in Broś et al. (2021) because the segment following the target sound was not always a vowel in their data.The more lenited the target consonant, the shorter its duration is expected to be, thus the smaller the duration ratio.
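Both relative-duration ratios described above reduce to a single division. The helper below is a hypothetical illustration. When a following-segment duration is given, it uses the V+C+V window; otherwise it uses the preceding + target window.

```python
def relative_duration(target, preceding, following=None):
    """Target duration divided by the summed duration of its analysis window (seconds).

    With `following`: preceding + target + following (intervocalic V-C-V window).
    Without it: preceding + target, the variant described for Broś et al. (2021).
    """
    window = preceding + target + (following if following is not None else 0.0)
    return target / window  # smaller ratio = shorter consonant = more lenited
```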
The harmonics-to-noise ratio (HNR) is another lenition marker, first employed by Broś et al. (2021). HNR measures the proportion of acoustic periodicity (harmonics) to aperiodicity (noise) in a sound and is expressed in decibels (dB). An HNR of 0 dB indicates equal energy in harmonics (periodicity) and noise. A positive HNR indicates more harmonic energy than noise energy, while a negative HNR indicates the opposite. For example, an HNR of 3 dB means that the harmonic energy is about twice the noise energy (10 × log10(2/1) ≈ 3.01). An HNR of −3 dB (10 × log10(1/2)) indicates that the noise energy is about twice the harmonic energy. Broś et al. (2021) reasoned that the more lenited a segment, the more vowel-like it is, and hence the higher its HNR.
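The dB arithmetic in the HNR examples can be checked directly. This small sketch only converts between energy ratios and decibels. It does not estimate harmonic or noise energy from a signal.

```python
import math

def hnr_db(harmonic_energy, noise_energy):
    """HNR in dB: ten times the base-10 log of harmonic over noise energy."""
    return 10.0 * math.log10(harmonic_energy / noise_energy)

def energy_ratio(hnr):
    """Inverse: the harmonic-to-noise energy ratio implied by an HNR in dB."""
    return 10.0 ** (hnr / 10.0)

for h, n in [(2, 1), (1, 1), (1, 2)]:
    print(f"{h}:{n} -> {hnr_db(h, n):+.2f} dB")  # +3.01, +0.00, -3.01 dB
```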
An alternative measure of the degree of periodicity, called 'Noise', was proposed by Harris et al. (Forthcoming), building on Harris and Urua (2001). Noise is computed with a three-step algorithm that combines measurements of amplitude and aperiodicity within a VCX analysis frame:

1. The algorithm computes the aperiodicity of the signal using an autocorrelation measure.
2. It locates the target consonant within the frame at the amplitude minimum.
3. It searches forward within the frame for the point with the maximum product of aperiodicity and normalized amplitude. This maximum is the Noise score.

The higher the Noise score, the more aperiodic energy in the signal. One advantage of Noise over HNR is that it can be applied to a wider set of lenition phenomena, such as the release properties of final stops, which do not occur in Spanish.
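The three Noise steps can be sketched as follows. This is not Harris et al.'s implementation, whose details are not given here. The sketch makes three assumptions: per-frame aperiodicity is 1 minus the peak normalized autocorrelation within a pitch-lag range; amplitude is normalized to the maximum in the analysis frame; and the forward search starts at the amplitude minimum. The function names and the default lag range are hypothetical.

```python
import numpy as np

def frame_aperiodicity(frame, sr, fmin=75.0, fmax=500.0):
    """Step 1 (assumed form): 1 - peak normalized autocorrelation in the pitch-lag range."""
    frame = np.asarray(frame, dtype=float)
    frame = frame - frame.mean()
    ac = np.correlate(frame, frame, mode="full")[len(frame) - 1:]  # lags 0..N-1
    if ac[0] <= 0.0:
        return 1.0  # silent frame: treated as fully aperiodic (assumption)
    ac = ac / ac[0]
    lo = int(sr / fmax)
    hi = min(int(sr / fmin), len(frame) - 1)
    return float(1.0 - np.clip(ac[lo:hi + 1].max(), 0.0, 1.0))

def noise_score(aperiodicity, amplitude):
    """Steps 2-3: start at the amplitude minimum (the consonant), then search
    forward for the largest aperiodicity x normalized-amplitude product."""
    aperiodicity = np.asarray(aperiodicity, dtype=float)
    amplitude = np.asarray(amplitude, dtype=float)
    norm_amp = amplitude / amplitude.max()
    start = int(np.argmin(amplitude))
    product = aperiodicity[start:] * norm_amp[start:]
    return float(product.max())
```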
The goal of this study is to compare a new approach to quantifying the degree of lenition, known as 'Phonet', with traditional acoustic markers such as those described above. Phonet is a deep learning model. In the quantitative acoustic approach, values along different acoustic dimensions are used directly to estimate lenition. Phonet instead estimates the degree of lenition from the posterior probabilities of the sonorant and continuant phonological features, which bidirectional recurrent neural networks (RNNs) compute directly from the input signals. In other words, the approach projects gradient surface acoustic parameters onto two phonological features, [continuant] and [sonorant], to capture both categorical and gradient realizations of lenition. Additionally, it is largely automatic and can be customized for the language under investigation. The basis for this approach is outlined in the following sections.

Lenition and Phonological Features
Phonemes are classified into broad categories or classes based on their shared phonological features (Jakobson et al. 1951; Chomsky and Halle 1968). The broadest class is [consonantal]. In most, if not all, languages, phonemes are either [+consonantal] or [−consonantal]. [+consonantal] sounds are produced with varying degrees of constriction of the articulators in the vocal tract, while [−consonantal] phonemes are produced with no oral constriction. Stops, fricatives, affricates, nasals, and liquids belong to the first category, while the second category is restricted to vowels and glides in most languages.

[syllabic] is another major phonological feature. [+syllabic] phonemes are the most sonorous segments of a language and may occupy the nucleus of a syllable, whereas [−syllabic] phonemes may not. Vowels and syllabic consonants are [+syllabic], while other consonants, including glides, are [−syllabic].

The second major class is [sonorant], and its inverse is [obstruent]. [+sonorant] phonemes are produced without a buildup of air pressure in the vocal tract, hence with relatively free airflow (through the mouth or the nose) and the ability to sustain resonance. Nasals, liquids, glides, and vowels are [+sonorant]. Stops, fricatives, and affricates, which are produced with complete or substantial obstruction of airflow, are [−sonorant], or [+obstruent].
The third phonological class relevant to our study is [continuant]. This feature describes whether airflow through the oral cavity is sustained. [+continuant] phonemes are produced with continuous airflow through an incomplete closure between articulators. For example, fricatives are classified as [+continuant] because they are articulated with only partial oral occlusion, which lets air flow continuously during their articulation. Other [+continuant] phonemes are liquids, glides, and vowels. Nasals are classified by some as [−continuant], because airflow through the oral cavity is blocked during their production. Others classify them as [+continuant], because continuous airflow is allowed through the nasal cavity. In this study, nasals are specified as [−continuant]. See Hayes (2008) for an introduction to other phonological features.
The distribution of phonological features allows us to group phonemes into natural classes, that is, groups of phonemes that share one or more phonological features. Phonemes belonging to the same natural class pattern together in phonological processes. For example, in English, /p, t, k/ are [−syllabic, −voice, −continuant, −sonorant, −delayed release] and form a natural class; they all become aspirated when they occur as the onset of a stressed syllable. [−delayed release] is characterized by an abrupt release of occluded airflow, while [+delayed release] is characterized by a gradual release of airflow following the opening of an oral closure.

In Spanish, /b, d, g/ are [−syllabic, +voice, −continuant, −sonorant, −delayed release] and form a natural class. As discussed above, they undergo lenition in, for example, intervocalic position, where they are realized as fricatives ([+continuant]) or approximants ([+continuant, +sonorant]). In other words, the lenition of Spanish voiced stops can be simplistically described as involving categorical changes in the [continuant] and [sonorant] features. However, capturing the highly varied and gradient degree of lenition requires looking beyond categorical manifestations of lenition and beyond the binary nature of phonological features.
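Natural classes fall out of a feature table by matching a partial specification. The table below is a hypothetical, deliberately partial illustration. It uses only four of the features discussed above and specifies nasals as [−continuant], as in this study. It is not the feature set used to train Phonet.

```python
# Hypothetical, partial feature table (True = "+", False = "-").
def spec(syllabic, voice, continuant, sonorant):
    return {"syllabic": syllabic, "voice": voice,
            "continuant": continuant, "sonorant": sonorant}

FEATURES = {
    "p": spec(False, False, False, False),
    "t": spec(False, False, False, False),
    "k": spec(False, False, False, False),
    "b": spec(False, True, False, False),
    "d": spec(False, True, False, False),
    "g": spec(False, True, False, False),
    "s": spec(False, False, True, False),  # fricative
    "m": spec(False, True, False, True),   # nasal, [-continuant] in this study
    "l": spec(False, True, True, True),    # liquid
    "a": spec(True, True, True, True),     # vowel
}

def natural_class(partial_spec, table=FEATURES):
    """Return every segment whose features match all values in partial_spec."""
    return {seg for seg, feats in table.items()
            if all(feats[f] == v for f, v in partial_spec.items())}

print(sorted(natural_class({"voice": True, "continuant": False, "sonorant": False})))
```

With this table, the voiced [−continuant, −sonorant] query returns exactly the Spanish voiced stops /b, d, g/, the natural class that undergoes spirantization.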

Posterior Probability and Gradience
Computational approaches have been used in several studies that aimed to measure gradient phonetic variation. Many of these studies have relied on forced alignment systems to determine pronunciation variants. Examples include [dʒ]-[z] and [pʰ]-[f] variation in Hindi-English code-mixed speech (Pandey et al. 2020); 'g'-dropping in English (Kendall et al. 2021; Yuan and Liberman 2011); and 'th'-fronting, 't, d'-deletion, and 'h'-dropping in English (Bailey 2016). Forced alignment systems typically take word-level orthographic transcriptions as input and refer to a pronunciation dictionary with phone-level transcriptions. Importantly, multiple pronunciations can be assigned to each word entry in the dictionary. For instance, to model 'th'-fronting, two pronunciations, one with [θ] and one with [f], could be given to every word entry that may undergo 'th'-fronting. Based on each word token's acoustic properties, a trained forced aligner can then automatically determine which of the two pronunciations has the higher probability. However, a forced alignment system contains one acoustic model per phone type defined in the pronunciation dictionary. The degree of variation therefore cannot be determined beyond the granularity of the phone set (e.g., a token is either [θ] or [f]).
Yuan and Liberman (2009) proposed an innovative method for obtaining a more gradient measure of variation in their investigation of the degree of /l/-darkness in American English. The goal was, for example, to measure the degree of 'th'-fronting rather than simply coding a token as [θ] or [f]. Instead of relying on the phone labels output by the forced alignment procedure, they used probability scores extracted during forced alignment as a measure of variation. The probability score is defined as the log probability (log probability density) of the aligned segment being a particular phone. More specifically, all /l/ tokens from a corpus of American English were force-aligned twice: first with a model trained on light /l/s (word-initial), then with a model trained on dark /l/s (word-final and in word-final consonant clusters). The degree of /l/-darkness was indicated by the difference between the log probability scores from the dark /l/ alignment and the light /l/ alignment. Yuan and Liberman (2011) extended the method to examine finer variation in both types of /l/. Their results demonstrated a categorical distinction between dark /l/ (in syllable codas) and light /l/ (in syllable onsets). They also revealed that intervocalic dark /l/ is less dark than canonical syllable-coda dark /l/ and that its degree of darkness depends on the stress of the flanking vowels. Intervocalic light /l/ is always light and is lighter than canonical syllable-onset /l/. Magloughlin (2018) also applied this method to investigate gradient /t/-/d/ affrication in English. In this case, the degree of affrication was estimated from the log probability scores of the /tʃ, dʒ/ alignment and the /tɹ, dɹ/ alignment, using acoustic models of /tʃ/ and /dʒ/, and of /t/ and /d/, respectively.
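The log-probability-difference idea can be illustrated with a much simpler stand-in for forced-alignment acoustic models. The sketch fits one diagonal Gaussian per category to feature frames and defines the token score as the dark-model average log density minus the light-model average log density. The Gaussians replace the HMM-based aligners used by Yuan and Liberman (2009, 2011), and the data in the usage example are synthetic.

```python
import numpy as np

class DiagonalGaussian:
    """Toy acoustic model: a single diagonal-covariance Gaussian over feature frames."""

    def __init__(self, frames):
        frames = np.asarray(frames, dtype=float)
        self.mean = frames.mean(axis=0)
        self.var = frames.var(axis=0) + 1e-6  # variance floor avoids division by zero

    def log_likelihood(self, frames):
        """Average per-frame log density of the token's frames."""
        frames = np.asarray(frames, dtype=float)
        per_frame = -0.5 * (np.log(2.0 * np.pi * self.var)
                            + (frames - self.mean) ** 2 / self.var).sum(axis=1)
        return float(per_frame.mean())

def darkness_score(token_frames, dark_model, light_model):
    """Positive: the token looks more like dark /l/; negative: more like light /l/."""
    return dark_model.log_likelihood(token_frames) - light_model.log_likelihood(token_frames)

# Usage with synthetic 2-D "feature frames":
rng = np.random.default_rng(1)
light = DiagonalGaussian(rng.normal(0.0, 1.0, size=(2000, 2)))
dark = DiagonalGaussian(rng.normal(3.0, 1.0, size=(2000, 2)))
print(darkness_score(np.full((5, 2), 1.5), dark, light))  # near 0: intermediate token
```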
Probability estimates from token classification can also be obtained by methods other than forced-alignment acoustic models. For instance, McLarty et al. (2019) investigated the degree of r-lessness of postvocalic /r/ in English. They trained a Support Vector Machine (SVM) to classify canonical r-less tokens (oral vowels not preceding a liquid or a nasal) and canonical r-full tokens (prevocalic /r/), using Mel-Frequency Cepstral Coefficients (MFCCs) as acoustic representations. Once successfully trained (mean classification accuracy of 98.95%), the model was applied to ambiguous tokens (postvocalic /r/) to obtain a probability estimate of being r-less as opposed to r-full. Villarreal et al. (2020) used a similar approach in their examination of two English sociophonetic variables (non-prevocalic /r/ and word-medial intervocalic /t/). Instead of an SVM, however, they used random forest classification to automate the coding of the categorical manifestations of the two variables. Note that the classifiers used by most of these studies are trained on surface segments that are not necessarily surface realizations of the segment undergoing the variation of interest. They simply rely on acoustic similarities between these surface segments and the possible canonical realizations of a variant. For instance, in the case of 'th'-fronting, the model was trained to classify tokens that are canonically either [θ] or [f], and these canonical tokens are not themselves subject to 'th'-fronting. Nevertheless, their acoustic characteristics capture the range of possible surface realizations of 'th'-fronting. In the case of /l/-darkening, canonical light and dark /l/s are used in the training phase, and the trained model is then applied to /l/s that exhibit variable degrees of darkening.
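The train-on-canonical, apply-to-ambiguous workflow can be sketched with a plain logistic regression written in NumPy. Logistic regression is swapped in here for the SVM and random forest classifiers used in the cited studies, to keep the example dependency-light. The feature vectors stand in for MFCCs and are synthetic.

```python
import numpy as np

def train_logistic(X, y, lr=0.1, epochs=2000):
    """Fit a logistic regression by batch gradient descent; return a predict_proba function."""
    X = np.asarray(X, dtype=float)
    y = np.asarray(y, dtype=float)
    mu, sd = X.mean(axis=0), X.std(axis=0) + 1e-9  # standardize features
    Z = (X - mu) / sd
    w, b = np.zeros(Z.shape[1]), 0.0
    for _ in range(epochs):
        p = 1.0 / (1.0 + np.exp(-(Z @ w + b)))
        err = p - y  # gradient of the mean log loss w.r.t. the logits
        w -= lr * (Z.T @ err) / len(y)
        b -= lr * err.mean()

    def predict_proba(X_new):
        Z_new = (np.asarray(X_new, dtype=float) - mu) / sd
        return 1.0 / (1.0 + np.exp(-(Z_new @ w + b)))

    return predict_proba

# Train on canonical tokens only (0 = r-full, 1 = r-less), then score an ambiguous token.
rng = np.random.default_rng(2)
r_full = rng.normal(0.0, 1.0, size=(300, 2))
r_less = rng.normal(2.0, 1.0, size=(300, 2))
proba = train_logistic(np.vstack([r_full, r_less]), np.r_[np.zeros(300), np.ones(300)])
print(proba(np.array([[1.0, 1.0]])))  # ambiguous token: intermediate probability
```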
The viability of this approach to estimate the categorical manifestation of lenition is demonstrated by the results of Cohen Priva and Gleason (2020).In this study, a range of processes commonly recognized as lenition was modeled using a spoken corpus of American English.Specifically, three types of modeling methods that differ in the underlying representation of the surface segments were examined.The first method compared the surface forms of two segment types (e.g., [t] and [d] for the lenition process /t/→ [d]) regardless of whether their underlying form was the segment in question (e.g., the [t] and [d] tokens do not need to share the underlying form /t/).The second method compared only the surface forms of two segment types that share the same underlying form (e.g., /t/ is the underlying form for both [t] and [d]).The third method compared only segments that surfaced unchanged (e.g., the [t] tokens realized from /t/ and the [d] tokens from /d/).Of significance is the finding that all three modeling approaches yielded the same results, suggesting that the various acoustic manifestations of a given lenition process (/t/→ [d] in this case) can be captured by comparing relevant pairs of surface segments, regardless of their underlying form.
The Phonet approach targets a whole class of lenition. Therefore, unlike Cohen Priva and Gleason (2020), we must go beyond classifying pairs of segments relevant to a single lenition process and instead classify two groups of segments distinguished by a binary phonological feature. Specifically, we focus on the probabilities of two phonological features, because together they capture the two categorical realizations of stop lenition in Spanish. The first is [continuant], which differentiates stops from non-stops (e.g., stops lenited to fricatives). The second is [sonorant], which differentiates stops and fricatives from segments that are neither (e.g., stops lenited to approximants). A high [continuant] probability with a low [sonorant] probability would indicate a fricative-like realization. A high [continuant] probability with a high [sonorant] probability would suggest an approximant-like realization. Yuan and Liberman (2009, 2011) estimated the degree of phonetic variation from the difference between the log probability scores of two forced-alignment models (dark /l/ and light /l/). Here, by contrast, the degree of lenition is reflected in the probability of each phonological feature, estimated from the acoustic properties of the input signals.
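The interpretation of the two posteriors can be summarized as a small decision rule. The 0.5 threshold and the category labels are hypothetical conveniences for illustration. The analyses in this study use the gradient probabilities themselves rather than thresholded labels.

```python
def describe_realization(p_continuant, p_sonorant, threshold=0.5):
    """Coarse label from the two posteriors; the threshold is a hypothetical choice."""
    if p_continuant < threshold:
        return "stop-like"          # little evidence of a continuant realization
    if p_sonorant < threshold:
        return "fricative-like"     # high [continuant], low [sonorant]
    return "approximant-like"       # high [continuant], high [sonorant]
```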

Phonet
First proposed by Vásquez-Correa et al. (2019), Phonet estimates posterior probabilities of phonological features using bidirectional recurrent neural networks (RNNs) with gated recurrent units (GRUs). The inputs to Phonet are feature sequences of log energies in triangular Mel filters, computed from 25 ms windowed frames of each 0.5 s chunk of the input signal. These feature sequences are processed by two bidirectional GRU layers, so that information from both past and future states of the sequence is modeled simultaneously (by the forward and backward passes, respectively). The output sequences of the second bidirectional GRU layer are then passed through a time-distributed, fully connected hidden dense layer, producing an output sequence of the same length as the input. Finally, a time-distributed output layer with a softmax activation function produces the phonological class associated with each element of the input feature sequence. Phonet has been found to be highly accurate in detecting phonemes and phonological classes in Spanish and in modeling the speech impairments of patients diagnosed with Parkinson's disease (Vásquez-Correa et al. 2019). The architecture of Phonet is described in detail in Vásquez-Correa et al. (2019).
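The input representation described above (log energies of triangular Mel filters over 25 ms windows of a 0.5 s chunk) can be sketched in NumPy. The following are assumptions for illustration, not Phonet's documented settings: the 16 kHz sampling rate, 10 ms hop, Hamming window, 512-point FFT, and 40 filters. The Mel formula is the common 2595·log10(1 + f/700) variant.

```python
import numpy as np

def hz_to_mel(f):
    return 2595.0 * np.log10(1.0 + f / 700.0)

def mel_to_hz(m):
    return 700.0 * (10.0 ** (m / 2595.0) - 1.0)

def mel_filterbank(n_filters, n_fft, sr):
    """Triangular filters equally spaced on the Mel scale, shape (n_filters, n_fft//2 + 1)."""
    mel_points = np.linspace(hz_to_mel(0.0), hz_to_mel(sr / 2.0), n_filters + 2)
    bins = np.floor((n_fft + 1) * mel_to_hz(mel_points) / sr).astype(int)
    fb = np.zeros((n_filters, n_fft // 2 + 1))
    for m in range(1, n_filters + 1):
        left, center, right = bins[m - 1], bins[m], bins[m + 1]
        for k in range(left, center):   # rising edge of the triangle
            fb[m - 1, k] = (k - left) / (center - left)
        for k in range(center, right):  # falling edge of the triangle
            fb[m - 1, k] = (right - k) / (right - center)
    return fb

def log_mel_features(chunk, sr=16000, win=0.025, hop=0.010, n_filters=40, n_fft=512):
    """Log Mel filterbank energies, one row per 25 ms frame of the chunk."""
    chunk = np.asarray(chunk, dtype=float)
    win_len, hop_len = int(round(win * sr)), int(round(hop * sr))
    if len(chunk) < win_len:
        raise ValueError("chunk shorter than one analysis window")
    n_frames = 1 + (len(chunk) - win_len) // hop_len
    window = np.hamming(win_len)
    frames = np.stack([chunk[i * hop_len:i * hop_len + win_len] * window
                       for i in range(n_frames)])
    power = np.abs(np.fft.rfft(frames, n=n_fft, axis=1)) ** 2 / n_fft
    energies = power @ mel_filterbank(n_filters, n_fft, sr).T
    return np.log(energies + 1e-10)  # small floor avoids log(0)
```

For a 0.5 s chunk at 16 kHz (8000 samples), these settings yield 48 frames of 40 log energies each.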
In our study, a bank of twenty-three Phonet networks was trained on twenty-three phonological classes of Spanish, and one further network was trained on 26 phonemes, all using the Adam optimizer (Kingma and Ba 2014). Following Vásquez-Correa et al. (2019), a weighted categorical cross-entropy loss function, defined according to Equation (1), was used to counteract class imbalance during training.
The weight factors w_i for each class i ∈ {1, …, C} are defined based on the percentage of training samples that belong to each class. To improve the generalization of the networks, dropout and batch normalization layers were included.
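Equation (1) itself is not reproduced in this section. As a hedged sketch, the block below implements a standard weighted categorical cross-entropy: the mean over frames of −Σᵢ wᵢ yᵢ log ŷᵢ, with inverse-frequency weights normalized to average 1. That weighting scheme is an assumption, since the text states only that the weights are based on each class's share of the training set.

```python
import numpy as np

def class_weights(labels, n_classes):
    """Inverse-frequency weights scaled to mean 1 (assumed scheme; every class must occur)."""
    counts = np.bincount(np.asarray(labels), minlength=n_classes).astype(float)
    share = counts / counts.sum()
    w = 1.0 / share
    return w / w.mean()

def weighted_cross_entropy(y_true, y_prob, weights, eps=1e-12):
    """Mean over frames of -sum_i w_i * y_i * log(p_i), with one-hot y_true."""
    y_true = np.asarray(y_true, dtype=float)
    y_prob = np.clip(np.asarray(y_prob, dtype=float), eps, 1.0)  # avoid log(0)
    per_frame = -(y_true * weights * np.log(y_prob)).sum(axis=1)
    return float(per_frame.mean())
```

With this scheme, a class that makes up a quarter of the training frames gets three times the weight of a class that makes up three quarters, so errors on the rarer class cost more.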
The use of MFCC-based acoustic features was motivated by the fact that they provide a good overall representation of the acoustic signal, often capturing a wider range of acoustic information than individual acoustic features (Davis and Mermelstein 1980; Huang et al. 2001). In addition, MFCCs have been successfully used as acoustic representations in previous studies of phonetic variation (Kendall et al. 2021; Yuan and Liberman 2009, 2011; McLarty et al. 2019). Our approach uses MFCC-based features, whereas the quantitative acoustic approach uses various acoustic dimensions. Comparing the two allows us to consider alternative acoustic representations that could improve our model.
In sum, Phonet is a phonologically motivated, language-specific, and largely automatic model. It is trained to recognize input phones as belonging to different groups defined by their phonological features. Once trained, the model computes posterior probabilities for the phonological features of target segments. It builds on phonological features, a core concept in phonological analyses of lenition, and can capture both categorical and gradient surface manifestations of lenition. The input to the model is log energy in triangular Mel filters, computed from 25 ms windowed frames of each 0.5 s chunk of the target-language input. The model thus uses acoustic information about each phonological feature in the target language. Phonological feature sets can be customized for a given target language under different assumptions about their underlying specifications (Lahiri and Reetz 2002, 2010) and physical correlates (Jakobson et al. 1951; Chomsky and Halle 1968; Backley 2011). Lastly, the model requires only a phonological feature set and a segmentally aligned acoustic corpus, which can be obtained through forced alignment (see Ennever et al. 2017 for an automated segmentation method).

Materials
This study used the Argentinian Spanish Corpus built by Guevara-Rukoz et al. (2020), which contains crowd-sourced recordings from 44 native speakers of Argentinian Spanish (31 female, 13 male). The male sub-corpus contains 2.4 h of recordings with 16,914 words (3342 unique words), while the female sub-corpus contains 5.6 h of recordings with 35,360 words (4107 unique words). Word tokens with the voiced and voiceless stops /b, d, g, p, t, k/ occurring between two vowels of varying openness were selected. Table 1 specifies the number of word tokens and word types by condition: voicing (voiced or voiceless), place of articulation (bilabial, dental, and velar), and preceding and following vowels (open, mid, and close).

… 2013)'s grapheme-to-phoneme mapping in IPA, a phonemic pronunciation dictionary for the transcription of the corpus words was generated. It was used to train new acoustic models for the corpus and to align the TextGrids to the acoustic signals. We used a triphone acoustic model, in which the left and right contexts of the target phone are used to adjust its alignment. The phone set parameter was set to IPA, which enabled additional decision-tree modeling based on the specified phone set. All other parameters were kept at their default values. The corpus was randomly split into a training subset (80%) and a test subset (20%) using the scikit-learn library (Pedregosa et al. 2011) in Python (version 3.9).

The surface realizations of /b, d, g/, but not those of /p, t, k/ (Colantoni and Marinescu 2010), were expected to be ambiguous with respect to the two features of interest, [continuant] and [sonorant]. To avoid contaminating the model with these ambiguous tokens, /b, d, g/ were excluded (i.e., silenced out) during training. In total, twenty-three phonological classes were trained by twenty-three different Phonet models: syllabic, consonantal, sonorant, continuant, nasal, trill, flap, coronal, anterior, strident, lateral, dental, dorsal, diphthong, stress, voice, labial, round, close, open, front, back, and pause. As in Vásquez-Correa et al. (2019), one additional model was trained on the phonemes. However, in addition to the 18 phonemes used by Vásquez-Correa et al. (2019), 8 further phonemes were included: stressed /ˈa, ˈe, ˈi, ˈo, ˈu/, /ɲ/, /θ/, and /spn/ for speech-like noise. As previously discussed, weakened realizations of Spanish /b, d, g/ are either fricatives or approximants (e.g., Simonet et al. 2012); therefore, [sonorant] and [continuant] are our features of interest.

Model training was performed on an NVIDIA GeForce RTX 3090 GPU. The models were highly accurate, with unweighted average recall (UAR) ranging from 94% to 98% across the phonological classes. The UARs for the sonorant and continuant features were 97% and 96%, respectively, suggesting a good model fit for both features. The models were then applied to the target word tokens with intervocalic voiced and voiceless stops, /b, d, g, p, t, k/. Predictions were computed for 10 ms frames. For phone tokens spanning multiple frames, the average of the middle frame(s) was used as the token's prediction. Thus, a sonorant posterior probability and a continuant posterior probability were obtained for each target stop.
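Two steps in this paragraph can be sketched briefly. The first collapses frame-level posteriors to one value per token. The second computes unweighted average recall, the mean of per-class recalls. Reading "the average of the middle frame(s)" as the single middle frame for an odd frame count and the mean of the two middle frames for an even count is an assumption.

```python
import numpy as np

def token_posterior(frame_posteriors):
    """Collapse 10 ms frame posteriors for one phone token to a single value."""
    p = np.asarray(frame_posteriors, dtype=float)
    if p.size == 0:
        raise ValueError("token has no frames")
    mid = p.size // 2
    if p.size % 2:                          # odd count: the single middle frame
        return float(p[mid])
    return float(p[mid - 1:mid + 1].mean())  # even count: mean of the two middle frames

def unweighted_average_recall(y_true, y_pred):
    """Mean of per-class recalls, so rare classes count as much as frequent ones."""
    y_true, y_pred = np.asarray(y_true), np.asarray(y_pred)
    recalls = [np.mean(y_pred[y_true == c] == c) for c in np.unique(y_true)]
    return float(np.mean(recalls))
```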

Acoustic Parameters: HNR, Duration and Intensity
To compare our model with the quantitative acoustic approach, we selected five common acoustic parameters covering three broad acoustic dimensions of lenition: harmonics-to-noise ratio (HNR), relative duration, intensity difference (two types), and mean intensity. These were extracted from the target intervocalic voiced and voiceless stops, /b, d, g, p, t, k/.
HNR quantifies the degree of harmonicity relative to noise in a sound. The higher the HNR, the more periodic, or vowel-like, the sound. Therefore, the more lenited a target stop, the higher its HNR is expected to be. HNR was calculated as ten times the log10 ratio between harmonic energy and noise energy. The mean HNR of each target segment was computed in Praat as defined in (2), where t1 and t2 represent the starting and ending points of the token, respectively, and x(t) is the harmonicity (in dB) as a function of time.

$$\overline{\mathrm{HNR}} = \frac{1}{t_2 - t_1}\int_{t_1}^{t_2} x(t)\,dt \tag{2}$$

The relative duration of each target stop was obtained by dividing its duration by the total duration of the preceding vowel + target consonant + following vowel. Segment durations were obtained from the forced alignment (Section 2.2). The more lenited the consonant, the shorter its relative duration.
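A discrete approximation of the mean HNR, (1/(t2 − t1)) ∫ x(t) dt over the token, integrates a sampled harmonicity contour with the trapezoidal rule and divides by t2 − t1. This sketch linearly interpolates the contour at the token boundaries. Unlike Praat, it does not skip undefined (silent) frames. It is an illustration, not Praat's code.

```python
import numpy as np

def contour_mean(times, values, t1, t2):
    """Approximate (1 / (t2 - t1)) * integral_{t1}^{t2} x(t) dt for a sampled contour."""
    times = np.asarray(times, dtype=float)
    values = np.asarray(values, dtype=float)
    interior = times[(times > t1) & (times < t2)]
    grid = np.concatenate(([t1], interior, [t2]))  # include exact token boundaries
    x = np.interp(grid, times, values)             # linear interpolation of x(t)
    area = np.sum(0.5 * (x[1:] + x[:-1]) * np.diff(grid))  # trapezoidal rule
    return float(area / (t2 - t1))
```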
Two intensity difference values were calculated for each target stop by subtracting the minimum intensity of the target segment from the maximum intensity of (a) the preceding vowel and (b) the following vowel. The assumption is that the smaller the intensity difference between the target sound and the flanking vowels, the less constricted, and hence the more lenited, the target is. The maximum intensity values of the preceding and following vowels and the minimum intensity value of the target segment were calculated in Praat using parabolic interpolation.
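Parabolic interpolation refines a sampled extremum by fitting a parabola through the extreme sample and its two neighbours. The sketch below is a generic three-point version that assumes uniformly spaced samples. Praat's own implementation may differ in detail.

```python
import numpy as np

def parabolic_extremum(times, values, find="max"):
    """Return (time, value) of the extremum refined by three-point parabolic interpolation."""
    times = np.asarray(times, dtype=float)
    values = np.asarray(values, dtype=float)
    i = int(np.argmax(values) if find == "max" else np.argmin(values))
    if i == 0 or i == len(values) - 1:
        return float(times[i]), float(values[i])  # no neighbour on one side
    y0, y1, y2 = values[i - 1], values[i], values[i + 1]
    denom = y0 - 2.0 * y1 + y2
    if denom == 0.0:
        return float(times[i]), float(y1)         # flat: no curvature to exploit
    offset = 0.5 * (y0 - y2) / denom              # vertex position, in sample steps
    step = times[i + 1] - times[i]                # assumes uniform sampling
    extremum = y1 - 0.25 * (y0 - y2) * offset     # parabola value at the vertex
    return float(times[i] + offset * step), float(extremum)
```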
Finally, mean intensity captures the average intensity of the target stops: the more lenited they are, the higher their mean intensity. The mean intensity values of the target segments were calculated in Praat as defined in (3), where x(t) is the intensity (in dB) as a function of time.

Analyses
Values of the five acoustic parameters described above and the sonorant and continuant posterior probabilities generated by the Phonet model served as dependent variables in linear mixed-effects regression models. The fixed effects were:

- stress (stressed or unstressed)
- voicing (voiced or voiceless)
- place of articulation (bilabial, dental, and velar)
- preceding vowel height/openness (open, mid, and close)
- following vowel height (open, mid, and close)
- speaking rate [number of syllables in a word / word duration (in seconds)]
- word status (content or function)

Speaking rate and word status were included because they are known to influence lenition: a higher degree of lenition is expected at faster speaking rates and in function words than in content words (Broś et al. 2021; Soler and Romero 1999; Honeybone 2012). Similarly, a strong effect of stress on lenition has been reported, with a higher degree of lenition expected in unstressed than in stressed syllables (Ortega-Llebaria 2004; Broś et al. 2021; Eddington 2011). By contrast, the reported influence of place of articulation and flanking vowel openness has been inconsistent (Cole et al. 1999; Ortega-Llebaria 2004; Kingston 2008; Lewis 2001; Lavoie 2001). Overall, velar stops are expected to be weaker than labial and dental/alveolar stops, and the more open the flanking vowels, the greater the expected degree of lenition. Regarding voicing, voiced stops are expected to be more lenited than voiceless stops (Broś et al. 2021; Colantoni and Marinescu 2010).
Deviation coding was used for the categorical variables stress, voicing, and word status. Forward difference coding was used for place of articulation (bilabial > dental > velar), preceding vowel (close > mid > open), and following vowel (close > mid > open). The models were fitted using the lmer function from the lme4 package (Bates et al. 2015) in R (R Core Team 2022). The best-fit model structure for each dependent variable was identified by comparing multiple model structures using maximum likelihood.
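Forward difference coding for a k-level factor can be written as a k × (k − 1) contrast matrix in which column j compares level j with level j + 1. For the three-level vowel factors ordered close, mid, open, the two coefficients therefore estimate close − mid and mid − open. The sketch below builds the matrix and checks, on made-up cell means, that the regression coefficients recover these adjacent differences.

```python
import numpy as np

def forward_difference_contrasts(k):
    """k x (k-1) forward difference coding matrix; column j compares level j with level j+1."""
    C = np.empty((k, k - 1))
    for j in range(1, k):          # contrast index (1-based)
        for i in range(1, k + 1):  # factor level (1-based)
            C[i - 1, j - 1] = (k - j) / k if i <= j else -j / k
    return C

# Made-up cell means for close, mid, open: coefficients should be 10-7=3 and 7-3=4.
C = forward_difference_contrasts(3)
X = np.column_stack([np.ones(3), C])
print(np.linalg.solve(X, [10.0, 7.0, 3.0]))  # intercept = grand mean, then 3, 4
```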
Seven regression models were fitted, each with one of the five acoustic parameters or one of the two deep-learning-based features (the sonorant and continuant posterior probabilities) as the dependent variable. The models included different interaction terms but the same random intercepts by speaker and word. The general formula of the model with three interaction terms is as follows:

DEPENDENT VARIABLE ~ Stress + Voicing + Place of articulation + Preceding vowel + Following vowel + Speaking rate + Word status + Place of articulation:Preceding vowel + Place of articulation:Following vowel + Preceding vowel:Following vowel + (1 | Speaker) + (1 | Word)

Post hoc comparisons of the interaction terms were carried out using emmeans, with Tukey HSD p-value adjustment (Lenth et al. 2021). Results of the best-fit model for each dependent variable are reported in the next section.

Mean HNR
The best regression model for HNR yielded marginal and conditional R² values of 0.575 and 0.657, respectively. That is, the fixed effects explained 57.5% of the total variance, while the fixed and random effects together explained 65.7%.
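For Gaussian linear mixed models, marginal and conditional R² are usually defined by variance decomposition (e.g., Nakagawa and Schielzeth 2013). The marginal value is the fixed-effect variance over the total variance; the conditional value adds the random-intercept variances to the numerator. The sketch below shows that arithmetic with made-up variance components. It is not necessarily how the reported values were computed.

```python
def marginal_conditional_r2(var_fixed, var_random, var_residual):
    """Marginal and conditional R^2 from variance components of a Gaussian LMM.

    var_random: iterable of random-intercept variances (e.g., speaker, word).
    """
    var_rand_total = sum(var_random)
    total = var_fixed + var_rand_total + var_residual
    return var_fixed / total, (var_fixed + var_rand_total) / total

# Made-up components: fixed 2.0, speaker 1.0, word 0.5, residual 1.5 -> (0.4, 0.7)
print(marginal_conditional_r2(2.0, [1.0, 0.5], 1.5))
```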
The results reveal significant roles of voicing, place of articulation, flanking vowel height, and speaking rate. As shown in Table 2, the model yielded significant main effects of voicing: higher mean HNR for voiced than for voiceless stops [β = 10.433, t = 54.976, p < 0.001]; place of articulation: bilabial < dental [β = −1.232, t = −3.129, p = 0.002], with no significant difference between dental and velar; preceding vowel: close > mid > open [βs = 1.164, 1.004; ts = 3.347, 7.822; ps = 0.001, <0.001]; following vowel: close > mid > open [βs = 1.430, 0.595; ts = 3.312, 1.989; ps = 0.001, 0.047]; and speaking rate: the higher the speaking rate, the higher the mean HNR [β = 0.124, t = 5.063, p < 0.001]. Based on HNR, these results suggest that, overall, voiced stops are more lenited than voiceless stops (e.g., /b/ in de boˈleto vs. /p/ in lo poˈdés), dental stops are more lenited than bilabial stops (e.g., /d/ in de ˈdatos vs. /b/ in mi ˈbase), stops are more lenited the less open the flanking vowels are (e.g., /b/ in tu ˈvida vs. /b/ in habla ˈvarios), and the faster the speaking rate, the higher the degree of lenition.
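HNR compares periodic (harmonic) energy with aperiodic (noise) energy, so a more open, vowel-like realization yields a higher value. As an illustration only, the Python sketch below uses the autocorrelation relation HNR = 10·log10(r / (1 − r)), where r is the peak normalized autocorrelation within a plausible pitch-period range. Praat's harmonicity analysis rests on this relation, but the sketch omits Praat's windowing, peak interpolation, and frame-by-frame averaging, and it is not the procedure used to obtain the values in Table 2. The pitch range (75 to 500 Hz), the clipping constant, and the synthetic test signals are all assumptions.

```python
import numpy as np


def hnr_db(frame: np.ndarray, sr: int, f0_min: float = 75.0, f0_max: float = 500.0) -> float:
    """Estimate HNR (dB) of one frame from its peak normalized autocorrelation.

    Simplified sketch: no analysis window and no sub-sample peak interpolation.
    """
    x = np.asarray(frame, dtype=float)
    x = x - x.mean()
    min_lag = int(sr / f0_max)
    max_lag = min(int(sr / f0_min), len(x) - 1)
    if min_lag < 1 or min_lag > max_lag:
        raise ValueError("frame too short for the requested pitch range")

    best_r = 0.0
    for lag in range(min_lag, max_lag + 1):
        # Normalized correlation between the signal and its lagged copy.
        a, b = x[:-lag], x[lag:]
        denom = np.sqrt(np.dot(a, a) * np.dot(b, b))
        if denom > 0:
            best_r = max(best_r, float(np.dot(a, b) / denom))

    # Keep r strictly inside (0, 1) so the log ratio stays finite.
    r = min(max(best_r, 1e-6), 1.0 - 1e-6)
    return 10.0 * np.log10(r / (1.0 - r))


sr = 16000
t = np.arange(1600) / sr                   # 100 ms of samples
periodic = np.sin(2 * np.pi * 200 * t)     # strongly periodic, approximant-like
rng = np.random.default_rng(0)
noise = rng.standard_normal(t.size)        # aperiodic, frication-like

print(f"periodic: {hnr_db(periodic, sr):.1f} dB")
print(f"mixed:    {hnr_db(periodic + 0.5 * noise, sr):.1f} dB")
print(f"noise:    {hnr_db(noise, sr):.1f} dB")
```

The ordering (periodic > mixed > noise) is the point of the example; the absolute values depend on the frame length and the clipping constant.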
Significant interactions were also found between place of articulation and the following vowel, between place of articulation and the preceding vowel, and between the preceding and the following vowel. For the place of articulation × preceding vowel interaction (see Figure 1a), post hoc pairwise comparisons using the Tukey method indicate no significant difference in the HNR values of velar stops across preceding vowels. For bilabial and dental stops, however, mean HNR values are significantly higher after close vowels than after open vowels [βs = 2.330, 3.340; ts = 3.257, 5.338; ps = 0.031, <0.001, for bilabial and dental stops, respectively]. In addition, mean HNR values for bilabial stops are significantly lower than those of dental stops when preceded by mid vowels [β = −1.014, t = −4.339, p = 0.0006]. Interestingly, HNRs for both bilabial and dental stops are significantly lower than those of velar stops when preceded by open vowels [βs = −1.806, −0.970; ts = −6.535, −3.302; ps < 0.001, = 0.027 for bilabial and dental stops, respectively]. These results indicate that the preceding vowel affects HNR differently for different stops: bilabial and dental stops have higher HNRs after close vowels, whereas velar stops, whose HNRs do not vary significantly with the preceding vowel, have higher HNRs than the other two stops after open vowels.

For the place of articulation × following vowel interaction (see Figure 1b), post hoc analyses suggest that mean HNR values for bilabial stops are significantly higher when followed by close than by open vowels [β = 1.838, t = 3.969, p = 0.003]. For velar stops, HNR values are significantly higher when followed by close vowels than by both mid and open vowels [βs = 2.677, 3.878; ts = 4.828, 6.935; ps < 0.001 for mid and open vowels, respectively]. By contrast, no significant difference across vowel types was found for dental stops. When the HNR values of the three places of articulation were compared within each vowel context, HNR values for bilabial stops were significantly lower than those for velar stops when followed by close vowels [β = −2.258, t = −4.156, p = 0.001]. When followed by open vowels, HNRs for bilabial and velar stops are significantly lower than those of dental stops [βs = −2.223, 2.005; ts = −4.196, 4.062; ps = 0.001, 0.002 for bilabial and velar stops, respectively]. These results suggest that following close vowels boost the HNR values of all stops, while following open vowels depress HNRs, particularly for bilabial and velar stops.

For the preceding vowel × following vowel interaction (see Figure 1c), post hoc analyses suggested that when the following vowels are close, mean HNR values were significantly higher when the preceding vowels are also close or mid than when they are open [βs = 4.342, 1.686; ts = 4.711, 6.120; ps = 0.0001, <0.001 for preceding close and mid vowels, respectively]. Similarly, when the following vowels are mid, HNRs were significantly higher when the preceding vowels are also mid than when they are open [β = 0.568, t = 3.969, p = 0.002]. When the following vowels are open, HNRs were significantly higher after preceding mid vowels than after preceding open vowels [β = 0.758, t = 3.208, p = 0.037]. Likewise, when the preceding vowels are close, HNRs were significantly higher when the following vowels are also close than when they are open [β = 3.527, t = 3.180, p = 0.040]. When the preceding vowels are mid, HNRs were significantly higher when the following vowels are close than when they are mid or open [βs = 0.907, 1.738; ts = 3.839, 6.300; ps = 0.004, <0.001 for mid and open vowels, respectively]. When the preceding vowels are open, HNRs were higher when the following vowels are mid than when they are open [β = 1.021, t = 3.943, p = 0.003]. Overall, these results suggest that HNRs are higher before relatively closer vowels than before relatively more open vowels. In addition, except before open vowels, this boosting effect was more likely to occur when the flanking vowels have the same degree of openness or differ by no more than one step.

Relative Duration
The best regression model for relative duration (the duration of the target sound divided by the summed durations of the preceding sound, the target sound, and the following sound) showed that the fixed factors explained only 8.7% of the total variance (marginal R² = 0.087), while the full model explained approximately 21% more (conditional R² = 0.301).
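The duration ratio defined above is d_C / (d_V1 + d_C + d_V2), where d_C is the target duration and d_V1 and d_V2 are the durations of the flanking sounds. A minimal Python sketch, assuming segment boundaries (in seconds) for the preceding sound, the target stop, and the following sound are available; the function name and the example durations are hypothetical.

```python
def relative_duration(prev_dur: float, target_dur: float, next_dur: float) -> float:
    """Target duration as a proportion of preceding + target + following durations.

    Smaller values are taken as more lenited (shorter) realizations.
    """
    durations = (prev_dur, target_dur, next_dur)
    if any(d < 0 for d in durations):
        raise ValueError("durations must be non-negative")
    total = sum(durations)
    if total == 0:
        raise ValueError("total duration must be positive")
    return target_dur / total


# Hypothetical VCV tokens (seconds): a longer and a shorter medial consonant.
print(f"{relative_duration(0.080, 0.060, 0.090):.3f}")  # 0.261
print(f"{relative_duration(0.080, 0.030, 0.090):.3f}")  # 0.150
```

Normalizing by the flanking segments is what makes the ratio robust to overall tempo differences between tokens, although speaking rate is still entered separately as a fixed factor.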
As with HNR, the results reveal significant roles of voicing, place of articulation, flanking vowel height, and speaking rate. As shown in Table 3, the model yielded significant main effects of these factors, including speaking rate: the higher the speaking rate, the lower the duration ratio [β = −0.004, t = −7.381, p < 0.001]. These results indicate that voiced stops are more lenited (shorter relative duration) than voiceless stops; dental stops are more lenited than bilabial and velar stops; and a higher degree of lenition occurs when the preceding vowels are close rather than mid and when the following vowels are close or open rather than mid. In addition, the degree of lenition increases with speaking rate. Significant interactions were also found between place of articulation and the following vowel, between place of articulation and the preceding vowel, and between the preceding and the following vowel. For the place of articulation × preceding vowel interaction (see Figure 2a), relative durations of dental stops were significantly shorter (more lenited) when preceded by close than by mid and open vowels [βs = −0.058, −0.049; ts = −4.342, −3.678; ps = 0.0005, 0.007, respectively]. Similarly, relative durations of velar stops were significantly shorter when the preceding vowels were close than

For the place of articulation × following vowel interaction (see Figure 2b), post hoc pairwise comparisons indicated that relative durations of dental stops were significantly shorter before close vowels than before mid vowels [β = −0.058, t = −4.870, p < 0.0001], suggesting that dental stops are more lenited before close vowels than before mid vowels. No effect of vowel openness was found for bilabial or velar stops. When the three places of articulation were compared within each vowel context, relative durations of dental stops were significantly shorter than those of bilabial stops in the close and open vowel contexts, suggesting that dental stops are more lenited than bilabial stops before these vowels [βs = −0.055, −0.050; ts = −4.678, −3.814; ps = 0.0001, 0.005 for the close and open following vowel contexts, respectively]. The differences between dental and velar stops did not reach significance in any following vowel context.
For the preceding × following vowel interaction (see Figure 2c), post hoc analyses indicated that when followed by close vowels, relative durations are significantly shorter when the preceding vowels are close than when

Intensity Difference Relative to the Preceding Vowel
The more lenited a consonant is, the smaller the intensity difference between it and the preceding vowel is expected to be. For this dependent variable, the best regression model yielded marginal and conditional R² values of 0.744 and 0.806, indicating that the model's fixed factors accounted for 74.4% of the total variance of the intensity difference (relative to the preceding vowel) and the full model for 80.6%. Unlike for HNR and relative duration, stress, along with voicing, place of articulation, surrounding vowel height, and speaking rate, emerged as a significant predictor of lenition degree for this variable (see Table 4). These results suggest that stops weakened to a significantly higher degree when voiced than when voiceless, and in unstressed than in stressed syllables (e.g., /b/ in para busˈcar vs. /b/ in que ˈbueno). In addition, velar stops are lenited to a greater degree than dental stops, and dental stops, in turn, are more lenited than bilabial stops. Both preceding and following vowels also affected the degree of lenition. Specifically, lenition appears to be negatively correlated with the openness of the preceding vowel: the less open the preceding vowel, the more weakened the stop. The following vowel, however, shows the opposite pattern, and the difference reached significance only between mid and open vowels: unlike in the preceding vowel context, lenition is stronger when the following vowel is open than when it is mid. Finally, as expected, the degree of lenition increases with speaking rate.
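Intensity differences of this kind are differences between dB values, that is, log ratios of mean power. The Python sketch below assumes that each value is the mean intensity of a segment computed from calibrated pressure samples against the conventional 2 × 10⁻⁵ Pa reference used by Praat. This section does not restate whether mean, minimum, or maximum intensity was taken for each interval, so the use of mean power is an assumption, as are the function names and the synthetic "segments". Because the reference cancels in a difference, the intensity difference itself does not depend on calibration.

```python
import numpy as np

P_REF = 2e-5  # reference sound pressure in Pa (assumes calibrated input)


def mean_intensity_db(samples: np.ndarray) -> float:
    """Mean intensity (dB) of a segment: 10 * log10(mean power / P_REF**2)."""
    power = float(np.mean(np.square(np.asarray(samples, dtype=float))))
    if power <= 0:
        raise ValueError("segment is silent; intensity is undefined")
    return 10.0 * np.log10(power / P_REF**2)


def intensity_difference(vowel: np.ndarray, stop: np.ndarray) -> float:
    """Vowel intensity minus stop intensity (dB); smaller values = more lenited."""
    return mean_intensity_db(vowel) - mean_intensity_db(stop)


sr = 16000
t = np.arange(1600) / sr
carrier = np.sin(2 * np.pi * 200 * t)
vowel = 0.2 * carrier
strong_stop = 0.002 * carrier   # hypothetical: far weaker than the vowel
lenited_stop = 0.05 * carrier   # hypothetical: closer to the vowel's intensity

print(f"{intensity_difference(vowel, strong_stop):.1f} dB")   # 40.0 dB
print(f"{intensity_difference(vowel, lenited_stop):.1f} dB")  # 12.0 dB
```

The same `mean_intensity_db` value, taken on the stop interval alone, corresponds to the mean-intensity measure discussed below, for which higher values indicate more lenition.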
Significant interactions between the place of articulation and the preceding vowel (see Figure 3a), between the place of articulation and the following vowel (see Figure 3b), and between the preceding vowel and the following vowel (see Figure 3c) were also found.
For the place of articulation × preceding vowel interaction, smaller intensity differences are observed after close vowels for stops at all three places of articulation. However, the effects are significant for dental and velar stops only. Specifically, post hoc pairwise comparisons suggested that the intensity difference was significantly smaller for dental consonants after close vowels than after open vowels [β = −3.612, t = −3.912, p = 0.003], and for velar consonants after close vowels than after both mid and open vowels [βs = −3.294, −5.285; ts = −3.630, −5.900; ps = 0.009, <0.0001, respectively]. Within each preceding vowel context, velar consonants were more lenited than bilabial consonants in all three vowel contexts [βs = −5.323, −2.402, −1.786; ts = −3.440, −5.894, −3.958; ps = 0.017, <0.001, 0.003 for the close, mid, and open vowel contexts, respectively] and more lenited than dental consonants in the mid vowel context [β = −1.509, t = −3.407, p = 0.020]. Together, these results suggest that velar stops are more lenited than dental and bilabial stops and that less open preceding vowels induce a higher degree of weakening.
A different pattern emerges for the following vowels: relatively more open rather than closer vowels appear to trigger a higher degree of weakening, particularly for velar stops. Specifically, post hoc pairwise comparisons suggested that velar stops are more lenited than bilabial stops when followed by mid and open vowels [βs = −4.240, −2.798; ts = −6.012, −3.710; ps = 0.0001, 0.007, respectively]. However, the openness of the following vowel had no significant effect within any place of articulation.
For the preceding × following vowel interaction, a higher degree of weakening is observed overall after close vowels, regardless of the height of the following vowel. However, the effect of preceding vowel height reached significance only when the following vowel was also close. Specifically, post hoc pairwise comparisons indicated that when the following vowel is close, the degree of lenition is highest when the preceding vowel is also close, relative to when it is mid or open [βs = −5.130, −6.868; ts = −3.814, −5.034; ps = 0.004, <0.001, respectively]. No significant differences across preceding vowel heights were found when the following vowels were mid or open. These results suggest that a close + close vowel context triggers the highest degree of stop weakening. In addition, when the preceding vowel is mid or open, the degree of weakening becomes greater (smaller intensity difference) as the following vowel changes from close to mid to open. After mid vowels, the difference reaches significance between following close and following mid vowels [β = 1.300, t = 3.345, p = 0.024] but not between following mid and following open vowels. After open vowels, the difference reaches significance between following mid and following open vowels [β = 1.979, t = 4.644, p = 0.0001]. Together, these results suggest that the optimal lenition-triggering environment is one in which both flanking vowels are close, and that after mid and open vowels the degree of lenition increases progressively as the following vowel becomes more open.

Intensity Difference Relative to the Following Vowel
As with the intensity difference relative to the preceding vowel, smaller intensity differences between the target stop and the following vowel are assumed to indicate a stronger degree of lenition. The best regression model for this dependent variable indicated that the fixed factors accounted for 72.9% of the total variance and the full model for 80.3% (marginal R² = 0.729, conditional R² = 0.803). As for the intensity difference relative to the preceding vowel, stress, along with voicing, place of articulation, and speaking rate, plays a significant role in lenition degree as measured by this dependent variable. Unlike for that measure, however, the effect of preceding vowel height is nonsignificant. Specifically, the model yielded significant main effects of stress: higher intensity difference for stressed than for unstressed syllables [β = −3.741, t = −11.423, p < 0.001]; voicing: higher intensity difference for voiceless than for voiced stops [β = −23.604, t = −65.454, p < 0.001]; place of articulation: bilabial > dental > velar [βs = 1.838, 1.416; ts = 2.837, 2.594; ps = 0.005, 0.010]; following vowel: close < mid [β = −1.563, t = −2.231, p = 0.026]; and speaking rate: the higher the speaking rate, the lower the intensity difference [β = −0.918, t = −23.732, p < 0.001] (see Table 5). These results suggest that stops are more lenited in unstressed than in stressed syllables, and when voiced than when voiceless. In addition, velar stops are more lenited than bilabial and dental stops. Furthermore, stop lenition increases before close vowels and as speaking rate increases. A significant interaction between the preceding and the following vowel (see Figure 4) was also obtained. Post hoc analyses indicate that this interaction stemmed from smaller intensity differences (a higher degree of lenition) when both the preceding and the following vowels are close than when the following vowel is close but the preceding vowel is mid [β = −5.816, t = −3.148, p = 0.043].

Mean Intensity
For the mean intensity of the target stops (in dB), greater mean intensity indicates a higher degree of lenition. According to the best regression model, the fixed factors explained 65.9% of the total variance (marginal R² = 0.659), and the random factors accounted for an additional 11.4% (conditional R² = 0.773). As with the intensity difference relative to the following vowel, stress, voicing, following vowel height, and speaking rate are strong predictors of lenition degree for this variable, while the role of place of articulation and preceding vowel height is minimal. Specifically, the model revealed significant main effects of stress: higher mean intensity for unstressed syllables [β = 1.149, t = 5.449, p < 0.001]; voicing: higher mean intensity for voiced stops [β = 15.063, t = 65.316, p < 0.001]; following vowel: close < mid < open [βs = −1.587, −1.277; ts = −6.271, −5.162; ps < 0.001]; and speaking rate: the higher the speaking rate, the higher the mean intensity [β = 0.605, t = 22.507, p < 0.001] (see Table 6). These results indicate that intervocalic stops are weakened to a significantly higher degree in unstressed than in stressed syllables, when voiced than when voiceless, when the following vowels are relatively more open, and as speaking rate increases.

A significant interaction between place of articulation and the following vowel (see Figure 5) was also found. Post hoc analysis confirmed that bilabials and velars had significantly lower mean intensity, and were thus less lenited, when the following vowels were close than when they were mid or open [for bilabials: βs = −1.340, −2.588; ts = −3.592, −6.541; ps = 0.01, <0.0001; for velars: βs = −2.300, −2.592; ts = −4.627, −4.964; ps = 0.0002, <0.001]. Dental stops, however, differed only between the close and open following vowel contexts, with lower mean intensity before close vowels [for dentals: β = −3.411, t = −6.181, p < 0.001]. These results suggest that the degree of lenition of each stop increases with the openness of the following vowel and that the effects are more pronounced for bilabial than for velar stops.

Sonorant Posterior Probability
Figure 6 shows the sonorant posterior probabilities of intervocalic bilabial, dental, and velar voiced stops before (see Figure 6a) and after (see Figure 6b) close, mid, and open vowels in stressed and unstressed syllables.

Interactions between place × preceding vowel and place × following vowel, as well as between preceding × following vowels, were also significant (see Table 7). For the significant interaction between place and preceding vowel, see Figure 7a. For the significant interaction between place and following vowel (see Figure 7b), post hoc analyses revealed no effect of the following vowel's openness on any of the three types of stops. However, when followed by open vowels, bilabial and dental stops were less lenited (lower sonorant posterior probabilities) than velar stops [βs = −0.117, −0.191; ts = −3.796, −7.535; ps = 0.005, <0.001].
For the significant interaction between preceding and following vowels (see Figure 7c), post hoc analyses suggested that when followed by mid vowels, sonorant posterior probabilities were significantly higher when the preceding vowels were mid or open than when they were close [βs = 0.130, 0.191; ts = 3.655, 5.294; ps = 0.008, <0.001]. Similarly, when followed by open vowels, sonorant posterior probabilities were significantly higher when the preceding vowels were also open rather than close [β = 0.216, t = 5.933, p < 0.001]. Interestingly, the preceding vowel's openness had no effect when the following vowels were close. Except before close vowels, these results suggest that stronger lenition occurs when the preceding vowel is at least as open as the following vowel.

Continuant Posterior Probability
Figure 8 shows the continuant posterior probabilities of intervocalic bilabial, dental, and velar voiced stops before (see Figure 8a) and after (see Figure 8b) close, mid, and open vowels in stressed and unstressed syllables. In addition, significant interactions between place and preceding vowel and between preceding and following vowel contexts were obtained (see Table 8).
= 0.4563). These results suggest the following ranking, from most to least lenited, in postvocalic position: velar > dental > labial. For the significant interaction between the preceding and following vowels (see Figure 9b), post hoc analyses revealed the following. When followed by mid vowels, continuant posterior probabilities were significantly lower (less lenited) when the preceding vowels were close rather than mid [β = −0.109, t = −3.207, p = 0.036] or open [β = −0.135, t = −3.914, p = 0.003]. When followed by open vowels, continuant posterior probabilities were significantly lower only when the preceding vowels were close rather than mid [β = −0.142, t = −4.077, p = 0.002]. When the following vowels were close, on the other hand, differences in continuant posterior probabilities across the three preceding vowel contexts did not reach significance. These results suggest that continuant posterior probabilities are more likely to increase when the preceding and following vowels are relatively open (i.e., mid or open, not close). When mid vowels precede the target stops, continuant posterior probabilities increase when the following vowel is equally open or one step more open. When open vowels precede the target stops, however, continuant posterior probabilities increase when the following vowel is one step less open.

Results, Summary and Discussion
The degree of lenition of Spanish voiced stops varies as a function of several factors, including stress, place of articulation, quality of the surrounding vowels, word status (content or function), and speaking rate. Despite extensive research, no standard method of quantifying the degree of lenition has emerged. Under the quantitative acoustic approach, different researchers have employed different acoustic dimensions as correlates of lenition. In this study, we compared five acoustic indices of stop lenition in an Argentinian Spanish corpus (harmonic-to-noise ratio (HNR); the duration of the target stop relative to the summed duration of the preceding vowel, target stop, and following vowel; the intensity differences between the target stop and its preceding and following vowels; and the mean intensity of the target stop) with the posterior probabilities of the sonorant and continuant phonological features derived from Phonet, a deep recurrent neural network model. The seven lenition metrics were entered as dependent variables in a series of linear mixed-effects regression models, with known factors of lenition (stress, place of articulation, voicing, preceding and following vowel height/openness, word status, and speaking rate) as fixed factors. A higher degree of lenition (i.e., higher HNR, a lower duration ratio, smaller relative intensity differences, higher mean intensity, and higher sonorant and continuant posterior probabilities) is predicted in unstressed than in stressed syllables. Similarly, a higher degree of lenition is predicted for voiced than for voiceless stops, for function than for content words, and for faster than for slower speaking rates (Broś et al. 2021). For place of articulation and flanking vowels, mixed results have been reported. For example, Simonet et al. (2012) reported that /d/ is more lenited after a low vowel than after a higher vowel among bilingual speakers of (Iberian, Majorcan) Spanish and (Majorcan) Catalan. In contrast, Cole et al. (1999) and Ortega-Llebaria (2004) found Spanish /g/ to be less lenited between low vowels than between high vowels, while no effect of vowel height was found for /b/ (Ortega-Llebaria 2003, 2004).
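Because the seven metrics move in different directions as lenition increases (higher HNR, mean intensity, and posterior probabilities; lower duration ratio and intensity differences), comparing effects across models is easier once each estimate is re-signed so that positive always means "more lenited". The Python sketch below does only that bookkeeping. The metric keys are hypothetical labels, the directions are exactly those stated above, and the example slopes are the speaking-rate estimates reported in the Results.

```python
# Sign of the change in each metric as lenition increases (directions from the text).
LENITION_DIRECTION = {
    "hnr": +1,
    "relative_duration": -1,
    "intensity_diff_preceding": -1,
    "intensity_diff_following": -1,
    "mean_intensity": +1,
    "sonorant_posterior": +1,
    "continuant_posterior": +1,
}


def lenition_aligned(metric: str, estimate: float) -> float:
    """Re-sign a regression estimate so that positive means 'more lenited'."""
    try:
        return LENITION_DIRECTION[metric] * estimate
    except KeyError:
        raise ValueError(f"unknown metric: {metric!r}") from None


# Speaking-rate slopes reported for HNR, relative duration,
# intensity difference (following vowel), and mean intensity.
speaking_rate_betas = {
    "hnr": 0.124,
    "relative_duration": -0.004,
    "intensity_diff_following": -0.918,
    "mean_intensity": 0.605,
}
for metric, beta in speaking_rate_betas.items():
    print(f"{metric}: {lenition_aligned(metric, beta):+.3f}")
```

All four aligned estimates come out positive, matching the observation that faster speech is associated with more lenition on every metric. Only the signs are comparable across metrics; the magnitudes are on different scales (dB, ratios, probabilities).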
Table 9 summarizes the significant main effects of the seven regression models, one for each dependent variable. As shown in this table, all seven metrics predicted that intervocalic voiced stops in Argentinian Spanish are lenited to a significantly higher degree than voiceless stops and that the degree of lenition increases with speaking rate. Interestingly, unlike Broś et al. (2021), who found lenition more likely in function words than in content words for stops in Canary Islands Spanish when relative duration was the dependent variable, word status did not predict lenition along any of the seven metrics we examined, including relative duration. Besides dialectal differences, the right context of the stops examined in Broś et al. (2021) was not limited to vowels but also included consonants. These or other, currently unknown, factors may account for this discrepancy. Crucially, these results suggest that sonorant and continuant posterior probabilities share a predictive pattern of lenition with the five quantitative acoustic metrics. A similar predictive pattern is also observed between sonorant posterior probability and all three intensity measurements: all four metrics predict the degree of lenition in the expected direction, that is, more advanced in unstressed than in stressed syllables. Interestingly, values along all five acoustic metrics, but not the sonorant and continuant posterior probabilities, vary significantly with the following vowel. Notably, the behavior of the sonorant and continuant posterior probabilities is consistent with the finding that the height of the preceding vowel, rather than that of the following vowel, introduces varying degrees of constriction, at least for /d/ in intervocalic context (Simonet et al. 2012). Moreover, the effects of the following vowel are inconsistent across the five acoustic measurements. Specifically, HNR, relative duration, and relative intensity (to the following vowel) predicted a lower degree of stop constriction (a higher degree of lenition) when the following vowels are relatively close than when they are relatively more open. The relative intensity (to the preceding vowel) and mean intensity values, however, suggested the opposite pattern: more advanced lenition when the following vowels are relatively more open.
However, like three of the acoustic metrics, namely HNR, relative duration, and relative intensity (to the preceding vowel), the sonorant and continuant posterior probabilities vary significantly with preceding vowel height, but in a different direction. While values along the three acoustic metrics suggest a higher degree of weakening when the preceding vowels are relatively closer, the sonorant and continuant posterior probabilities indicate the opposite: the more open the preceding vowel, the greater the degree of lenition. Note, however, that the predictive pattern of the sonorant and continuant posterior probabilities, but not that of the three acoustic dimensions, is consistent with the articulatory effort-based view of lenition (Kirchner 1998, 2013). More specifically, because the articulators must travel a greater distance from (and possibly also to) a lower vowel than a higher vowel to achieve full closure, and this distance is likely to be reduced through undershoot, lenition should be more prevalent after a more open than after a closer vowel.
Place of articulation influences four acoustic metrics (HNR, relative duration, relative intensity to the preceding vowel, and relative intensity to the following vowel) as well as the sonorant and continuant posterior probabilities. While the pattern of influence is similar for the sonorant and continuant posterior probabilities, it differs across the four acoustic metrics. Like the sonorant and continuant posterior probabilities, the two acoustic intensity measures indicate that velar stops are more lenited than dental and bilabial stops. This pattern is consistent with Kingston's (2008) findings for Spanish stops produced by two female speakers from Ecuador and Peru. He reasoned that velar stops are more lenited perhaps "because velar closures are more often incomplete" (footnote 20, p. 21).
However, the two sets of measures (the two acoustic intensity measures vs. the sonorant and continuant posterior probabilities) differ in how they rank bilabial and dental stops. Consistent with Kingston's (2008) prediction, dentals are more lenited than bilabials according to the two acoustic intensity measures, while the sonorant and continuant posterior probabilities predict the opposite. Alternatively, contrary to Kingston's prediction, bilabial closures may be less complete than dental closures and thus more lenited. The differing predictive patterns may also reflect the fact that the two intensity measures are computed relative to the immediately preceding and following vowels only, whereas the sonorant and continuant posterior probabilities are estimated globally from all sonorant and continuant segments in the corpus. On the other hand, HNR suggests that dental stops are (non-significantly) more lenited than velar stops and significantly more lenited than bilabial stops, while relative duration indicates that dental stops are significantly more lenited than both bilabial and velar stops. These results are inconsistent with both the effort-based account of lenition (Kirchner 1998, 2013) and Kingston's (2008) findings.
In addition to significant main effects, our regression models also yielded significant interactions, mainly between place of articulation and the preceding and following vowels, and between the preceding and following vowels. A significant place × preceding vowel interaction was found for HNR, relative duration, and relative intensity (to the preceding vowel), as well as for the sonorant and continuant posterior probabilities. Post hoc pairwise comparisons revealed different interaction patterns for different metrics, with the patterns for the sonorant and continuant posterior probabilities being more uniform, and more consistent with previous findings (e.g., Kingston 2008), than those for the three acoustic metrics. More specifically, the sonorant posterior probabilities predict that velar stops are more lenited the more open the preceding vowel is. Further, velar stops are predicted to be more lenited than bilabial and dental stops when preceded by open vowels. Similarly, the continuant posterior probabilities predict that velar stops are more lenited than bilabial and dental stops when preceded by mid and open vowels, but not by close vowels. HNR, by contrast, predicts a higher degree of lenition for dental and bilabial stops after close vowels, whereas preceding low vowels trigger a higher degree of lenition for velar stops. Relative duration predicts a higher degree of lenition for dental stops than for bilabial and velar stops after open vowels, while relative intensity (to the preceding vowel) predicts more advanced lenition of velar stops relative to bilabial and dental stops when the preceding vowel is relatively close rather than open.
Significant place × following vowel interactions were found for every metric except mean intensity and the continuant posterior probabilities. Overall, post hoc analyses suggested that following close vowels introduced more variation in the acoustic metric values and predicted a higher degree of lenition of stops in this context. The sonorant posterior probabilities, on the other hand, are more affected by following open vowels and predict a higher degree of lenition of velar stops in this environment.
Significant preceding × following vowel interactions were found for HNR, relative duration, relative intensity (to the preceding vowel), relative intensity (to the following vowel), and the sonorant and continuant posterior probabilities. Post hoc analyses indicated that all acoustic metrics predict a higher degree of lenition when the flanking vowels are close to each other in height and when they are relatively close rather than open. By contrast, the sonorant posterior probabilities predict that relatively open flanking vowels trigger a higher degree of lenition, particularly for velar stops. The continuant posterior probabilities, in turn, predict a higher degree of lenition when either the preceding or the following vowel is relatively high. This finding may be explained by the fact that a smaller oral opening is more conducive to generating frication, which is characteristic of fricatives, members of the [+continuant] phonological class.
In conclusion, the degree of lenition predicted by the different lenition metrics varies. However, the lenition patterns predicted by the sonorant and continuant posterior probabilities are more consistent and go in the direction expected from previous findings and from the effort-based view of lenition. As far as the main effects are concerned, the lenition patterns predicted by the sonorant and continuant posterior probabilities are largely consistent with those of the relative acoustic intensity measures. This is not surprising given that the inputs to the Phonet model that generates the sonorant and continuant posterior probabilities are feature sequences based on the log energy distributed across 33 triangular Mel filters for each 0.5 s chunk of the input signal. Some differences between these two sets of metrics may stem from the fact that the acoustic intensity measures are relative to the target stop's immediate left and right contexts, while the sonorant and continuant posterior probability estimates are relative to the whole class of sonorant and continuant segments in the corpus. Future research could compute sonorant and continuant posterior probabilities relative to the preceding and following segments only, to see whether the minor discrepancies in predicted lenition patterns between these two sets of metrics can be eliminated. In addition, the approach could be further improved by replacing forced alignment with the automated segmentation method proposed by Ennever et al. (2017).
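The Phonet input representation described above (log energies in 33 triangular Mel filters, computed over 0.5 s chunks of the signal) can be approximated with NumPy as follows. This is a minimal sketch, not Phonet's own feature extractor: the 16 kHz sampling rate, 25 ms Hamming window, 10 ms hop, 512-point FFT, and HTK-style Mel formula are assumptions, and all function names are hypothetical.

```python
import numpy as np

def hz_to_mel(f):
    """HTK-style Mel scale (one common convention; assumed here)."""
    return 2595.0 * np.log10(1.0 + np.asarray(f, dtype=float) / 700.0)

def mel_to_hz(m):
    return 700.0 * (10.0 ** (np.asarray(m, dtype=float) / 2595.0) - 1.0)

def mel_filterbank(n_filters: int, n_fft: int, sr: int) -> np.ndarray:
    """Triangular filters equally spaced on the Mel scale, shape (n_filters, n_fft//2 + 1)."""
    mel_points = np.linspace(hz_to_mel(0.0), hz_to_mel(sr / 2), n_filters + 2)
    bins = np.floor((n_fft + 1) * mel_to_hz(mel_points) / sr).astype(int)
    fbank = np.zeros((n_filters, n_fft // 2 + 1))
    for i in range(1, n_filters + 1):
        left, center, right = bins[i - 1], bins[i], bins[i + 1]
        for k in range(left, center):    # rising slope
            fbank[i - 1, k] = (k - left) / (center - left)
        for k in range(center, right):   # falling slope, peak of 1 at center
            fbank[i - 1, k] = (right - k) / (right - center)
    return fbank

def log_mel_energies(signal, sr=16000, n_filters=33, win_s=0.025, hop_s=0.010, n_fft=512):
    """Log energy in each Mel band for every analysis frame: shape (n_frames, n_filters)."""
    frame_len, hop_len = int(round(win_s * sr)), int(round(hop_s * sr))
    # Zero-pad signals shorter than one frame so at least one frame exists.
    signal = np.pad(np.asarray(signal, dtype=float), (0, max(0, frame_len - len(signal))))
    n_frames = 1 + (len(signal) - frame_len) // hop_len
    window = np.hamming(frame_len)
    frames = np.stack([signal[i * hop_len:i * hop_len + frame_len] * window
                       for i in range(n_frames)])
    power = np.abs(np.fft.rfft(frames, n=n_fft)) ** 2 / n_fft
    energies = power @ mel_filterbank(n_filters, n_fft, sr).T
    return np.log(np.maximum(energies, 1e-10))  # floor avoids log(0) in silent bands

def chunked_features(signal, sr=16000, chunk_s=0.5):
    """Split the signal into consecutive chunk_s-second chunks and featurize each one."""
    chunk_len = int(round(chunk_s * sr))
    return [log_mel_energies(signal[s:s + chunk_len], sr)
            for s in range(0, len(signal), chunk_len)]

rng = np.random.default_rng(0)
audio = rng.standard_normal(19200)  # 1.2 s of noise at 16 kHz
feats = chunked_features(audio)
print([f.shape for f in feats])     # [(48, 33), (48, 33), (18, 33)]
```

In this sketch, each full 0.5 s chunk yields a (frames × 33) matrix, and a shorter final chunk simply yields fewer frames.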

Conclusions
This study compared the degree of intervocalic stop weakening in Argentinian Spanish across known lenition factors, as predicted by five quantitative acoustic metrics and by two metrics derived from a deep neural network Phonet model, namely the posterior probabilities of the sonorant and continuant phonological features. As expected, all seven metrics predicted a higher degree of lenition in stressed than in unstressed syllables and at a faster than at a slower speaking rate. By contrast, the five acoustic metrics differed in how they predicted the effects of flanking vowel height and stop place of articulation on lenition. However, the lenition patterns predicted by the sonorant and continuant posterior probabilities are largely consistent with those of the relative acoustic intensity measures, confirming, on the one hand, the superiority of the intensity measures and, on the other hand, the reliability of Phonet as an alternative or additional approach to investigating the degree of lenition.

Figure 2. Estimated marginal means of relative duration by place of articulation and preceding vowel (a, upper), by place of articulation and following vowel (b, middle), and by preceding and following vowel (c, lower). The dots represent the estimated marginal means, and the interval lines display 95% confidence intervals.

Figure 3. Estimated marginal means of intensity difference relative to the preceding vowel by place of articulation and preceding vowel (a, upper), by place of articulation and following vowel (b,

Figure 4. Estimated marginal means of intensity difference relative to the following vowel by preceding and following vowel. The dots represent the estimated marginal means, and the interval lines display 95% confidence intervals.

Figure 5. Estimated marginal means of mean intensity by place of articulation and following vowel. The dots represent the estimated marginal means, and the interval lines display 95% confidence intervals.

Figure 7. Estimated marginal means of sonorant posterior probability by place of articulation and preceding vowel (a, upper), by place of articulation and following vowel (b, middle), and by preceding and following vowel (c, lower). The dots represent the estimated marginal means, and the interval lines display 95% confidence intervals.

Figure 9. Estimated marginal means of continuant posterior probability by place of articulation and preceding vowel (a, upper), and by preceding and following vowel (b, lower). The dots represent the estimated marginal means, and the interval lines display 95% confidence intervals.

Table 1. Word distribution by condition: voicing, place of articulation, preceding vowel, and following vowel. The numbers to the left and right of the slash in each cell represent the numbers of word tokens and word types, respectively.

Table 2. Summaries of mean HNR: the fixed effects in the linear mixed-effects model (a: upper), and the type III ANOVA (b: lower).
Note: significant p-values (p < 0.05) are in bold.

Table 3. Summaries of relative duration: the fixed effects in the linear mixed-effects model (a: upper), and the type III ANOVA (b: lower).
Note: significant p-values (p < 0.05) are in bold.

Table 4. Summaries of intensity difference relative to the preceding vowel: the fixed effects in the linear mixed-effects model (a: upper), and the type III ANOVA (b: lower).
Note: significant p-values (p < 0.05) are in bold.

Table 5. Summaries of intensity difference relative to the following vowel: the fixed effects in the linear mixed-effects model (a: upper), and the type III ANOVA (b: lower).
Note: significant p-values (p < 0.05) are in bold.

Table 6. Summaries of mean intensity: the fixed effects in the linear mixed-effects model (a: upper), and the type III ANOVA (b: lower).

Table 7. Summaries of sonorant posterior probability: the fixed effects in the linear mixed-effects model (a: upper), and the type III ANOVA (b: lower).

Table 9. Summary of the linear mixed-effects regression models. denotes significant main effects (p < 0.05).