Article
Peer-Review Record

Lexical Frequency and the Realization of Italian Dental Affricates

by Chiara Meluzzi * and Nicholas Nese
Reviewer 1: Anonymous
Reviewer 2: Anonymous
Reviewer 3: Anonymous
Submission received: 22 October 2024 / Revised: 2 April 2026 / Accepted: 7 April 2026 / Published: 1 May 2026
(This article belongs to the Special Issue Speech Variation in Contemporary Italian)

Round 1

Reviewer 1 Report

Comments and Suggestions for Authors

See the attached file.

Comments for author File: Comments.pdf

Comments on the Quality of English Language

A few of them were noted in the Word document.

As I am not a native speaker of English, it is hard to point all of them out.

I understood most parts of the manuscript, but I felt that there were some grammatical mistakes.

Author Response

All comments included in the word file have been considered and addressed.

Please note that this revised version contains a major change in aims and scope: the ‘accommodation’ topic has been dropped entirely in favour of a more precise focus on the interface between phonetics and the lexicon. Therefore, the comments on our first draft that concern the eliminated sections are not addressed in this revised version, because the corresponding paragraphs have been removed.

We thank the reviewer for all their comments, which will be valuable for future work focused more closely on speech accommodation.

Reviewer 2 Report

Comments and Suggestions for Authors

The paper discusses the realizations of dental affricates in Italian. The article presents data collected through a reading task and a map-task. The authors address the variability of the realizations by investigating the role of different factors: phonological contexts, lexical characteristics (such as frequency), and phonetic accommodation.

Eight speakers (4 male and 4 female) participated. They performed the reading task individually and the map-task in pairs (4 pairs). Overall, the paper is clear, with a well-introduced theoretical background. The construction of the materials and the phonetic measurements taken appear accurate.

While the study’s aim is commendable and the findings are interesting, the low number of participants constitutes an important limitation, albeit not one that undermines the study’s relevance. However, in my opinion, the paper suffers from two major issues related to the choice of the design to address phonetic accommodation effects and the statistical treatment of the data. For these reasons, my recommendation is “major revisions”. That said, I believe that despite the limited number of participants, the paper is interesting and has the potential to make an important contribution, provided the scope is readjusted and the statistical analyses are revised.

 

 


Experimental Design and Accommodation

 

In my opinion, the design adopted (coupled with a limited number of participants) is not well tailored to study accommodation effects. Each speaker was tested alongside only one other speaker, and only once. Thus, it is not possible to examine a speaker’s accommodation behavior with respect to a specific interlocutor (e.g., as opposed to another interlocutor). Likewise, it is not appropriate to use the productions obtained from the reading task as a baseline to investigate the (semi-)spontaneous productions in the map-task. Correctly, the authors simply compare the productions of each participant across tasks but do not take the behavior of a speaker in the reading task as their neutral baseline to infer accommodation.

As a result, the only way to infer accommodation effects would be to observe whether convergence or divergence occurs over the course of the conversation between the two interlocutors. How do they change their behavior over time in relation to each other? Given the limited number of participants, this approach is problematic, but it could still be a viable method, provided the interactions were long enough.

The authors, however, prefer to focus mainly on a dimension distinct from the interlocutor’s behavior per se: the role played by the speaker in the map-task, as the giver of instructions or as the follower. This dimension is distinct from the linguistic behavior of the co-participant and their speech characteristics, making it a logically separate factor to consider. Currently, the authors discuss accommodation effects within the pairs in interaction with the role played by them, and this is not fully justified by the design or by the sample size. I believe that analyzing the role in the map-task is an interesting factor to consider. Furthermore, since it is a binary variable, the data analysis is straightforward. However, even in this case, the design is not particularly well-suited. During the recording sessions, the two interlocutors switched roles. As such, participants could have perceived the task as collaborative, reducing the potential effects of the role they were playing at the moment. This possible limitation should be mentioned, in my view.

In my opinion, the discussion of accommodation should be significantly downplayed in the economy of the paper. Firstly, because of the weakness of the design in this perspective. Secondly, the quantitative analysis of the productions by the pairs is basically limited to a description of percentages for the individual pairs and to a visual presentation of plots. No statistical treatment is offered in this respect.

In summary, accommodation cannot be presented as an a priori research question. In my opinion, accommodation should be downplayed to the role of an exploratory analysis. The paper offers a valuable investigation of different factors that can impact the realization of affricates in Italian: phonological contexts, type of speech (read vs. semi-spontaneous), gender, and regional variety. In my view, these aspects alone fully justify the relevance of the paper.

 


Statistical Analyses

 

The authors do not clarify which statistical models or tests they adopt, nor do they justify their approach. This makes it more complicated for the reader to understand the adequacy of the tests. Still, from what I understand, the statistical treatment is not adequate. As for the categorical dependent variables, the formulas reported in the text suggest that they adopted Pearson’s chi-squared test, which is a very unusual approach. From what I understand, each data point was taken as an independent observation, violating one of the central assumptions of the chi-square test, and this presumably results in an artificial reduction of the p-values. Each production of a speaker is not independent from all the productions of the same speaker. Mixed models are nowadays a standard practice in these cases, since they allow the model to take into consideration, besides the speakers, different error terms. Alternative approaches are possible as well. Even a repeated-measures ANOVA over proportions is more adequate than a chi-square with this data.
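To make the point about the unit of analysis concrete, the sketch below aggregates token-level realizations into one proportion per speaker and task, which is the kind of input a repeated-measures analysis over proportions would take. The speaker labels, task names, and tokens are hypothetical and are not taken from the study; this is only a minimal illustration in Python, not the authors' procedure.

```python
from collections import defaultdict

# Hypothetical tokens: (speaker, task, realization). Not data from the study.
tokens = [
    ("S1", "reading", "voiceless"),
    ("S1", "reading", "voiced"),
    ("S1", "map_task", "voiceless"),
    ("S1", "map_task", "voiceless"),
    ("S2", "reading", "voiced"),
    ("S2", "reading", "voiced"),
    ("S2", "map_task", "voiceless"),
    ("S2", "map_task", "intermediate"),
]


def voiceless_proportions(rows):
    """Return {(speaker, task): proportion of voiceless tokens}."""
    counts = defaultdict(lambda: [0, 0])  # [voiceless count, total count]
    for speaker, task, realization in rows:
        cell = counts[(speaker, task)]
        cell[0] += realization == "voiceless"  # True counts as 1
        cell[1] += 1
    return {key: v / n for key, (v, n) in counts.items()}


proportions = voiceless_proportions(tokens)
for key in sorted(proportions):
    print(key, proportions[key])
```

With this layout each speaker contributes one value per task, so the number of data points reflects the number of participants rather than the number of tokens.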

Likewise, the analyses of continuous dependent variables (i.e., duration values) are conducted via ANOVA. In principle, this approach is feasible, although outdated. The authors do not report ANOVAs in the standard form, with two degrees of freedom, but only one. This does not allow the reader to understand if the models were Independent Measures ANOVA or Repeated Measures ANOVA. Once again, this difference is vital because the application of Independent Measures ANOVA would lead to artificial results and misleading p-values.

In my opinion, the authors should revise the section concerning the statistical analyses, clarify their approach, and adopt statistical tests that are adequate for the design and type of data.


Minor Issues

                End of page 5: “equal length and controlled prosodic contour”. What do the authors mean when they write that the written sentences to read were controlled for prosodic contour?

                Sex vs. gender: The authors formulate their research questions by making reference to sex. Why don’t they use “gender” systematically? Is there any reason to believe that biological dimorphism plays a role in the realization of the affricates?

                The authors occasionally use “subjects,” which is nowadays stigmatized. I recommend using the term “participants.”

                While the description of the labeling procedure is extremely detailed, the design description is inadequate. The distribution of participants is unclear: are the factors of speaker gender and geographic area of origin counterbalanced?

                The authors state that affricates always qualify as geminates between vowels in Italian. However, they differentiate between singleton and geminate in V_V. Do they refer to spelling when making this distinction?

                The authors should report how long the session lasted on average, besides the total length of the recordings.

 

Author Response

Thank you for your precise comments.

All minor issues have been carefully addressed.

As for your comment on accommodation, we have completely deleted this part from our paper, thus shifting the focus exclusively to lexical frequency.

(Please see the attached detailed author response to your initial review.)

Author Response File: Author Response.pdf

Reviewer 3 Report

Comments and Suggestions for Authors

This paper is an interesting contribution to the topic of speech accommodation, since it addresses it from an innovative perspective which includes several of the important issues in the field: the dialogical role of speaker and listener, sociolinguistic factors (i.e. gender and geographical origin), phonological variables as well as the role of the lexicon. It aims at giving a global and comprehensive vision of the phenomenon from a multifaceted research perspective. Broadly speaking, this is a remarkable contribution (original, coherent, attention-grabbing) which continues previous works and, logically, constitutes new evidence in the characterization of the phenomenon under study. Though it wants to cover a wide range of objectives, which may seem very distinct, there is an adequate justification for them, and they are perfectly and convincingly related.

It is worth noting that the work is well documented: the authors offer a quite exhaustive review of the relevant literature, which guides the reader through the complex mechanisms at play in the accommodation processes. This section (§2) makes it possible to understand and to give context to the research and to identify research gaps in the area which justify the scope of this work. These theoretical remarks are perfectly organized and structured and provide an adequate background to the research questions and the working hypotheses which are specified in section 3.

The main concerns, however, affect sections 3 and 4. In section 3, though the method and materials are generally very well described, some information deserves closer attention or, at least, would benefit from a clearer and more direct explanation. It would be particularly useful to set out the variables in a more concrete way, detailing which ones are dependent variables and which are treated as factors. It would also be important to state whether there are random variables, if any (which, judging from the explanation as a whole, seems not to be the case). Obviously, the paper presents the different variables, but the reader must gather this information from the various subsections and deduce their role in the subsequent analysis, which can be a problem in certain cases. One instance of this is the geographical origin of the speakers: if this reviewer is not mistaken, this factor is mentioned neither in the research questions nor in the hypotheses, and the only reference to it is that there are two groups of speakers (one from Lombardy and one from Sicily, referred to as Northern and Southern speakers, respectively, in the rest of the paper). Why were these two origins selected? Is this going to be considered in the study? We see in section 4 that it is, but it would be important to establish this point beforehand.

In addition, it is important to provide information about the statistical analysis, which is very closely related to the explanation of the variables. This should be a core part of the methodology section, since it should allow the reader to understand the results and how they were obtained. In this respect, it would be useful to specify the kind of statistical tests employed, the criteria for their application and (though it may seem obvious) the significance level.

In subsection 3.3 the annotation procedure is carefully explained. However, it would be interesting to describe the criteria by which the different realizations are considered voiced, unvoiced or intermediate, since relevant work on voicing made clear that, phonetically, it is a gradual feature (Smith 1997, Davidson 2015, 2016, among others). It would be very enlightening to make clear the categorization procedure concerning this issue.

Section 4, in fact, would greatly profit from the above suggestions. Apart from this, it is suggested to homogenize the terminology according to the labels/categories set in the methodology, and to try to adjust to it. Obviously, it is not a real obstacle to the comprehension of the ideas, but sometimes it requires an extra effort from the reader to deduce exactly which context or which realization the text is referring to, particularly if comparing with the information in the tables.

On the other hand, though the statistical results are given in the different tables, it would be useful to report in the text the results that support the statements describing the behavior of the different realizations, to ensure that these statements really derive from objective data (see, for example, p. 14). In fact, there are some points where the statistical results do not seem to support the accompanying comments: on p. 9, for instance, the results reported at the bottom of the second paragraph demonstrate that there are relevant differences in the realization of the affricate in each group of speakers, but they do not indicate that there are significant differences between the Northern and the Southern speakers. It would be really interesting to check whether there is such a difference depending on geographical origin, at least with regard to the initial sentence in this paragraph. In other cases, further explanation of the behavior of the allophones may improve the understanding of the phenomena: e.g., on p. 15 the authors report the relationship between the duration of the fricative portion of the affricates and the type of task, but no details about the direction of this relationship are offered. When is this fricative portion shorter or longer? Also on p. 9, there is a change in the parameters of the analysis: in the first paragraph, the authors present the occurrence of the different realizations depending on the phonological context using relative frequency values. However, when it comes to the intermediate affricates, they switch to absolute frequency values, which convey very different information compared to percentages. Obviously, the higher absolute values for these intermediate affricates correspond to the initial and geminate contexts: these are the contexts with by far the most items. Relative frequencies, in contrast, show that intermediate affricates are proportionally more common in post-lateral and post-nasal contexts.
If adequate, this change in the analysis procedure should be justified.
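The difference between the two kinds of frequency can be illustrated with a small Python sketch. The counts below are invented for illustration only and do not reproduce the manuscript's tables; the context labels follow those mentioned in the report.

```python
# Hypothetical counts per phonological context:
# (intermediate affricates, total tokens). Not data from the study.
counts = {
    "initial": (30, 600),
    "geminate": (24, 480),
    "post-nasal": (12, 80),
    "post-lateral": (9, 50),
}


def relative_frequencies(table):
    """Return the share of intermediate realizations within each context."""
    return {ctx: k / n for ctx, (k, n) in table.items()}


rel = relative_frequencies(counts)

# Context with the most intermediate tokens in absolute terms.
by_absolute = max(counts, key=lambda ctx: counts[ctx][0])
# Context where intermediate tokens are proportionally most common.
by_relative = max(rel, key=rel.get)

print("highest absolute count:", by_absolute)
print("highest relative frequency:", by_relative)
```

In this made-up table the context with the largest absolute count is simply the one with the most items, while the ranking by proportion points to a different context, which is why mixing the two measures within one analysis needs justification.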

Finally, regarding the discussion section, only one remark, which comes from the comparison with the exhaustive and accurate explanations in section 2: maybe it would be adequate to detail the references to which the statements are referring in the text. It would enrich the analysis and would strengthen the justification.

 

Some concrete aspects:

·         p. 7: 3rd paragraph, «followed by the symbols “+” and “+”» ??

·         p. 11: in caption of Figure 1:

o   «* indicates a non-significant distribution». Only as a suggestion: can distributions be considered significant? Results can be significant or non-significant, but not exactly the distribution of the data.

o   In the second bar chart: maybe «Speaker» should be «Speaker 4»?

·         p. 15: format of table 9 is different from the rest.

Comments on the Quality of English Language

The English of the paper should be reviewed: there are some typing errors and some syntax mistakes that should be corrected.

Author Response

We really wish to thank you for your comments and suggestions.

We have tried our best to integrate everything in our revised version. Please note that we have moved away from the topic of accommodation, thus many things have changed.

Here are our answers on some specific comments.

Comment on sub-section 3.3: we have clarified how the different voicing variants have been differentiated in our corpus, following a pre-existing protocol for the annotation of Italian dental affricates. We completely agree with the reviewer that voicing is a gradual phenomenon, and we have added the appropriate reference.

Comment on section 4: we have adjusted the labels and homogenized them to current terminology. We have added further explanation for our specific use of labels.

Comment on analysis and discussion: following the reviewers’ and editor’s suggestions, we have moved away from the topic of speech accommodation to focus only on lexical frequency and the lexicon. The discussion, therefore, has been modified in this respect.

Author Response File: Author Response.pdf

Round 2

Reviewer 1 Report

Comments and Suggestions for Authors

I felt that the manuscript was well-improved. I don't remember the previous draft well, but I think that the structure was largely modified. (The previous draft stated that this study is about phonetic convergence, didn't it??) I like the current version, which focuses on frequency effects. It is much clearer now. I have a few comments, though:

 

>> "In English, numerous studies have demonstrated duration reduction
in high-frequency words (Bell et al., 2009; Gahl, 2008), with additional effects on
vowel quality (Munson & Solomon, 2004) and consonant realization (Raymond et
al., 2006). Similar patterns have been observed in other Germanic languages,
including Dutch (Pluymaekers et al., 2005) and German (Zimmermann, 2016), as
well as in Romance languages such as Spanish (Torreira & Ernestus, 2011) and
French (Adda-Decker et al., 2005)."

To the best of my knowledge, the following papers discuss the effects of word frequency on phonetic reduction in other languages such as Japanese and Taiwan Southern Min. It would probably be more persuasive to cite studies about non-European languages:

Hashimoto, D. (2021). Probabilistic reduction and mental accumulation in Japanese: Frequency, contextual predictability, and average predictability. Journal of Phonetics, 87, 101061.

Wang, S. F. (2022). The interaction between predictability and pre-boundary lengthening on syllable duration in Taiwan Southern Min. Phonetica, 79(4), 315–352.

Hashimoto, D. (2023). The effect of verbal conjugation predictability on speech signal. Morphology, 33(1), 41-63.

As you discuss usage-based phonology in the discussion, Hashimoto (2021) may provide some insights for your study.

 

 >> 4.1 Voicing degree

I did not understand what it means. Table 1 has three levels voiceless vs. voiced vs. intermediate. It may be better to state what these mean (how you defined these three levels) in Section 3.3. Probably, it is nice to show waveforms or spectrograms.

 

>> “A more specific role of lexical frequency on dental affricate realization was tested through a multinominal logistic regression.”

Then, it may be better to show the summary table. It will help readers to understand your model.

>> 4.2 Duration

Why do you not explore the word-frequency effects on duration? As you reviewed in Section 2.1, the duration of an affricate in high-frequency words should be shorter.

To be honest, I am not sure whether 4.2 Duration is necessary in your study. According to the research questions in 3.1, this study seems to focus on the realization of an affricate phoneme, which may be addressed in Section 4.1.

Comments on the Quality of English Language

I cannot spot all the typos, but I would like to point some of them out:

"Basing on previous literature on the interface of phonetics and the lexicon (2.1)" should be "Based on previous literature on the interface of phonetics and the lexicon (2.1)"

"Japonese" should be "Japanese"?

Author Response

Comment 1

I felt that the manuscript was well-improved. I don't remember the previous draft well, but I think that the structure was largely modified. (The previous draft stated that this study is about phonetic convergence, didn't it??) I like the current version, which focuses on frequency effects. It is much clearer now. I have a few comments, though:

 >> "In English, numerous studies have demonstrated duration reduction
in high-frequency words (Bell et al., 2009; Gahl, 2008), with additional effects on
vowel quality (Munson & Solomon, 2004) and consonant realization (Raymond et
al., 2006). Similar patterns have been observed in other Germanic languages,
including Dutch (Pluymaekers et al., 2005) and German (Zimmermann, 2016), as
well as in Romance languages such as Spanish (Torreira & Ernestus, 2011) and
French (Adda-Decker et al., 2005)."

To the best of my knowledge, the following papers discuss the effects of word frequency on phonetic reduction in other languages such as Japanese and Taiwan Southern Min. It would probably be more persuasive to cite studies about non-European languages:

Hashimoto, D. (2021). Probabilistic reduction and mental accumulation in Japanese: Frequency, contextual predictability, and average predictability. Journal of Phonetics, 87, 101061.

Wang, S. F. (2022). The interaction between predictability and pre-boundary lengthening on syllable duration in Taiwan Southern Min. Phonetica, 79(4), 315–352.

Hashimoto, D. (2023). The effect of verbal conjugation predictability on speech signal. Morphology, 33(1), 41-63.

As you discuss usage-based phonology in the discussion, Hashimoto (2021) may provide some insights for your study.

 

Answer to Comment 1         

We warmly thank the reviewer for their valuable suggestions. We have carefully read the suggested papers and integrated them into the theoretical part in section 2.1. The paragraph has now been modified as follows:

Crucially, research on frequency-related phonetic reduction has increasingly extended beyond European languages, revealing both universal patterns and language-specific modulations, which also align with an Exemplar Theory framework. In Japanese, Hashimoto (2021) demonstrated that morpheme duration is systematically affected by three types of information-theoretic measures: morpheme frequency, contextual morpheme predictability (both forward and backward), and average morpheme predictability (henceforth, informativity). His findings show that morphemes with higher frequency and higher contextual predictability are produced with shorter duration; backward predictability (conditioned on the following morpheme) exerts stronger effects than forward predictability. A significant effect of informativity was also found, thus suggesting that reduction patterns may be lexicalized rather than purely context-dependent. In a subsequent (and complementary) study focusing on verbal conjugation, Hashimoto (2023) found that Japanese non-past indicative forms with higher conjugation predictability (i.e., verbs frequently used in that particular form) were produced with shorter duration, with an estimated reduction of 7.7 milliseconds when comparing mean conjugation predictability to one standard deviation lower. These findings were interpreted within Exemplar Theory as reflecting ease of production target creation: conjugation forms with higher resting activation levels (due to more frequent usage) are accessed more quickly, resulting in reduced articulatory effort and shorter duration.

Similar patterns have been documented in other tone languages. In Taiwan Southern Min, Wang (2022) investigated how predictability measurements—including bigram surprisal, bigram informativity, and lexical frequency—interact with prosodic phrasing to affect syllable duration in spontaneous speech. The study found that higher informativity and surprisal led to longer syllables, consistent with information-theoretic predictions. Importantly, Wang demonstrated that these predictability effects were modulated by prosodic position: there was a general weakening of predictability effects for syllables closer to prosodic boundaries, especially in pre-boundary positions where pre-boundary lengthening was strongest. However, the effect of word informativity appeared least modulated by boundary marking, suggesting that informativity-specific durational variants may be stored as part of lexical representations. In Mandarin, Tang and Shaw (2021) provided converging evidence that prosodic information "leaks into" the mental representations of words, with word-specific duration patterns reflecting both local contextual predictability and prosodic characteristics of typical usage contexts.

Put together with the previous results on European languages, this new cross-linguistic evidence strengthens the role of informativity effects, thus also providing strong support for exemplar-based models of lexical representation. The finding that words with generally high predictability show reduction even in locally unpredictable contexts (documented in Japanese, Mandarin, and Taiwan Southern Min) suggests that phonetic variants are stored as part of lexical representations rather than computed purely online. This lexicalization hypothesis receives further support from the observation that informativity effects are more resistant to modulation by prosodic context than contextual surprisal effects, as demonstrated particularly clearly in Wang's (2022) analysis of Taiwan Southern Min. The consistency of these patterns across languages with radically different prosodic systems (i.e., stress-timing, syllable-timing, and mora-timing) seems to indicate that the storage of usage-based phonetic variants may be a universal property of the human lexical system.

 

Comment 2

 4.1 Voicing degree

I did not understand what it means. Table 1 has three levels voiceless vs. voiced vs. intermediate. It may be better to state what these mean (how you defined these three levels) in Section 3.3. Probably, it is nice to show waveforms or spectrograms.

Answer to comment 2

We have better explained the nature of intermediate affricates in section 3.3, by adding the following statement: “As in the aforementioned papers, different voicing degrees have been identified acoustically, by inspecting the presence of the voice bar for at least 75% of the affricate. Intermediate realizations are, thus, defined as those affricates realized as voiced in the occlusive portion, but voiceless in the fricative one, as shown in Fig. 1 as taken from our data (cf. also Author 1, 2020, for a perceptive account of these sounds as neither voiceless nor voiced).”

We also add Figure 1 to exemplify an affricate with an intermediate voicing degree.

In section 4.1, then, we add a reminder to the previous methodological section before presenting the data in table 1.

 

Comment 3

>> “A more specific role of lexical frequency on dental affricate realization was tested through a multinominal logistic regression.”

Then, it may be better to show the summary table. It will help readers to understand your model.

Answer to Comment 3

The reviewer is absolutely right. We have added both figures and tables to summarize our results in both the section on voicing and the one on duration. We also further specify the model that we have used in both analyses, to clarify our procedure to the reader and ensure reproducibility.

 

Comment 4

>> 4.2 Duration

Why do you not explore the word-frequency effects on duration? As you reviewed in Section 2.1, the duration of an affricate in high-frequency words should be shorter.

To be honest, I am not sure whether 4.2 Duration is necessary in your study. According to the research questions in 3.1, this study seems to focus on the realization of an affricate phoneme, which may be addressed in Section 4.1.

Answer to Comment 4

Surely that was a clear weakness of our paper. If we remember correctly, in the previous round of reviews one of the reviewers suggested exploring duration more thoroughly alongside voicing degree. Therefore, we added a new analysis of duration using the same mixed-effects models. We also integrated those models and their explanation more fully into our analysis. To do so, we moved the results previously illustrated through ANOVA to the Appendix.

 

Comment 5

Comments on the Quality of English Language

I cannot spot all the typos, but I would like to point some of them out:

"Basing on previous literature on the interface of phonetics and the lexicon (2.1)" should be "Based on previous literature on the interface of phonetics and the lexicon (2.1)"

"Japonese" should be "Japanese"?

Answer to comment 5

Thank you for pointing this out. We carefully re-read our final manuscript trying our best to find all the typos.

 

 

Reviewer 2 Report

Comments and Suggestions for Authors

The manuscript is substantially improved. It remains clearly written, and the argumentation is tighter. The revision re-centers the empirical discussion on lexical frequency, aligning hypotheses, operationalization, and results; this focus clarifies the contribution and makes the theoretical relevance explicit. The connection between the stated predictions and the ensuing interpretation is now much clearer, and the literature framing is fully adequate. Overall, narrative coherence and transparency of empirical goals are markedly better than in v1. While several issues remain serious (chiefly statistical modeling and frequency measurement), the present version constitutes a solid step forward.

Mixed models. Just to make sure we’re fully aligned on the statistics: my suggestion is (and was) to make mixed-effects models the backbone of the analysis and to treat voicing as the main categorical outcome in a binomial GLMM. A practical option is to fit glmer with voiceless vs. other as the dependent variable (code voiceless = 1, {voiced, intermediate} = 0). As a robustness check, you can also run voiceless vs. voiced after excluding intermediate. A sensible baseline is:

glmer(voiceless ~ style * phonological_context * geo_origin (…) + (1 | speaker) + (1 | item), family = binomial, data = df)

This specification estimates the probability that a token is voiceless (rather than voiced or {voiced + intermediate}) and how that probability varies with the predictors and their interactions. This would solve the problem with χ², which assumes independent observations. Since each speaker provided more than one observation, the observations are not independent.
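
The outcome coding described above can be sketched in a few lines. This is only an illustrative sketch in Python (not the R workflow the reviewer refers to): the token list, the speaker labels, and the function names are hypothetical and do not come from the authors' data. It shows the main coding (voiceless = 1, voiced or intermediate = 0), the robustness subset that excludes intermediate tokens, and a per-speaker token count that makes the non-independence of the observations visible.

```python
from collections import Counter

# Hypothetical tokens for illustration only; not the authors' data.
tokens = [
    {"speaker": "S1", "item": "zio", "voicing": "voiceless"},
    {"speaker": "S1", "item": "zero", "voicing": "voiced"},
    {"speaker": "S1", "item": "mezzo", "voicing": "intermediate"},
    {"speaker": "S2", "item": "zio", "voicing": "voiceless"},
    {"speaker": "S2", "item": "zero", "voicing": "intermediate"},
    {"speaker": "S2", "item": "mezzo", "voicing": "voiced"},
]


def code_voiceless(label):
    """Return 1 for a voiceless token, 0 for voiced or intermediate."""
    return 1 if label == "voiceless" else 0


def main_coding(rows):
    """Main analysis: voiceless vs. {voiced, intermediate}."""
    return [dict(r, voiceless=code_voiceless(r["voicing"])) for r in rows]


def robustness_subset(rows):
    """Robustness check: voiceless vs. voiced, intermediate tokens dropped."""
    kept = [r for r in rows if r["voicing"] != "intermediate"]
    return main_coding(kept)


def tokens_per_speaker(rows):
    """Count tokens per speaker; counts above 1 mean repeated measures."""
    return Counter(r["speaker"] for r in rows)


coded = main_coding(tokens)
subset = robustness_subset(tokens)
counts = tokens_per_speaker(tokens)
```

Any speaker with a count above 1 contributes correlated observations, which is the situation that random intercepts for speaker (and item) are meant to handle.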

ANOVA. Another issue raised in the review concerns ANOVA. The presentation was not fully effective. Neither v1 nor v2 clearly specifies whether the ANOVA is between-participants or within-participants (repeated-measures). Given the repeated-measures design, a repeated-measures ANOVA would be required if that framework is retained. However, a more appropriate approach is to model duration as a continuous dependent variable with Linear Mixed Effects Models, which better accommodate crossed random effects and unbalanced data.

Frequency. The authors’ reply letter suggests that, in v2, frequency was modeled as a continuous predictor, perhaps following feedback from another reviewer. For clarity, this was not the thrust of my original comment: the core point concerned the need for a mixed-effects framework to handle non-independent, repeated-measures data (multiple observations per speaker) when modeling voicing outcomes (voiceless/voiced/intermediate). Regardless of whether frequency is included, the backbone should be GLMMs (binomial or multinomial) with random intercepts for speaker and item (and slopes when appropriate).

Given the amount of data, however, a sufficient piece of evidence could also be obtained by treating frequency in terms of ordinal levels, for instance, by splitting the data into tertiles or quartiles.
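
As a rough illustration of the ordinal option mentioned above, the following Python sketch splits a set of frequency values into tertiles. The frequency values, the level names, and the function name are assumptions made for the example; the cut points come from the standard library's `statistics.quantiles` with its default method, and other quantile definitions could place boundary items differently.

```python
import bisect
import statistics

# Hypothetical frequency values (e.g. on a Zipf-like scale); not real data.
frequencies = [2.1, 2.8, 3.0, 3.5, 4.0, 4.4, 5.2, 5.9, 6.3]

LEVELS = ["low", "mid", "high"]


def tertile_levels(values):
    """Assign each value to an ordinal level based on tertile cut points."""
    # n=3 yields two cut points separating three equally sized groups.
    cuts = statistics.quantiles(values, n=3)
    # bisect_left counts how many cut points lie strictly below the value.
    return [LEVELS[bisect.bisect_left(cuts, v)] for v in values]


levels = tertile_levels(frequencies)
```

The resulting labels can then be entered into a model as an ordered factor instead of the continuous frequency measure.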

 

General Assessment and Conclusion

In conclusion, the paper is genuinely interesting, and this second version demonstrates a strong focus, effectively linking the findings with a broader theoretical perspective. However, the quantitative presentation and the statistical treatment of the results still contain methodological issues that require a focused revision.

I suggest the authors prioritize two crucial directions to ensure the validity and rigor of the reported findings. In my opinion, the first point is critical, as the effect of frequency constitutes a central focus of the paper:

Frequency Predictor Validity: The authors must clearly report how frequency was calculated and entered into the model. Specifically, the data must be transformed (e.g., using a logarithmic or Zipf scale) to meet the model's distributional assumptions, or, at a minimum, be recoded as a robust ordinal variable to ensure sound results.

Statistical Framework: The authors should consider adopting the Mixed-Effects Model framework for all analyses (both for multi/binomial variables and for duration). This is the standard practice for handling the dependent and non-independent nature of the data.

 

The identified issues are correctable statistical challenges that, once addressed, will unlock the full potential of this research. The data and the theoretical framework are on point; a final, rigorous step on the quantitative treatment will make this a relevant contribution to the field. I look forward to recommending acceptance of the next version.

Author Response

Comment 1

Mixed models. Just to make sure we’re fully aligned on the statistics: my suggestion is (and was) to make mixed-effects models the backbone of the analysis and to treat voicing as the main categorical outcome in a binomial GLMM. A practical option is to fit glmer with voiceless vs. other as the dependent variable (code voiceless = 1, {voiced, intermediate} = 0). As a robustness check, you can also run voiceless vs. voiced after excluding intermediate. A sensible baseline is:

glmer(voiceless ~ style * phonological_context * geo_origin (…) + (1 | speaker) + (1 | item), family = binomial, data = df)

This specification estimates the probability that a token is voiceless (rather than voiced or {voiced + intermediate}) and how that probability varies with the predictors and their interactions. This would solve the problem with χ², which assumes independent observations. Since each speaker provided more than one observation, the observations are not independent.

Answer to Comment 1

We thank the reviewer for the methodological suggestion and fully agree on the need to address the issue of non-independent observations through mixed-effects models rather than χ² tests. However, we believe that a binomial GLMM with voiceless vs. voiced/intermediate realizations would not be appropriate for our data for the following reasons.

First, there is no theoretically justified baseline in our data. As specified in 2.2, the distribution of voiceless vs. voiced realizations of Italian dental affricates is neither random nor affected only by sociolinguistic factors. Rather, it follows well-defined diachronic patterns: words of Latin origin tend toward voiceless realizations, while those of Arabic and Lombard origin favor voiced ones. Therefore, there is no "base" or unmarked realization from which the others derive: both are legitimate and predictable outcomes based on lexical etymology.

Second, intermediate realizations represent a qualitatively distinct phenomenon. They are not predicted by any diachronic origin and likely constitute an autonomous development of a sociophonetic nature. Grouping them with either voiced or voiceless realizations would obscure this distinction and prevent us from analyzing the factors that specifically favor this emerging variant.

We therefore propose to adopt a multinomial logistic mixed model that treats the three categories (voiceless/voiced/intermediate) as non-ordinal and equivalent outcomes. This approach maintains the advantages of mixed-effects modeling in handling non-independent observations while not assuming a privileged baseline, allowing us to identify specific predictors for each realization and respecting the categorical and non-hierarchical nature of our phonetic data.

 

Comment 2

ANOVA. Another issue raised in the review concerns ANOVA. The presentation was not fully effective. Neither v1 nor v2 clearly specifies whether the ANOVA is between-participants or within-participants (repeated-measures). Given the repeated-measures design, a repeated-measures ANOVA would be required if that framework is retained. However, a more appropriate approach is to model duration as a continuous dependent variable with Linear Mixed Effects Models, which better accommodate crossed random effects and unbalanced data.

Answer to comment 2

The reviewer is completely right on this point: in the revised version we did not specify that we had used a repeated-measures ANOVA. Following the reviewer’s suggestion, however, we decided to retain only a brief mention of this preliminary result and to focus entirely on LME models. Specifically, we added LME models to test the role of frequency (normalized with the Zipf formula, see also below) in shaping the duration of dental affricates across voicing degrees.

Comment 3

Frequency. The authors’ reply letter suggests that, in v2, frequency was modeled as a continuous predictor, perhaps following feedback from another reviewer. For clarity, this was not the thrust of my original comment: the core point concerned the need for a mixed-effects framework to handle non-independent, repeated-measures data (multiple observations per speaker) when modeling voicing outcomes (voiceless/voiced/intermediate). Regardless of whether frequency is included, the backbone should be GLMMs (binomial or multinomial) with random intercepts for speaker and item (and slopes when appropriate). Given the amount of data, however, a sufficient piece of evidence could also be obtained by treating frequency in terms of ordinal levels, for instance, by splitting the data into tertiles or quartiles.

Answer to comment 3

We appreciate the reviewer's emphasis on using GLMMs as the analytical backbone. However, we respectfully maintain that the preliminary descriptive analyses are a necessary prerequisite to the mixed-effects models rather than a replacement for them. Before fitting GLMMs to test the specific effect of lexical frequency on affricate voicing, we need to verify that the distribution of voicing realizations in our corpus aligns with diachronic expectations based on etymological origin. This preliminary step serves two critical purposes. First, it allows us to identify potential sociolinguistic confounds that could obscure or distort the linguistic effect we aim to investigate. Second, it provides the necessary context for interpreting the GLMM results: without establishing the baseline distribution patterns, we risk either over-interpreting or under-interpreting the effects that emerge from the mixed-effects analysis. In other words, the descriptive analysis does not compete with the GLMMs but rather ensures that the patterns revealed by the mixed-effects models can be properly understood and attributed to the theoretical variables of interest. We have strengthened the mixed-effects component as suggested, but the preliminary analysis remains essential for the validity and interpretability of our findings.

 

Comment 4

Frequency Predictor Validity: The authors must clearly report how frequency was calculated and entered into the model. Specifically, the data must be transformed (e.g., using a logarithmic or Zipf scale) to meet the model's distributional assumptions, or, at a minimum, be recoded as a robust ordinal variable to ensure sound results.

Answer to comment 4

We checked the number of 0-frequency items in our corpus to determine whether it was better to apply a logarithmic or a Zipf scale transformation. The 0-frequency items amounted to only 8.5%, and we decided to apply the Zipf scale, as it is also the most commonly used in psycholinguistic experiments. In the methodological section we specified all the required information, and we recall it in the analysis section the first time we use this variable.
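
For readers unfamiliar with the transformation, the sketch below shows one common formulation of the Zipf scale in Python: the base-10 logarithm of the frequency per million words, plus 3, with add-one smoothing so that 0-frequency items still receive a finite value. Whether this exact formulation matches the one applied in the manuscript is an assumption; the function name and the corpus sizes are hypothetical.

```python
import math


def zipf_value(raw_count, corpus_tokens, corpus_types):
    """Zipf value with add-one smoothing (one common formulation).

    raw_count:     occurrences of the word in the reference corpus
    corpus_tokens: total number of word tokens in the corpus
    corpus_types:  number of distinct word types in the corpus
    """
    # Adding 1 to the count keeps 0-frequency items defined; a plain
    # log10 of a zero frequency would raise an error.
    smoothed_per_million = (raw_count + 1) / (
        (corpus_tokens + corpus_types) / 1_000_000
    )
    return math.log10(smoothed_per_million) + 3


# Hypothetical corpus: 999,000 tokens and 1,000 types.
unseen = zipf_value(0, 999_000, 1_000)
frequent = zipf_value(99, 999_000, 1_000)
```

Under these toy corpus sizes an unattested word receives the lowest value on the scale rather than being dropped, which is why the share of 0-frequency items matters when choosing between a plain logarithm and a smoothed Zipf value.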

 

Comment 5

Statistical Framework: The authors should consider adopting the Mixed-Effects Model framework for all analyses (both for multi/binomial variables and for duration). This is the standard practice for handling the dependent and non-independent nature of the data.

Answer to comment 5

We have done our best to integrate the reviewer’s observations, while retaining a preliminary descriptive part of the analysis for the reasons specified above.

Comment 6

The identified issues are correctable statistical challenges that, once addressed, will unlock the full potential of this research. The data and the theoretical framework are on point; a final, rigorous step on the quantitative treatment will make this a relevant contribution to the field. I look forward to recommending acceptance of the next version.

Answer to comment 6

We are thankful for the positive comments and the encouragement, as well as for the suggestions. We would also like to point out that we have slightly modified the theoretical section and the discussion, following the other reviewer’s suggestions.
