Phonetic Diversity vs. Sociolinguistic and Phonological Patterning of R in Québec French
Round 1
Reviewer 1 Report
Comments and Suggestions for Authors
The main question addressed in this article is certainly interesting, but the way the paper is written makes it extremely hard to follow. There is one major methodological flaw that can easily be fixed (although it will take quite a lot of rewriting): the authors make no explicit predictions. As a result, when they fit their models, they discuss some of the effects arbitrarily (and briefly) and leave out others without any clear reason. Their models include interaction terms: why include them if you do not discuss them, or if you have no particular predictions about interactions? The link between predictions and results should be more systematic. I think the authors should rewrite the paper so that clear, explicit predictions are stated before the analysis. The analysis should then explain which models will be fitted to the data and why. Finally, a “Results” section should echo the predictions and report statistics in a much more systematic way. As it stands, the article will have little impact because, despite its obvious scientific potential, it does not meet current standards of scientific reasoning. The results section includes passages that belong in an “Analysis” or “Methods” section, and others that clearly belong in the Discussion. The article currently reads like a linear succession of findings that trigger more analyses and more findings… In short, the overall organization of the paper should be completely changed.
My second main concern is that the data were “coded” by a single phonetician. It is therefore impossible to evaluate the reliability and replicability of the data. With, say, three transcribers, we would have a fair idea of how robust the coding is. Here, it seems that we simply have to trust one person’s ears (and footnote 3 does not really reassure me). Of course, spectrograms were used, which is a good thing, but for some aspects (such as the alveolar vs. uvular trill distinction), readers have to rely on a single person’s perception. Similarly, can anyone really make an accurate distinction between a retroflex and an alveolar approximant? The article would have much more impact if the coding were performed by more than one transcriber.
More minor points:
The sentence “A study on Catalan … depend on syllabic position” could be split into at least two sentences.
“but also in dorsopalatal contact”: please clarify: do you mean that place of articulation is different? Or the contact area between tongue and palate?
“sometimes also as an approximant [ɹ]”: could you please double-check that the reference you cite for a dental/alveolar approximant in Metropolitan French actually says this?
“"Institut de la statistique du Québec" about Francophonie (2016).”: please provide a proper reference.
p. 3: you wrote both “protocol” and “protocole”.
“an average age ranging between 36 and 45 years”: I do not understand; averaged across which variable? Location?
Why show us 32 locations in Figure 2 when you actually used data from only 29 of them? The other 3 locations are uninformative here.
Similarly: why did you describe the whole PFC protocol when you actually focus on the list readings only? I don’t think it is useful.
Figure 3: in addition to the poor image quality, pie charts are rarely used in scientific journals because proportions represented as areas are hard to process. A bar chart would be more standard.
What are TR and CR clusters? TR is defined in a footnote, but only after the term has been introduced; please define these sequences earlier.
“Since the retroflex variant appears in only 21 tokens in a reduced number of word-forms, we exclude it from the analyses”: it would be nice to know in the Discussion what these 21 tokens were.
Regarding the model: part of the structure of the data cannot be accounted for since, as the authors remark, there were reading mistakes. Technically, Word is nested within PhonologicalContext. I would be curious to know what happens if you add this nesting as a random effect.
“Each variable has a significant effect on the ratio of tokens per type as well as their interactions (p<0.01).”: I don’t understand what this means.
“To preserve the readability and clarity of our remarks, we focus on the simple effect of each independent variable.”: I can understand, but then why specify interactions in the model?
Figure 4: perhaps keeping just the curves with different colors and leaving the areas under the curves transparent would be more readable. This would save space since Figures 5 and 6 would become redundant (unless I missed something).
“Our statistical models shows that birth year indeed has a significant effect”: this should come before Figure 4. It is much more common in scientific journals to first produce the main result and then illustrate it by means of a plot.
I’m not sure how I should interpret Figure 7. Does it mean that older speakers only use apical variants? Probably not. So does the graph show the preferred variant for each year? For example, people born around 1972 have voiced fricatives and people born in 1975 have voiceless fricatives? Sorry, I’m not used to seeing such plots. Maybe the authors should explain its meaning more clearly.
“Gender is an interesting factor to investigate, mainly regarding the use of lenified allophones of /R/ like approximants or vocalized /R/.” Could you cite a reference here, or if you mentioned this in the introduction, please make it clearer. Because here, the statement seems to come out of nowhere. Again, as I said at the beginning of the review: the structure of the paper is very unscientific.
Also, you should start by reporting statistics. I do not know what output the package you used provides, but we should have at least a p value and some measure of the extent to which gender differences predict Coding (for instance, an R squared or something of that kind). The authors chose to report statistics and p values after showing plots of the data, which is unusual.
“the IV Birth year,”: I suppose “IV” stands for “independent variable”, but you did not say so explicitly. Moreover, the variable “Birth year” is called “BirthYear” in another model; please be consistent.
Table 2: p values are not very informative here. Of course, with such a large dataset, p values are bound to be quite low; a real measure of the predictive power of each variable would be much more welcome. I mentioned R squared because it is quite intuitive, but there may be more appropriate metrics to quantify the impact of a variable. And of course, low p values are traditionally flagged with three asterisks, so what is the point of the last column? As it stands, Table 2 says very little.
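As one illustration of the kind of effect-size measure meant here (an assumption on my part, not the authors’ method), McFadden’s pseudo-R² compares the log-likelihood of a fitted model with that of an intercept-only (null) model. For a categorical outcome such as Coding, it can be computed from the probabilities each model assigns to the observed tokens. The function names and the probabilities below are hypothetical toy values, not data from the paper.

```python
import math


def log_likelihood(observed_probs):
    """Sum of the log-probabilities a model assigns to each observed outcome."""
    return sum(math.log(p) for p in observed_probs)


def mcfadden_r2(ll_model, ll_null):
    """McFadden's pseudo-R^2 = 1 - LL(model) / LL(null).

    0 means the predictors add nothing over the null model; values closer
    to 1 mean the predictors account for more of the outcome.
    """
    if ll_null >= 0:
        raise ValueError("null log-likelihood must be negative")
    return 1.0 - ll_model / ll_null


# Hypothetical toy data: probability each model gives to the variant
# actually observed for four tokens (not taken from the paper).
null_probs = [0.5, 0.5, 0.5, 0.5]   # intercept-only model
model_probs = [0.8, 0.8, 0.8, 0.8]  # model with predictors, e.g. BirthYear

r2 = mcfadden_r2(log_likelihood(model_probs), log_likelihood(null_probs))
print(round(r2, 3))  # 0.678
```

Refitting the model without one variable and recomputing the statistic gives a rough sense of that variable’s contribution; a likelihood-ratio test between the two nested models is a common complement.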
“the sociolinguistic characteristics of our speakers have an effect on the Group to which they belong”: the “characteristics” in question are Zone, Birth year and Gender… these are not sociolinguistic features… these are just social features, or individual features.
“Data spreadsheets as well as every scripts used in this paper are available from the authors on demand”: should be “every script”. On a side note, out of curiosity: why are the raw recordings from a project whose sole aim is to collect comparable speech corpora not available to the academic community?
“phonological unity: Languages” should be “phonological unity: languages”
I do not think “big data” is an informative or useful keyword for this paper. The dataset is indeed quite large, but compared with the training datasets of current models for, e.g., speech recognition (the CommonVoice English corpus alone has almost 100 000 speakers), it is rather small.
Author Response
Please see the attachment.
Author Response File: Author Response.pdf
Reviewer 2 Report
Comments and Suggestions for Authors
This is an original and interesting paper on rhotics in QF (Québec French). I appreciated and enjoyed this work a lot. It is based on an exceptional, high-quality dataset of a kind rarely seen, either in the study of rhotics or in LVC research.
I have some small suggestions (to increase readability) and two (related) points I would like to invite the authors to reflect on, as I am very much interested in understanding rhotic variation.
Small suggestions
- For Belgian French: see the work of Didier Demolin.
- Line 164: typo in the legend of Fig. 4 (approximmnt)
- Density plots: explain in the text what this type of plot shows and how it should be interpreted.
- Figures 5 and 6: remove the legend entries/colour codes for variants that do not appear in these figures.
- Figure 7: you have to explain HOW this plot summarises the post-hoc tests. How do you arrive at a specific year on the y-axis? Connecting the points with a line is not the best choice, as the x-axis is not a scale but a nominal variable (histograms/bars might be more adequate).
Two or three speaker types?
In lines 211 ff., you define three speaker types: apical, uvular and fricative speakers, in this order. When I read this, the 5% cut-off point seemed low and arbitrary. In the results section, the order is fricative, uvular and apical, which makes more sense and seems more coherent. It probably also better reflects what happened during the analysis: there was a need to distinguish speaker types, separating speakers with trills from those without trills (and, within the former group, distinguishing alveolar and uvular speakers). At first sight (but I am not embedded in the speech community and I lack the "sociolinguistic feeling" that is crucial in this type of research), I would distinguish TWO speaker types: front (apical) and back (uvular) speakers. If I remember correctly, this distinction has been made before by Blondeau & Sankoff. What are your arguments for opting for three types? Did you consider these two types?
Mixed speakers
This point is related to the previous one. From the point of view of place of articulation, at least some of your apical speakers are mixed speakers, using both uvular variants (fricatives/approximants) and alveolar taps/trills. Interestingly, they never combine alveolar trills and uvular trills. So how do they move from alveolar trills to uvular fricatives in a weakening process in coda position? In our work on Dutch (Van de Velde, Tops & Van Hout 2013: 235, based on a dataset of 1912 speakers with 12 tokens per speaker: 102 speakers (5.3%) mix both variant types, and only 38 (2.0%) have a mix of 20% or more of one variant and 80% or less of the other; Verstraeten & Van de Velde 2001: 9 of 160 speakers of Standard Dutch), we observed that mixed alveolar/uvular speakers are quite rare, and that most of them only occasionally shift to the other place of articulation. Causes include normative pressure, speech therapy (some speakers had to undergo therapy to get rid of a uvular [R]), and switching between two varieties of Dutch (e.g. a local dialect with R and the standard with r). So what is your explanation for this mixing in QF, in the light of the work by Sebregts on the relationship between Dutch r-variants? You might also have a look at Spreafico and Vietti (2013).
Feel free to contact me if you would like to discuss these issues.
Hans Van de Velde
Author Response
Please see the attachment.
Author Response File: Author Response.pdf
Round 2
Reviewer 1 Report
Comments and Suggestions for Authors
The second version of this paper is better structured than the original submission, and its overall readability has improved. At times, it is still a little hard to follow. This is probably due to the nature of the paper: it seeks to be as comprehensive as possible, and sometimes there is so much data that the take-home message is hard to extract.
A few words on the potential shortcomings of the study (single annotator, reading list, etc.) would be welcome at the end of the paper.
In the new version, the predictions have been made more explicit.
The grouping of speakers into three categories remains somewhat paradoxical: so-called “categorical” speakers were removed from the analysis (incidentally, please state how many speakers were “categorical”, perhaps in Table 1) because they were categorical, and the remaining speakers are then essentially made categorical by splitting them into three groups. Of course, I understand why you did this. But the reason for removing “categorical” speakers in the first place remains unclear: your justification is that you want to observe intra- as well as inter-speaker variation. However, “categorical” speakers are part of the inter-speaker dimension, so perhaps they should be included; for instance, speakers who were 100% apical could be assigned to the apical group for the models.
The justification for having a single phonetician code the dataset is now stronger. In particular, the authors cite previous literature by distinguished researchers in prestigious journals. However, the first part of the justification in footnote 8 is a little surprising, even worrying: if inter-annotator agreement is as low as 49%, I suppose this means that if someone else had coded the data for the present paper, very different results would have been obtained. The reliability of the current coding is therefore unknown, which is why I was concerned about the reliability and replicability of the results. Being a phonetician myself, I know that this is a recurring problem in our field, and of course one rarely gets enough funding to have multiple annotators for so much data. Still, I think a single annotator is suboptimal, and this should be mentioned as a weakness in the conclusion.
“since /TR/ clusters famously give rise to /R/ deletion in French”: yes, they sometimes do, but you make it sound as if it were automatic. In France French, this type of /R/ deletion also carries social meaning, so some speakers will avoid this “natural” tendency and make every effort to maintain the /R/ (optionally adding a schwa after the cluster).
A quick word on colons: yes, in English the word following a colon often starts with a capital letter, unlike in some other languages, but this is not systematic (far from it!). According to the Chicago Manual of Style and the Oxford style manual, it mainly applies to complete sentences after colons in US English.
“this pronunciation shift begun” -> “began”
“On the other hand” (p4 line 102): where’s the “one hand”?
“Tranel [54],Webb [56]” -> “Tranel [54] and Webb [56]”
The various shades of green in some figures are hard to distinguish (e.g. Figure 8). In some figures (e.g. Fig. 5), panels overlap and some have been cropped. Please improve the readability of the figures.
Regarding Figure 8: it would be helpful if you could pick one or two interesting interactions and describe how they should be interpreted from the figure.
Note 14: “behing” -> “behind”
The formula for the first model is a little unclear: why not give the dependent variable a more intuitive name, such as “r_type”? As it stands, “coding” is unclear.
“two-by-two” (several occurrences): I personally say “pairwise” and I’m quite certain that I’ve always heard “pairwise” in an English-speaking context… but I may be wrong.
Author Response
Please see the attachment.
Author Response File: Author Response.pdf