Phonetically Based Corpora for Anglicisms: A Tijuana–San Diego Contact Outcome
Round 1
Reviewer 1 Report
Comments and Suggestions for Authors
This study details the compilation of a corpus of English loanwords in the Spanish of Tijuana, Mexico, gathered from data elicited from spontaneous interviews, with a goal of analyzing the production and categorization of monophthongal vowels in monosyllabic loanwords. The authors observe that within some words, vowel quality approximates that of typical Spanish vowels, while in other words, it approximates North American English vowels. Based on this observation, the authors propose to divide their corpus into two distinct corpora, which the authors present as their primary contribution: one containing loanwords with adapted vowels, and another containing loanwords with unadapted vowels. The authors argue that their dataset poses a challenge for theories of loanword adaptation, as the choice of whether to adapt the vowel in a loanword appears to vary by lexical item rather than by the speaker's proficiency in English.
The compilation and presentation of such a corpus is a potentially valuable contribution to the literature, which offers only a few studies on the production of English vowels by Spanish speakers. However, there are some limitations with the study's writing, analysis, presentation of data, and engagement with previous literature that should be addressed before publication.
In particular:
- The main analysis is based on the authors' categorization of vowels, but the methodology used to do this is unclear, the presentation of descriptive statistics is incomplete or unclear, and inferential statistics are not possible with the study's small token count. The authors say they categorized vowels by taking average English and Spanish formant values from previous studies and determining which category each token vowel "approximated", but they do not appear to provide the target values for these categories (perhaps these are the "theoretical vowels" in Figure 2, but that is not clear), nor do they clearly specify what measurement they used to determine what category each vowel "approximated". They also do not acknowledge that formant values for a vowel category in practice are not point values, but statistical distributions that often overlap, which problematizes the decision to categorize a vowel based on a single measurement. A more thorough presentation and discussion of formant value measurements could do a lot to improve this paper's contribution to the field. Currently, average formant values are presented in Figure 2 and individual formant values are presented in Table A2, but these are hard to read and do not provide information about within-category variation, which would be of interest to the reader and important to acknowledge when categorizing vowels. I also question the English vowel categories assumed, since the dialects these speakers likely have most contact with would be California English and Chicano English, which do not necessarily have all of these categories: many speakers of California English, for example, do not contrast [ɑ] and [ɔ].
- The theoretical framework with which the authors engage does not appear to be the most applicable to their participants or dataset. The authors characterize previous literature as making binary predictions on the outcome of loanword adaptation based on whether or not the speaker is bilingual in the source language. Such a characterization is reductive and not helpful for their dataset, which includes speakers with a wide range of degrees of bilingualism. It could be more fruitful to engage with literature that discusses bilingualism as a gradient phenomenon, that discusses interlanguage phonology, and/or that makes specific predictions about the (re)categorization of loanword and second language sounds, such as Best's Perceptual Assimilation Model, Flege's Speech Learning Model, or similar.
- The authors' decision to limit their dataset to monosyllabic words is not fully justified, and expanding their dataset to polysyllabic words (perhaps only analyzing stressed vowels) could help expand their small token count without necessitating the collection of more data. Moreover, the authors do not consider the phonetic context in which each vowel occurs, which is problematic given that phonetic context is known to affect vowel realization in both English and Spanish.
- Some of the tables and figures are of limited value. For example, the spectrogram in Figure 1 does not provide a frequency scale which could help the reader evaluate its accuracy. (For that matter, the authors provide no information on how vowel boundaries were determined.) Table 2 would be more useful if it included information on the vowel's formant values. (It also seems to imply that each target word was consistently produced with a vowel in the same category, but it's unclear if that was true.) Tables 3 and A1 do not appear to provide useful information, but take up a lot of space. (That is, the phrasal context of a given loanword and the speakers' demographics and English proficiency can be useful information if presented in relation to the data in question -- the vowel's formant values and/or categorization -- but is not intrinsically meaningful in isolation.)
- Since many of these speakers are bilingual, it's possible that they are code-switching into English for a single word rather than borrowing the word into Spanish, which could explain their use of English-like formant values for specific words. Is there a way to distinguish a loanword from a single-word code-switch? The paper could benefit from considering this point.
- It is perhaps outside the scope of the current paper, but a follow-up elicited speech experiment that had participants read out loud Spanish sentences with these loanwords could be a good way of gathering enough data to build inferential statistical models to confirm the categories proposed in this paper.
The quality of writing could be improved. There are numerous terms and idiomatic expressions that appear to be calqued from Spanish, resulting in English prose that is unidiomatic and at times confusing. Additionally, there are multiple instances where further clarification could be provided about the study's assumptions and methodologies.
Author Response
Thank you so much for your valuable comments.
Kind regards,
Authors
Author Response File: Author Response.pdf
Reviewer 2 Report
Comments and Suggestions for Authors
General comments:
The authors seek to study the presence of Anglicisms in the spoken Spanish of Spanish/English bilinguals in Tijuana, Mexico. This is in and of itself an interesting and useful project. However, the manuscript in its current form misses what is truly useful about the project at hand, taking the conversation into some confusing and unnecessary directions. For example, if this were a paper about creating corpora, then the literature on corpus linguistics should be consulted, and the conversation should cohere around creating corpora, making corpora available for analysis, etc. But really the authors have created word lists based on linguistic research. This is *still valuable* but should be reframed as answering a different question, namely a descriptive one: “what kinds of Anglicisms are observable in the speech of Tijuana bilinguals” and “are these borrowings phonologically adapted or not?” Then the discussion coheres around what’s going on in border communities (this is an important topic that’s barely mentioned) and the engagement with the linguistics literature can be through the literature on language contact and lexical borrowing and specifically engage with the many studies that look at the ways in which English speakers and Spanish speakers are borrowing each other’s words. The vast literature on language in U.S. Latino/a communities – from Los Angeles, to NYC, to Miami, to Texas and beyond, should provide useful and interesting engagement, given the high rates of bilingualism in Tijuana. The theory invoked by the authors in the current version of the manuscript doesn’t seem appropriate or productive for the data at hand. I also don’t understand why the authors conclude that this is an interdisciplinary study – how? 
There are a lot of buzz words – with every respect – in this paper that seem to distract from what is really useful and interesting, once again, namely, the language contact situation in Tijuana with an emphasis on the lexicon and the linguistic issues that stem from that. I recommend a major rewrite to streamline this paper, focusing specifically on: 1) place / sociolinguistic description, 2) language contact, 3) Spanish/English bilingualism in Mexico and U.S., 4) studies of lexical borrowings in both directions (how do Spanish and English influence each other lexically), 5) methods and data, 6) linguistic implications (phonological integration, etc.). That will make an interesting and publishable paper.
Typo: Silva-Corvalán (not: Corbalán)
In the introductory paragraph, the authors state that the border conditions promote “using predominantly Anglicisms in Tijuana.” This is confusing. Predominantly means “primarily.” Surely there are not more Anglicisms than there are Spanish-origin terms. Maybe instead: “These conditions promote the extensive use of Anglicisms in the Spanish of Tijuana.”
Typo: some occur (not some occurs)
Typo: acoustically analyzes
Typo: emerge -I think you mean “merge”
Typo: line 79 - bilinguals’
Typo: line 84 - influence pronunciation (not ‘influence on’)
Line 88 - the same
Line 106 - a basis
Line 107 - of the English language OR of English
Line 110 - each monosyllabic Anglicism (singular)
I’m not sure that the authors used social networks as a method for participant recruitment as discussed in Milroy. It sounds like they used snowball sampling, or friend of a friend sampling, which is not the same thing. I would remove the reference to social networks.
157 - “in order to participate, the participants were required to have…”
“These specific criteria would represent a sample of the contemporary sociolinguistic landscape in Tijuana.” This is a subsample of Tijuana, not a reflection of the “contemporary sociolinguistic landscape.” The authors should explain why these criteria were chosen and not others.
“Prior to making audio recordings with participants”
Why did you exclude expressions like “very nice”?
What are the underlying English forms in baika, etc.?
Why were words like “fake” excluded?
“We focused on analyzing vowels due to their greater acoustic dynamism, influenced by coarticulatory effects, spectral shifts and speaker variation.” → How did you do these things? Was this done instrumentally?
I don’t understand what is meant by “theoretical vowels.” Are the vowels presented not literally based on acoustic measurements?
Comments on the Quality of English Language
Lots of revision for clarity is needed
Author Response
Thank you so much for your valuable comments.
Kind regards,
Authors
Author Response File: Author Response.pdf
Reviewer 3 Report
Comments and Suggestions for Authors
Please see the attached document.
Comments for author File: Comments.pdf
Author Response
Thank you so much for your valuable comments and time.
Kind regards,
Authors
Author Response File: Author Response.pdf
Round 2
Reviewer 1 Report
Comments and Suggestions for Authors
Several aspects of the paper have been substantively improved since the last version, most notably the use and explicit discussion of Euclidean distance measurements to classify vowels, explanations of methodology and results, and discussion of background literature.
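As a rough illustration of the kind of Euclidean distance classification referred to above, the following Python sketch assigns a token to whichever reference vowel is nearest in F1/F2 space. It is not the authors' procedure: the reference values, the category labels, and the function name `nearest_category` are hypothetical placeholders chosen for illustration, and the formants are left unnormalized in Hz as a simplifying assumption.

```python
import math

# Hypothetical reference targets (F1, F2) in Hz. These are illustrative
# placeholders only, not values from the manuscript or from any cited study.
REFERENCE_VOWELS = {
    "spanish_i": (300.0, 2300.0),
    "english_ih": (430.0, 2000.0),
}


def nearest_category(f1, f2, references=REFERENCE_VOWELS):
    """Return (label, margin) for a token measured at (f1, f2) in Hz.

    label  -- the reference vowel with the smallest Euclidean distance
    margin -- how much farther away the runner-up is; a small margin
              flags a token that sits between two categories
    """
    distances = sorted(
        (math.hypot(f1 - r1, f2 - r2), label)
        for label, (r1, r2) in references.items()
    )
    best_distance, best_label = distances[0]
    # With a single reference there is no runner-up to compare against.
    margin = distances[1][0] - best_distance if len(distances) > 1 else math.inf
    return best_label, margin


# Example token (also hypothetical).
label, margin = nearest_category(320.0, 2250.0)
```

Reporting the margin alongside the label is one simple way to show how close a token came to the competing category, which bears on the concern that vowel categories are overlapping distributions rather than points.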
I still question the inclusion of both [ɑ] and [ɔ] as comparison vowels, since these vowels tend to be merged in Southern California -- of the two sources cited for California English vowel formant values, Aiello (2010) acknowledges this and Hagiwara (1997) does not include [ɔ] as a separate category.
The added justification for focusing on monosyllables is welcome, but discussion of phonetic context (particularly preceding and following consonants) is still lacking, as, again, phonetic context has been found to affect vowel realization in both languages.
Additional comments:
- Abstract, Line 13 and throughout. R should be cited rather than RStudio for any statistical methods, as R is the actual statistical software, and RStudio is merely an interface for working in R.
- Materials and Methods, Line 156. The description of the data elicitation method is still a little vague. Were these sociolinguistic interviews?
- Materials and Methods, Line 178. I agree with the decision to exclude phonemic diphthongs like "host" but note that it creates a bit of an imbalance, as for example later there's a discussion of words realized with a Spanish [o] and it would be interesting to see if words like "host" would also be produced with a Spanish [o].
- Materials and Methods, Line 180. "ea" in English "clean" and "oo" in "loop" are probably best described as digraphs rather than "orthographic diphthongs".
- Materials and Methods, Line 190. "English vowels often show greater temporal variability due to the absence of a secondary anchoring vowel" Would this be an argument against focusing on English monosyllables?
- Materials and Methods, Line 263. The description of formant measurements mentions taking average formant values for the vowel. Is this the average formant value over the entire duration of the vowel, or some segment of the vowel? What criteria were used to determine vowel boundaries?
- Materials and Methods, Line 274. "We set a 50 Hz threshold as a criterion for perceptual distinctiveness". Why 50 Hz? Was this an arbitrary cutoff or is it based in previous literature?
- Materials and Methods, Line 284. R should be cited rather than RStudio. The name of the R package is phonR, not PhoneR (there's no "e").
- Results, Figure 3. This figure is difficult to read. What is this measuring distance from?
- Results, Line 338. "gym", "tip", "trip" (which in English have the /ɪ/ of "kit") are classified together with "bleach", "cheers", "weed", and "weird" (which in English have the vowel /i/ of "fleece"). Given that these are different phonemes in English, it seems odd to group these together. If /ɪ/ (and /ʊ/) were not considered separately from /i/ (and /u/), why not?
- Results, Figure 4. The added description helps make more sense of this figure, but it's still quite hard to read with all the overlapping symbols. The description refers to 'i.u', but I think it is supposed to say 'u.e'.
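To make concrete the question raised above about Line 263 (whether formant values are averaged over the entire vowel or over some segment of it), here is a minimal Python sketch comparing the two options. It assumes a formant track sampled at evenly spaced points from vowel onset to offset; the function name `mean_formant`, the 25%-75% window, and the sample values are all hypothetical and are not taken from the manuscript.

```python
import math


def mean_formant(track, start=0.25, end=0.75):
    """Average the samples of a formant track (in Hz) that fall inside a
    proportional window of the vowel, e.g. its middle 50%.

    track -- formant values sampled evenly from vowel onset to offset
    start, end -- window edges as proportions of vowel duration
    """
    if not track:
        raise ValueError("formant track is empty")
    n = len(track)
    lo = int(n * start)
    hi = max(lo + 1, math.ceil(n * end))  # always keep at least one sample
    window = track[lo:hi]
    return sum(window) / len(window)


# Hypothetical F1 track with lower values at the edges, standing in for
# transitions into and out of the neighboring consonants.
f1_track = [400.0, 450.0, 500.0, 500.0, 500.0, 500.0, 450.0, 400.0]

middle_mean = mean_formant(f1_track)           # middle 50% only
whole_mean = mean_formant(f1_track, 0.0, 1.0)  # entire vowel
```

In this made-up track the two means differ because the whole-vowel average includes the transitions at the edges, which is why the measurement window and the vowel boundary criteria need to be stated explicitly.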
Comments on the Quality of English Language
The writing is improved, but there are a few places in which it is confusing or more explanation would be warranted.
Author Response
Once again, thank you very much for all your valuable comments.
Kind regards,
Author Response File: Author Response.pdf
Reviewer 3 Report
Comments and Suggestions for Authors
See the attached file.
Comments for author File: Comments.pdf
Author Response
Once again, thank you very much for all your valuable comments.
Kind regards,
Author Response File: Author Response.pdf
Round 3
Reviewer 3 Report
Comments and Suggestions for Authors
I appreciate the time that the author(s) have taken to address my comments. I think that this manuscript has been significantly improved from the original submission. At this stage, I find they have sufficiently addressed my original issues.