Chinese “Dialects” and European “Languages”: A Comparison of Lexico-Phonetic and Syntactic Distances

Round 1
Reviewer 1 Report
Comments and Suggestions for Authors
GENERAL COMMENTS
This is an interesting and well executed paper with some important empirical results, and I think it could be published with relatively minor revisions, although if the authors find productive ways of following up on my suggestions relating to l. 523-525 there could be more experiments and things to report. In some of my other comments I encourage comparisons with classifications in the literature using measures similar to the lexico-phonetic measure of the authors and typological measures that may or may not share some aspects with the syntactic measure of the authors. I am not trying to push the authors into large digressions, but at least a paragraph could be added discussing the results in relation to those of other authors (and not just Heeringa et al. 2023).
LINE-BY-LINE COMMENTS
l. 60-62: "there are no published data on the linguistic distance within and between the languages and language families at issue that would enable a direct evaluation of the claims". There is a dataset, the ASJP database (Wichmann et al. 2022), and associated methods (Wichmann et al. 2010) that allow for the computation of linguistic distances among some 4/5 of the world's languages. The approach is not even very different from that of the authors, except that it uses standard word lists representing a subset of the Swadesh list. The authors should at least show awareness of these resources.
l. 101-102 "When the same concept is expressed in two languages by non-cognate words, any phonological similarity between these words will be accidental". This is why Wichmann et al. (2010) preferred the LDND, where the distance of word pairs referring to the same concept is normalized by the average distances among words referring to different concepts. So it is not necessary to make sure that all word pairs are cognate to get meaningful results (but the problem mainly arises in comparisons among unrelated or distantly related languages).
l. 114 "as in Longobardi et al., 2005; Dunn et al., 2011; Ceolin et al., 2020, 2021". There are other papers that can be cited to better fill in the history of this approach: Dunn et al. (2005) [this is more relevant than Dunn et al. 2011], Wichmann and Saunders (2007), Polyakov et al. (2009), Gray et al. (2010)
l. 120-122: "The linguistic distance measurements discussed in this paper are based on materials originally collected to assess mutual intelligibility among closely related European languages (Golubović, 2016; Gooskens & Van Heuven, 2017, 2020; Gooskens et al., 2018; Swarte, 2016)" In the meantime there is also Gooskens (2024). This may or may not be relevant to cite here. It probably is relevant to cite somewhere in the paper in any case.
l. 150-164. According to this description it seems like there should be many cases where some meaning will correspond to words belonging to different cognate classes across the languages of a family. But from the previous text I was under the impression that only cognates are used. Perhaps clarify this here or say that more detail about this aspect comes later (if that's the case).
l. 191-194. Ok, here it is said that non-cognates are also compared. But then there will be the issue of accidental similarities which was solved by the method of Wichmann et al. (2010), but which is ignored here. A bit of discussion somewhere of why this problem is just ignored would be in place. Or, alternatively, something like the LDND could be implemented.
l. 195-: A few words could be added about why the weighting takes similarity among segments into account, with reference to Heeringa's dissertation or similar.
l. 294: "the 15 x 15 Chinese dialects". Here and elsewhere no reason to call them "dialects" in the Chinese tradition. Instead use something neutral like "language variety".
l. 303: "between pairs of varieties Appendices C-D" -> "between pairs of varieties in Appendices C-D" (?)
l. 308: "We will first present the lexico-syntactic distances" -> "We will first present the lexico-phonetic distances" (?)
Fig. 1 and discussion of it: This could be compared with other published trees using similar methods. There is an ASJP tree in Polyakov et al. (2009) Fig. 6 with similar relations for the intersection of European languages in both trees, and one can also (perhaps more appropriately) compare with the fuller tree of Müller et al. (2021) that contains all the European languages also in the present paper. It also has many Sinitic varieties.
Fig. 2: Probably appropriate to report on the stress value of the MDS result, which will indicate how meaningful the plot is.
Fig. 3: Interesting that the 3 IE groups are distinguished, even if the internal relations are not as in a normal classification. Typologically-based classifications (check the literature cited on l. 114 plus the ones I suggested to add) typically are not good at distinguishing major subgroups of IE. For instance, in the tree of Polyakov et al. (2009: Fig. 7) Germanic and Romance languages are mixed (while Slavic languages are kept together). Some discussion could be added somewhere doing some comparisons with classifications based on typological features to gauge whether the trigram syntactic approach picks up some of the same signals of language contact.
l. 511-512: "Both lexico-phonetic and syntactic distances are larger across language families than within families". Well, not always, because Sinitic is closer to Slavic than Germanic-Romance is in the syntactic tree in Fig. 3.
l. 513-514: "the distances within families tend to reflect the traditional cladistic genealogy of the languages as proposed in the linguistic literature." This is of course a rather loose statement, so it cannot be said to be wrong, but there are several important differences, and the differences are more interesting than the similarities. I would like to see more discussion of the latter. Are the differences somehow expected because of language contact or other non-genealogical effects?
l. 523-525: "Syntactically, however, the differences among the European languages, both within and across the families, are about ten times larger than the differences within and across the groups of Sinitic varieties." This is a bit weird. I wonder whether there is something about the structures of IE vs. Sinitic that makes a PoS trigram comparison likely to exhibit more variation within IE than within Sinitic? It could be that there is a higher degree of similarity between two different texts in one and the same Sinitic language than between two different texts in one and the same European language, for whatever structural reason (something to do with PoS tag diversity?). If so, different Sinitic languages would also be more similar to one another just for that reason and then, ideally, similarities should be "punished" (normalized) for that. A bit of experimentation along these lines might at least help to clarify whether we are really comparing apples with apples and not apples with oranges here.
References
Dunn, Michael, Angela Terrill, Ger Reesink, Robert A. Foley, and Stephen C. Levinson. 2005. Structural phylogenetics and the reconstruction of ancient language history. Science 309: 2072-2075.
Gooskens, Charlotte. 2024. Mutual Intelligibility between Closely Related Languages. Berlin/Boston: De Gruyter Mouton.
Gray, Russell D., David Bryant, and Simon Greenhill. 2010. On the shape and fabric of human history. Philosophical Transactions of the Royal Society B 365: 3923–3933.
Müller, André, Viveka Velupillai, Søren Wichmann, Cecil H. Brown, Eric W. Holman, Sebastian Sauppe, Pamela Brown, Harald Hammarström, Oleg Belyaev, Johann-Mattis List, Dik Bakker, Dmitri Egorov, Matthias Urban, Robert Mailhammer, Matthew S. Dryer, Evgenia Korovina, David Beck, Helen Geyer, Pattie Epps, Anthony Grant, and Pilar Valenzuela. 2021. ASJP World Language Trees of Lexical Similarity: Version 5 (October 2021). https://asjp.clld.org/download
Polyakov, Vladimir N., Valery D. Solovyev, Søren Wichmann, and Oleg Belyaev. 2009. Using WALS and Jazyki Mira. Linguistic Typology 13: 135-165.
Wichmann, Søren, Eric W. Holman, and Cecil H. Brown (eds.). 2022. The ASJP Database (version 20). http://asjp.clld.org/.
Wichmann, Søren, Eric W. Holman, Dik Bakker, and Cecil H. Brown. 2010. Evaluating linguistic distance measures. Physica A 389: 3632-3639. doi:10.1016/j.physa.2010.05.011
Wichmann, Søren and Arpiar Saunders. 2007. How to use typological databases in historical linguistic research. Diachronica 24.2: 373-404.
Author Response
Comments with a green background mean that we agree; comments with a yellow background mean that we do not agree.
Reviewer 1
GENERAL COMMENTS
This is an interesting and well executed paper with some important empirical results, and I think it could be published with relatively minor revisions, although if the authors find productive ways of following up on my suggestions relating to l. 523-525 there could be more experiments and things to report. In some of my other comments I encourage comparisons with classifications in the literature using measures similar to the lexico-phonetic measure of the authors and typological measures that may or may not share some aspects with the syntactic measure of the authors. I am not trying to push the authors into large digressions, but at least a paragraph could be added discussing the results in relation to those of other authors (and not just Heeringa et al. 2023).
LINE-BY-LINE COMMENTS
l. 60-62: "there are no published data on the linguistic distance within and between the languages and language families at issue that would enable a direct evaluation of the claims". There is a dataset, the ASJP database (Wichmann et al. 2022), and associated methods (Wichmann et al. 2010) that allow for the computation of linguistic distances among some 4/5 of the world's languages. The approach is not even very different from that of the authors, except that it uses standard word lists representing a subset of the Swadesh list. The authors should at least show awareness of these resources.
In the revision we acknowledge the existence of the ASJP, and provide a summary of a comparison of the method used in ASJP and ours. A more comprehensive comparison will be made available in the supplementary materials.
l. 101-102: "When the same concept is expressed in two languages by non-cognate words, any phonological similarity between these words will be accidental". This is why Wichmann et al. (2010) preferred the LDND, where the distance of word pairs referring to the same concept is normalized by the average distances among words referring to different concepts. So it is not necessary to make sure that all word pairs are cognate to get meaningful results (but the problem mainly arises in comparisons among unrelated or distantly related languages).
We have added a note in which we explain the precaution taken by Wichmann et al. (2010). However, since the LDND does not predict cladistic language trees significantly better than the regular LDN, we have not changed our method of computing the phonetic distances between the language pairs in our sample.
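To make the contrast concrete, the sketch below (a toy illustration of the definitions, not the ASJP implementation; the word forms are invented) shows the difference: LDN normalizes the edit distance of each same-concept word pair by the length of the longer word, while LDND additionally divides the resulting language distance by the average distance among words denoting different concepts, thereby discounting chance similarity.

```python
# Minimal sketch of LDN vs. LDND (illustrative only, not the ASJP code).
# Word lists are dictionaries mapping concepts to transcribed forms.
from itertools import product

def levenshtein(a: str, b: str) -> int:
    # Standard dynamic-programming edit distance with unit costs.
    prev = list(range(len(b) + 1))
    for i, ca in enumerate(a, 1):
        cur = [i]
        for j, cb in enumerate(b, 1):
            cur.append(min(prev[j] + 1,                 # deletion
                           cur[j - 1] + 1,              # insertion
                           prev[j - 1] + (ca != cb)))   # substitution
        prev = cur
    return prev[-1]

def ldn(w1: str, w2: str) -> float:
    # Length-normalized Levenshtein distance of one word pair.
    return levenshtein(w1, w2) / max(len(w1), len(w2))

def ldn_language(list1: dict, list2: dict) -> float:
    # Mean LDN over all concepts shared by the two word lists.
    shared = set(list1) & set(list2)
    return sum(ldn(list1[c], list2[c]) for c in shared) / len(shared)

def ldnd_language(list1: dict, list2: dict) -> float:
    # LDND: divide by the mean LDN of words denoting *different* concepts,
    # which estimates chance (and sound-inventory) similarity.
    shared = set(list1) & set(list2)
    off_diag = [ldn(list1[c1], list2[c2])
                for c1, c2 in product(shared, shared) if c1 != c2]
    return ldn_language(list1, list2) / (sum(off_diag) / len(off_diag))

# Toy example with invented forms:
german = {"hand": "hant", "stone": "Stain", "fish": "fiS"}
english = {"hand": "hEnd", "stone": "stoun", "fish": "fiS"}
print(ldn_language(german, english), ldnd_language(german, english))
```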
l. 114: "as in Longobardi et al., 2005; Dunn et al., 2011; Ceolin et al., 2020, 2021". There are other papers that can be cited to better fill in the history of this approach: Dunn et al. (2005) [this is more relevant than Dunn et al. 2011], Wichmann and Saunders (2007), Polyakov et al. (2009), Gray et al. (2010)
These suggestions have been included in the revision.
l. 120-122: "The linguistic distance measurements discussed in this paper are based on materials originally collected to assess mutual intelligibility among closely related European languages (Golubović, 2016; Gooskens & Van Heuven, 2017, 2020; Gooskens et al., 2018; Swarte, 2016)" In the meantime there is also Gooskens (2024). This may or may not be relevant to cite here. It probably is relevant to cite somewhere in the paper in any case.
We now also cite Gooskens (2024).
l. 150-164: According to this description it seems like there should be many cases where some meaning will correspond to words belonging to different cognate classes across the languages of a family. But from the previous text I was under the impression that only cognates are used. Perhaps clarify this here or say that more detail about this aspect comes later (if that's the case).
We do not see how the text leads to a cognates-only expectation. We think it is best not to interrupt the flow of the text by pointing ahead to the all-word distance solution.
l. 191-194: Ok, here it is said that non-cognates are also compared. But then there will be the issue of accidental similarities which was solved by the method of Wichmann et al. (2010), but which is ignored here. A bit of discussion somewhere of why this problem is just ignored would be in place. Or, alternatively, something like the LDND could be implemented.
The alternative proposed by Wichmann et al. is acknowledged in the revision. In the same passage, we indicate that the gain that might be obtained by using the more complicated version of the LD would be minimal.
l. 195-: A few words could be added about why the weighting takes similarity among segments into account, with reference to Heeringa's dissertation or similar.
The original version provides the following motivation for PMI weighting:
“This version of Levenshtein distance learns the segment distances by analyzing the alignments (such as in Table 1) that underlie the distance measurements. The basic idea is that substitutions of segments that frequently co-occur in an alignment slot are weighed less heavily than segments that rarely co-occur. These segment distances are used as operation weights, rather than the binary weights that we used in the example in Table 2. As a result, [i] and [ɒ], as an example of an unlikely substitution, will be more distant to each other than [i] and [ɪ].”
Following Wieling, we would argue that a data-driven weighting based on the actual occurrence of sound correspondences between a pair of languages should be a better (psychologically more realistic) estimate of the effort a native listener needs to crack the code of the other language than a feature-based weighting derived from general phonetic principles (as in Heeringa 2004). We added this consideration in the text of the revision.
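As an illustration of such data-driven weighting (a minimal sketch over a toy set of aligned segment pairs; not our actual pipeline, which iterates alignment and weight estimation), substitution costs can be derived from pointwise mutual information and plugged into the Levenshtein algorithm:

```python
# Illustrative sketch of PMI-based segment weights (not the authors' actual
# pipeline, which re-aligns and re-estimates weights iteratively).
import math
from collections import Counter

def pmi_costs(aligned_pairs):
    """aligned_pairs: list of (segment_x, segment_y) tuples harvested from
    word alignments. Segment pairs that co-occur more often than chance
    predicts receive lower substitution costs."""
    pair_counts = Counter(aligned_pairs)
    x_counts = Counter(x for x, _ in aligned_pairs)
    y_counts = Counter(y for _, y in aligned_pairs)
    n = len(aligned_pairs)
    costs = {}
    for (x, y), c in pair_counts.items():
        pmi = math.log2((c / n) / ((x_counts[x] / n) * (y_counts[y] / n)))
        costs[(x, y)] = max(0.0, 1.0 - pmi / 10.0)  # ad hoc squashing: higher PMI -> lower cost
    return costs

def weighted_levenshtein(a, b, costs, default=1.0):
    # Same dynamic programming as plain Levenshtein, but substitutions use learned costs.
    prev = [j * default for j in range(len(b) + 1)]
    for i, ca in enumerate(a, 1):
        cur = [i * default]
        for j, cb in enumerate(b, 1):
            sub = 0.0 if ca == cb else costs.get((ca, cb), default)
            cur.append(min(prev[j] + default, cur[j - 1] + default, prev[j - 1] + sub))
        prev = cur
    return prev[-1]

# Toy example: [i]~[ɪ] co-occurs often in the alignments, [i]~[ɒ] never,
# so substituting [i] for [ɪ] ends up cheaper than [i] for [ɒ].
alignments = [("i", "ɪ")] * 20 + [("t", "t")] * 50 + [("a", "a")] * 50
costs = pmi_costs(alignments)
print(weighted_levenshtein("tit", "tɪt", costs))
print(weighted_levenshtein("tit", "tɒt", costs))
```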
l. 294: "the 15 x 15 Chinese dialects". Here and elsewhere no reason to call them "dialects" in the Chinese tradition. Instead use something neutral like "language variety".
OK. Done.
l. 303: "between pairs of varieties Appendices C-D" -> "between pairs of varieties in Appendices C-D" (?)
l. 308: "We will first present the lexico-syntactic distances" -> "We will first present the lexico-phonetic distances" (?)
These were errors; both are corrected in the revision.
Fig. 1 and discussion of it: This could be compared with other published trees using similar methods. There is an ASJP tree in Polyakov et al. (2009) Fig. 6 with similar relations for the intersection of European languages in both trees, and one can also (perhaps more appropriately) compare with the fuller tree of Müller et al. (2021) that contains all the European languages also in the present paper. It also has many Sinitic varieties.
Polyakov et al. (2009: Figure 6), based on the ASJP, shares 3 Germanic, 3 Romance and 3 Slavic languages with our selection. Like ours, the three groups end up in three different branches of the affinity tree. Moreover, Danish and Swedish cluster at the lowest level in Germanic, then joined by Dutch. Portuguese and Italian cluster first, to be joined next by French in the Romance branch, while Czech and Polish cluster before being joined by Bulgarian in the Slavic group. This is also seen in our own trees. However, "borderline" languages such as English (Germanic but with heavy influence from Romance) and Romanian (Romance but with strong Slavic influence on its vocabulary) are not included in the ASJP tree. Computations that we performed ourselves, applying our PMI-weighted Levenshtein distance measure and UPGMA clustering to the publicly available ASJP data, show a correlation with our own lexico-phonetic distances of r = .925. The leaf orderings of the two trees are listed side by side below.
Leaf orderings of the two trees (branching structure not shown):
Left column: Fri, Du, Da, Sw, En, Ge, Sp, It, Por, Ro, Fre, Cz, Sk, Cr, Sn, Pol, Bu
Right column: Fri, Du, Ge, Sw, Da, En, Sp, It, Por, Ro, Fre, Cz, Sk, Pol, Cr, Sn, Bu
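As an illustration of the kind of computation just described (a sketch with placeholder labels and distance matrices, not the scripts actually used for the paper), an UPGMA tree and the correlation between two distance matrices can be obtained with SciPy as follows:

```python
# Sketch: UPGMA tree from a distance matrix plus correlation between two
# distance matrices (placeholder data; not the scripts used for the paper).
import numpy as np
from scipy.cluster.hierarchy import average, dendrogram
from scipy.spatial.distance import squareform
from scipy.stats import pearsonr

labels = ["Da", "Sw", "Du", "Ge"]            # placeholder language labels
asjp = np.array([[0.00, 0.35, 0.55, 0.50],   # placeholder ASJP-based distances
                 [0.35, 0.00, 0.58, 0.52],
                 [0.55, 0.58, 0.00, 0.30],
                 [0.50, 0.52, 0.30, 0.00]])
ours = np.array([[0.00, 0.30, 0.60, 0.55],   # placeholder lexico-phonetic distances
                 [0.30, 0.00, 0.62, 0.57],
                 [0.60, 0.62, 0.00, 0.28],
                 [0.55, 0.57, 0.28, 0.00]])

# UPGMA = average linkage on the condensed (upper-triangle) distance vector.
linkage = average(squareform(asjp))
dendrogram(linkage, labels=labels, no_plot=True)  # set no_plot=False to draw the tree

# Pearson correlation between the two matrices' upper triangles.
r, p = pearsonr(squareform(asjp), squareform(ours))
print(f"r = {r:.3f}, p = {p:.3f}")
```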
Müller et al. (2021) add German as the most distant language to a cluster of two pairs (Danish+Swedish and Dutch+Frisian), whereas our tree agrees with the more widely accepted view that English differs most from the continental Germanic varieties. The tree for the five Romance languages has the same add-on structure as in our results. For the six Slavic languages, we would argue that our results reflect the common division of varieties more closely than those of Müller et al. We have a primary split into West Slavic and South Slavic, whereas Müller et al. have Bulgarian as a late add-on to a hybrid cluster of the other five varieties.
We now briefly mention these comparisons, and then refer the reader to the supplementary material in which we will show the various trees side-by-side and comment on the differences.
One Mandarin and one non-Mandarin variety in our set are not covered by Müller et al. (2021), i.e., Mandarin Taiyuan and non-Mandarin Chaozhou (Wu). The leaf ordering of the lexico-phonetic affinity tree that can be extracted from pages 28 through 31 (left half of each page) in Müller et al. is shown below, next to the ordering in our own tree:
Müller et al. (2021): Jinan (N-Man), Xi’an (N-Man), Beijing (N-Man), Chengdu (SW-Man), Wuhan (SW-Man), Changsha (Xiang), Nanchang (Gan), Wenzhou (Wu), Guangzhou (Yue), Suzhou (Wu), Meixian (Hakka), Fuzhou (Min), Xiamen (Min)
Present paper: Jinan (N-Man), Xi’an (N-Man), Beijing (N-Man), Chengdu (SW-Man), Wuhan (SW-Man), Changsha (Xiang), Nanchang (Gan), Meixian (Hakka), Guangzhou (Yue), Wenzhou (Wu), Suzhou (Wu), Fuzhou (Min), Xiamen (Min)
(Mandarin varieties were shown in red print in the original figures.)
The structure of the Müller tree is not unlike ours, but there are important differences. Our tree keeps the two Wu varieties (Suzhou, Wenzhou) together, while these are scattered in the Müller tree: Wenzhou is added to the predominantly Mandarin cluster, and Suzhou is clustered with Meixian (Hakka). Also, our tree shows a less polluted Mandarin cluster, with one non-Mandarin intruder (Changsha) against two intruders in the Müller tree (Changsha and Nanchang). The Müller tree does better on the internal structure of the Mandarin cluster: it keeps Beijing together with the other Northern varieties, whereas Beijing is grouped with the SW cluster in our tree (although this may be due to the elimination of SW Taiyuan from our tree). On balance, our lexico-phonetic distances seem to reflect the traditional phylogeny (slightly) better than the Müller solution.
Fig. 2: Probably appropriate to report on the stress value of the MDS result, which will indicate how meaningful the plot is.
We added the stress information, as requested.
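For completeness, a minimal sketch of how such a stress value can be computed (with a placeholder distance matrix; we use scikit-learn's metric MDS and derive Kruskal's stress-1 from the resulting configuration):

```python
# Sketch: 2-D MDS of a distance matrix plus Kruskal's stress-1
# (illustrative; the distance matrix below is a placeholder).
import numpy as np
from sklearn.manifold import MDS
from scipy.spatial.distance import pdist, squareform

dist = np.array([[0.0, 0.2, 0.7],   # placeholder pairwise distances
                 [0.2, 0.0, 0.6],
                 [0.7, 0.6, 0.0]])

mds = MDS(n_components=2, dissimilarity="precomputed", random_state=1)
coords = mds.fit_transform(dist)

# Kruskal's stress-1: sqrt(sum (d_ij - dhat_ij)^2 / sum d_ij^2),
# where dhat_ij are the distances in the 2-D configuration.
d = squareform(dist)                 # condensed original distances
dhat = pdist(coords)                 # condensed configuration distances
stress1 = np.sqrt(np.sum((d - dhat) ** 2) / np.sum(d ** 2))
print(f"stress-1 = {stress1:.3f}")   # Kruskal's rough guideline: 0.05 good, 0.10 fair, 0.20 poor
```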
Fig. 3: Interesting that the 3 IE groups are distinguished, even if the internal relations are not as in a normal classification. Typologically-based classifications (check the literature cited on l. 114 plus the ones I suggested to add) typically are not good at distinguishing major subgroups of IE. For instance, in the tree of Polyakov et al. (2009: Fig. 7) Germanic and Romance languages are mixed (while Slavic languages are kept together).
OK, Fig. 7 is based on the JM (Jazyki Mira) typological data. The ASJP trees do not have these mismatches. This may well indicate that traditional language genealogies are primarily based on lexico-phonetic similarity rather than on typology.
Some discussion could be added somewhere doing some comparisons with classifications based on typological features to gauge whether the trigram syntactic approach picks up some of the same signals of language contact.
We do not understand the mention of sensitivity to language contact phenomena. There is no reason to assume that differences in word order arise specifically through language contact. Three typological syntactic differences spring to mind that could be traced in the trigram frequencies. One is the habit of Germanic languages to string nouns together in compounds (steam train), whereas Romance languages break such compounds down into a series of prepositional phrases (train à vapeur). So we would expect NOUN-PREP-NOUN trigrams to be more frequent in Romance than in Germanic languages. Similarly, Romance languages have the qualifying adjective following the noun (vin blanc) rather than preceding it as in Germanic languages (white wine). As a consequence, DET-NOUN-ADJ trigrams should be frequent in Romance, while DET-ADJ-NOUN trigrams should abound in Germanic languages. Third, most Germanic languages allow constituents between auxiliaries/modals and the corresponding participle or infinitive, whereas in Romance languages the verb group cannot be separated. Accordingly, X-VERB-VERB and VERB-VERB-X trigrams will be more frequent in Romance than in Germanic languages (with the exception of English, which keeps its verbs together).
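To illustrate the kind of signal such trigrams carry (a minimal sketch with toy tag sequences, not the exact procedure of Heeringa et al., 2023), two tagged texts can be compared via their PoS trigram frequency profiles:

```python
# Sketch: compare two languages by the frequencies of their PoS trigrams
# (toy tag sequences; not the authors' actual pipeline or tag set).
from collections import Counter
import math

def trigram_profile(tags):
    """Relative frequencies of PoS trigrams in a tagged text."""
    trigrams = list(zip(tags, tags[1:], tags[2:]))
    counts = Counter(trigrams)
    total = sum(counts.values())
    return {t: c / total for t, c in counts.items()}

def trigram_distance(tags_a, tags_b):
    """Euclidean distance between two trigram frequency profiles."""
    pa, pb = trigram_profile(tags_a), trigram_profile(tags_b)
    keys = set(pa) | set(pb)
    return math.sqrt(sum((pa.get(k, 0) - pb.get(k, 0)) ** 2 for k in keys))

# "white wine" order (DET-ADJ-NOUN) vs. "vin blanc" order (DET-NOUN-ADJ):
germanic_tags = ["DET", "ADJ", "NOUN", "VERB", "DET", "ADJ", "NOUN"]
romance_tags  = ["DET", "NOUN", "ADJ", "VERB", "DET", "NOUN", "ADJ"]
print(trigram_distance(germanic_tags, romance_tags))
```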
The trigram method works well for the European languages we included; the syntactic distances correlate well with the lexical and phonetic distances computed for the same languages (Heeringa et al., 2023). We assume it also works well for the Chinese varieties. We now mention the possibility of tracing differences in trigram frequencies back to specific typological features (as in the previous paragraph), but we will not engage in actually testing these predictions in the present paper. Validating the trigram method will be a topic for a future paper.
l. 511-512: "Both lexico-phonetic and syntactic distances are larger across language families than within families". Well, not always, because Sinitic is closer to Slavic than Germanic-Romance is in the syntactic tree in Fig. 3.
True, but in the MDS plot in Fig. 4 Slavic is closer to the other European groups than to the Chinese group. We now mention the discrepancy between Figs. 3 and 4 but maintain the original conclusion.
l. 513-514: "the distances within families tend to reflect the traditional cladistic genealogy of the languages as proposed in the linguistic literature." This is of course a rather loose statement, so it cannot be said to be wrong, but there are several important differences, and the differences are more interesting than the similarities. I would like to see more discussion of the latter. Are the differences somehow expected because of language contact or other non-genealogical effects?
Speculating on the underlying reasons for discrepancies that remain between our distances and traditional cladistic genealogies is beyond the scope of the present article. Of course, language contact (especially relexification through borrowing, as in English after 1066) will disrupt the steady chaining of sound changes that is characteristic of language diversification starting with one common ancestor language. We decided not to enter into this discussion in the present paper.
l. 523-525: "Syntactically, however, the differences among the European languages, both within and across the families, are about ten times larger than the differences within and across the groups of Sinitic varieties." This is a bit weird. I wonder whether there is something about the structures of IE vs. Sinitic that makes a PoS trigram comparison likely to exhibit more variation within IE than within Sinitic? It could be that there is a higher degree of similarity between two different texts in one and the same Sinitic language than between two different texts in one and the same European language, for whatever structural reason (something to do with PoS tag diversity?). If so, different Sinitic languages would also be more similar to one another just for that reason and then, ideally, similarities should be "punished" (normalized) for that. A bit of experimentation along these lines might at least help to clarify whether we are really comparing apples with apples and not apples with oranges here.
The Part-of-Speech categories we adopted are meant to be a language-universal set. They should apply to Sinitic languages as adequately as to the European languages. Chinese dialects largely share the same basic word order (unlike IE languages). It has been observed that word-order differences mainly arise in question sentences. Our four texts do not contain any questions, which may indeed limit the possibilities for word-order differences. Differences among dialects may occur in the diversity of particles but not in the position of the particles. Moreover, Chinese languages make rather limited use of the PoS category DET. Chinese has no articles (whether definite or indefinite), and uses the demonstrative pronoun as a substitute only to resolve pragmatic ambiguity. Again, these typological properties limit the room for diversity in PoS trigrams (using the ten categories in our Table 1). We agree that the same unusually high similarity in PoS trigram frequencies is expected in other Chinese varieties, but we will have to delay checking this prediction to a future paper. At this time, we have no translations of our four texts into Sinitic languages other than the 15 already included (which took considerable effort to obtain).
To reassure the Reviewer, we bring up this issue in the discussion section. We refer to an authoritative review of the comparative literature on word-order differences across Sinitic dialects by Zhang (2003). The thrust of this review is that the differences in word order among Chinese dialects are indeed generally small and basically all follow the conglomerate of syntactic features that language typologists posit as characteristic of SVO languages.
Zhang, Zhenxing (2003). 现代汉语方言语序问题的考察 [An investigation of word order across Chinese dialects]. Fāngyán, 25(2), 108–126.
References
Dunn, Michael, Angela Terrill, Ger Reesink, Robert A. Foley, and Stephen C. Levinson. 2005. Structural phylogenetics and the reconstruction of ancient language history. Science 309: 2072-2075.
Gooskens, Charlotte. 2024. Mutual Intelligibility between Closely Related Languages. Berlin/Boston: De Gruyter Mouton.
Gray, Russell D., David Bryant, and Simon Greenhill. 2010. On the shape and fabric of human history. Philosophical Transactions of the Royal Society B 365: 3923–3933.
Müller, André, Viveka Velupillai, Søren Wichmann, Cecil H. Brown, Eric W. Holman, Sebastian Sauppe, Pamela Brown, Harald Hammarström, Oleg Belyaev, Johann-Mattis List, Dik Bakker, Dmitri Egorov, Matthias Urban, Robert Mailhammer, Matthew S. Dryer, Evgenia Korovina, David Beck, Helen Geyer, Pattie Epps, Anthony Grant, and Pilar Valenzuela. 2021. ASJP World Language Trees of Lexical Similarity: Version 5 (October 2021). https://asjp.clld.org/download
Polyakov, Vladimir N., Valery D. Solovyev, Søren Wichmann, and Oleg Belyaev. 2009. Using WALS and Jazyki Mira. Linguistic Typology 13: 135-165.
Wichmann, Søren, Eric W. Holman, and Cecil H. Brown (eds.). 2022. The ASJP Database (version 20). http://asjp.clld.org/.
Wichmann, Søren, Eric W. Holman, Dik Bakker, and Cecil H. Brown. 2010. Evaluating linguistic distance measures. Physica A 389: 3632-3639. doi:10.1016/j.physa.2010.05.011
Wichmann, Søren and Arpiar Saunders. 2007. How to use typological databases in historical linguistic research. Diachronica 24.2: 373-404.
Author Response File: Author Response.pdf
Reviewer 2 Report
Comments and Suggestions for Authors
This is a very interesting study, and I would very much like to review it in due detail, but since the authors do not share all their code, I cannot do so. So please:
- provide all data that you compiled for the study already for the reviewers (use anonymous links in the Open Science Framework, for example, https://osf.io)
- provide, with the data, explicit code that shows how all plots were made and allows us to replicate them, along with the results, etc., and clear explanations of how to replicate (README)
- since you use an online tool to calculate your data distances, please provide data in the form in which they can be analyzed with the tool, along with instructions
You find more information with detailed tips in two blog posts:
- https://calc.hypotheses.org/2877
- https://calc.hypotheses.org/2782
If these requirements are met, I will gladly provide a detailed review of the paper and its findings. But if I cannot even see the data points, I cannot assess the quality of the study, which is otherwise quite interesting. Feel free to contact me via the editors, so that I can provide more information on how to share data with reviewers during the review stage.
------update on 5 March-----
I have checked the code, found some problems in the data, and also provide additional code to run an analysis that I consider important to complement the paper. I think this should give the authors enough material to improve upon their study.
Comments for author File: Comments.zip
Author Response
- provide all data that you compiled for the study already for the reviewers (use anonymous links in the Open Science Framework, for example, https://osf.io)
- provide with the data explicit code that shows how all plots were made and allow us to replicate them, along with the results, etc. and clear explanations on how to replicate (README)
The materials requested are now available through the following url:
https://osf.io/znhyd/?view_only=85abb174a8704b31a8079bd12b9eaaf1
We apologize for the three-month delay in making this information available. We were not aware of the reviewer's request.
Round 2
Reviewer 2 Report
Comments and Suggestions for Authors
I am okay with this version. The only aspect I am missing, and ask the authors to add, is an additional section that shares the data clearly, providing the link to OSF. I also expect that the editors make sure this is appropriately referenced inside the final paper.
Author Response
On the request of Reviewer 2 (and after consulting with the guest editors), we have added a Data Availability section at the end of the paper. All highlights in the previous revision have been removed and replaced by yellow highlights for the new changes.
Author Response File: Author Response.pdf