Evolution and Trade-Off Dynamics of Functional Load

Functional load (FL) quantifies the contributions by phonological contrasts to distinctions made across the lexicon. Previous research has linked particularly low values of FL to sound change. Here, we broaden the scope of enquiry into FL to its evolution at higher values also. We apply phylogenetic methods to examine the diachronic evolution of FL across 90 languages of the Pama–Nyungan (PN) family of Australia. We find a high degree of phylogenetic signal in FL, indicating that FL values covary closely with genealogical structure across the family. Though phylogenetic signals have been reported for phonological structures, such as phonotactics, their detection in measures of phonological function is novel. We also find a significant, negative correlation between the FL of vowel length and of the following consonant—that is, a time-depth historical trade-off dynamic, which we relate to known allophony in modern PN languages and compensatory sound changes in their past. The findings reveal a historical dynamic, similar to transphonologization, which we characterize as a flow of contrastiveness between subsystems of the phonology. Recurring across a language family that spans a whole continent and many millennia of time depth, our findings provide one of the most compelling examples yet of Sapir’s ‘drift’ hypothesis of non-accidental parallel development in historically related languages.


Introduction
Functional load (FL) quantifies the contribution of specific phonological contrasts to distinctions made in the lexicon of a language [1][2][3]. In English for example, the phonemes /t/ and /d/ contrast, and thus there exist phonological strings in English, including consonant clusters, syllables, and whole words, which differ only by virtue of one containing /t/ in a position where the other contains /d/. Examples within word-length strings include time/dime, welter/welder, and hit/hid. At a conceptual level, the FL of the {/t/,/d/} contrast in English is the degree to which that contrast supports the distinctiveness of phonological strings in the English lexicon, or conversely, the degree to which those strings would be conflated if /t/ and /d/ were merged into a single category.
A classic operationalization of FL by Hockett [2] is in terms of entropy [4]. Hockett's definition makes reference to domains, D, which could be words, morphemes, syllables, or any kind of substring composed of phonemes. In a language L , the lexicon Λ will contain a set S D,Λ of unique phonological string types, s, which comprise a domain of type D. The entropy of domain D in lexicon Λ is: Pr(s) · log 2 Pr(s) (1) Following Hockett [2], the FL of a phonological contrast φ in lexicon Λ and domain D is the difference between two entropy measures: the entropy of domain D in lexicon Λ, and of domain D in an altered lexicon Λ φ , created by collapsing the contrast φ in Λ: A related metric is a normalized functional load, which states the FL of a contrast relative to the entropy of domain D in lexicon Λ [3]: A phonological contrast, φ, may refer to distinctions between the members of a single set of phonemes, such as between two phonemes {/t/, /d/} or between four phonemes {/t/, /d/, /s/, /z/}. Alternatively, it may refer to a collection of multiple, parallel distinctions, defined by a collection of sets and the distinctions within each of them. For instance, a contrast between voiced and voiceless stop consonants could refer to the distinctions within each of the three sets in the collection {{/p/, /b/}, {/t/, /d/}, {/k/, /g/}}, and a contrast between the places of articulation of stop consonants could refer to the distinctions within each of the two sets in the collection {{/p/, /t/, /k/}, {/b/, /d/, /g/}}. In all cases, the altered lexicon Λ φ is obtained by taking each individual set and replacing the phonemes within it with a single phonemic symbol that is distinct from all other phonemic symbols in L .
As FL is a frequency measure calculated from empirical datasets, the results that one obtains will vary according to the corpus used [3,5]. Accordingly, when investigating FL, it is important to bear in mind how the properties of a dataset might relate to the research question and to any assumptions relevant to the interpretation of results.

Functional Load and Explanation in Linguistics
Human linguistic communication is a highly complex cultural and cognitive phenomenon. One of its key traits is that every language contains an inventory of thousands of distinctive symbolic units termed morphemes, which in turn, comprise smaller, distinctive phonological elements [6]. In spoken languages, these include phonemes (found in all languages) and lexically distinctive prosodic elements, such as tones, accent and stress patterns (found in around half of all languages [7][8][9]). Of equal significance is that, although the existence of morphemes and phonemes is universal, the actual inventories are language-specific. In linguistics, this specificity of individual languages gives rise to fields of enquiry, such as linguistic typology, studying the range and distribution of variation across languages, and historical linguistics, studying the evolution of distributions over time. Both subfields contribute to the scientific understanding of human language in general by generalizing across the variation of individual language systems.
To better understand how linguistic systems provide and organize the distinctiveness that makes linguistic communication possible, investigations of FL play a valuable role in shedding light on how distinctions in phonological modules function to support distinctiveness in larger strings. Currently, in both typology and historical linguistics, the study of FL is in its infancy with much of the potential scope of FL research still to be explored. Most studies of FL, for instance, have focused on words as the strings of interest (see Section 1.2 below). However, words are only one kind of string known to be significant in the structure and use of linguistic systems. Systems of phonological structure exhibit well-known organizational principles at levels above the phoneme and below the word [10,11], while psycholinguistic research shows that listeners also use multi-phonemic strings smaller than the word or morpheme, to gain efficiencies in speech processing [12]. The study we present here illustrates some of the potential of studying strings other than words.
FL research to date has focused either on a small number of related languages (e.g., [5]) or on surveys across very distantly related languages (e.g., [3]). These are useful for pro-ducing highly detailed case-studies of well-understood systems or for evaluating the outer bounds of variation across languages in general. However, insights into historical dynamics at a statistically significant level are most readily obtained through phylogenetic analysis of large sets of related languages. Here, we fill a gap in the FL research by presenting the first phylogenetic study of FL and thereby produce the first results for historical dynamics that are supported at a statistical level across a large language family.
Research on FL has focused particularly on phonological contrasts whose FL is very low, for reasons related to a prominent conjecture in historical linguistics [1] (see Section 1.2). However, contrasts in human languages have a range of different functional loads, from very low to very high, and important questions include why this range exists and how FL values evolve over time. In terms of the evolutionary question, we can ask: (1) What kinds of changes to FL values appear to occur? (2) How do those changes pattern more broadly, for instance, does their distribution follow some well-described stochastic process? (3) What might the possible causes of change be that give rise to the distributions we see?
Answering these questions will ultimately require fitting together the pieces of a large puzzle. In this study, we contribute some central pieces. We produce some striking findings regarding the phylogenetic signal of FL. A phylogenetic signal is a measure of how close the match is-at the level of a whole language family-between (1) actual FL values and (2) what would be expected if FL was inherited from ancestor languages to their descendants while undergoing fluctuations according to a stochastic process termed Brownian motion (more on this in Section 2.2).
Naturally, we would not suppose that the real evolution of FL is so simplistic, rather the questions are: how close to such a scenario does the evolution of FL appear to come, and how does this compare to the evolution of other properties of language? Answering these questions provides insights into the kinds of explanations for FL evolution that could be consistent with the data. We also examine the correlation, or trade-off relationship, that can exist between the FL of two different contrasts in a linguistic system. Understanding how multiple, individual changes to FL are linked (or not) provides new insights into the historical dynamics of contrastiveness in linguistic systems.

Functional Load and Sound Change
Languages undergo mutational changes, known as sound changes, in which sounds may change from one phonemic category to another [13]. The term unconditioned merger refers to sound changes that cause two or more previously distinct phonemic categories to become conflated into one. Conditioned mergers are when conflation affects the phonemes only in certain contexts. FL has attracted attention as a possible explanatory factor in the incidence of mergers; it has long been conjectured [1] that contrasts with low FL are more prone to merge than contrasts with high FL, and recent results support that conjecture [5,[14][15][16]. Debate is ongoing over which operationalizations of FL provide the greatest predictive power [5] and what kinds of mergers FL predicts [17]. Whether it used entropy-based definitions of FL or not, most research to date has focused on domains D that are words.
When a contrast undergoes an unconditional merger, its FL falls to zero. The fact that contrasts with low FL values are more likely to fall to zero than contrasts with higher FL values, is an observation that can be described in several ways, and each way of describing it can have different implications for what we believe is in need of explanation. For instance, taken by itself, it is consistent with a description in which (1) only falls to zero FL show a tendency for small changes in FL to be more likely than larger changes; (2) all falls in FL, whether the fall is to zero or otherwise, show the tendency; or (3) all changes to FL, whether they are falls or rises, show the tendency.
Since prior research has focused its attention on falls to zero, it has not yet been established whether or not current observations will generalize to support strictly case (1), the slightly broader case (2), or the general case (3). The study we present here will consider evidence for case (3). If it turns out that case (3) is supported, then this may recast how we think about recent results in sound change, since the phenomenon requiring explanation will not be only mergers but all changes in FL.

Contrastiveness in Pama-Nyungan VC Strings
In this paper, we examine FL from a phylogenetic perspective, investigating how FL evolves over time. Our empirical focus is in the large, Pama-Nyungan (PN) language family of Australia. PN languages extend across 90% of the Australian mainland and the time depth of the family is estimated at around 5000-6000 years before present [18][19][20].
In many PN languages, an inverse correlation has been observed between phonemic vowel length and the phonetic duration of a following consonant [21][22][23][24][25]. This is particularly so for vowels in the first syllable of words, referred to as tonic vowels. Here, we focus on tonic vowels and single, intervocalic consonants that follow them. Post-tonic single consonants that follow phonemically short vowels and that have phonetically longer durations in some languages also exhibit additional phonetic properties associated with long duration, such as more complete closure and passive devoicing of stops as well as pre-stopping of nasals and laterals.
Conversely, post-tonic single consonants that follow phonemically long vowels and that have phonetically shorter durations may exhibit more voicing and lenition of stops, and the absence of pre-stopping in laterals and nasals [21,22,[26][27][28]. Examples of allophonic conditioning of this kind are cited in Table 1. Table 1. Examples of allophony in post-tonic consonants conditioned by the phonemic length of the tonic vowel. Key: V_ after a phonemically short vowel, VV_ after a phonemically long vowel. See references for additional details in the conditioning of allophony.

Language (Subgroup) Consonants V_ VV_
Djambarrpuyngu (Yolngu) [25] Consonants longer shorter Wik (Middle Paman) [29] Stops tenser laxer Kugu Nganhcara (Middle Paman) [30] Voiced stops stop fricative Nukunu (Thura-Yura) [31] Nasals, Laterals prestopped plain Yadhaykenu (Nothern Paman) [32] Laterals plain flapped A general fact about sound change is that when two phonemes merge, it is possible for phonetic correlates of the original contrast, which are manifested in other segments, to remain in place and become contrastive. This has occurred in multiple branches of PN, as phonemic vowel length is lost while its erstwhile phonetic correlates on the following consonant remain and become distinctive. Examples are cited in Table 2.
In the cases cited in Table 2, the complete merger of the length contrast in the tonic vowel is associated with an increase in contrastiveness in following consonants. In such cases, the FL of the length contrast in tonic vowels falls (to zero) while the FL of manner of articulation contrasts (including voicing and fortition) in the following consonant rises. Consequently, there is a trade-off relationship between FL V , the FL of tonic vowel length, and FL C , the FL of post-tonic consonantal manner of articulation.
This trade-off is of a very specific kind, in which the complete merger of all short/long vowel pairs reduces FL V to zero. We will refer to this as a trade-off with contrast collapse. Other trade-offs are possible, however. If a length contrast is lost only in certain vowels, and/or only in certain contexts, then FL V would fall (though not to zero), and in such cases, FL C could be expected to rise if the consonants become more contrastive when they follow the vowels that do merge. This scenario would give rise to a second kind of trade-off. We will refer to this second kind as a trade-off with contrast maintenance.
The two studies that we present below examine the evolution of the FL of tonic vowel length contrasts and of contrasts in the following consonant in PN.
Study 1 establishes that these FL variables contain significant phylogenetic signals. One implication of this is that our data are consistent with an evolutionary process in which smaller changes in FL-both falls and rises-have been more common than larger changes. More detail is given in Section 2.2 below. Study 2 examines FL trade-offs with contrast maintenance. We test the hypothesis that FL V and FL C are negatively correlated in PN. FL C is defined in terms of the manner of articulation of post-tonic consonants, and we expect FL C to correlate negatively with FL V for the phonetic and historical reasons introduced above. We also examine FL P , the FL of place of articulation of post-tonic consonants. As research on PN languages has identified no particular association between vowel length and consonant place, our hypothesis is that FL P will show no significant correlation with FL V . Table 2. Examples of post-tonic consonant contrasts created upon the merger of length distinctions in tonic vowels of Pama-Nyungan languages. Key: T short stop, TT long stop, D voiced stop, Z spirant, N nasal, NN long nasal, DN prestopped nasal, ND nasal+stop, L lateral, DL prestopped lateral, V_ after erstwhile short vowel, and VV_ after erstwhile long vowel. See the references for additional details and conditioning of the tabulated sound changes.

Functional Load Data in Pama-Nyungan
We estimated the FL for vowel length (FL V ), consonant manner (FL C ), and consonant place (FL P ) in domains comprised of a tonic vowel followed by a single, intervocalic consonant in a set of 90 Pama-Nyungan languages listed in Appendix A. To calculate the FL, we used both the unnormalized formula in (2) and the normalized formula in (3). In cases where we intended to be specific, we refer to the unnormalized values as FL u V , FL u C , FL u P and the normalized values as FL n V , FL n C , FL n P . In most cases though, when the point under discussion applies equally to both, we write FL V , FL C , FL P .
At the current stage of documentation of global linguistic diversity, every major language family contains many low-resource languages for which data are scarce [41]. As a consequence, studies of FL, such as ours, which aim for a coverage that spans whole families, will face limits on the data available. This is true in the case of Pama-Nyungan. For most languages in our dataset, the available corpora are lexical lists. As we will see in Section 3, this does not prevent clear results from emerging, and we return to discuss the reasons why in Section 4.1.
Correspondingly, our FL estimates were based on lexical datasets from which we extracted instances of the domain of interest. The lexicons contained between 208 and 3215 domain instances (mean 774 and median 605). The 90 languages studied were selected by taking the 112 PN lexicons studied in [42] and keeping only those that (1) have some degree of tonic vowel length contrast (since we want to study changes in FL that do not involve complete mergers) and (2) that have greater than 200 domain instances. A representative tree of these 90 languages is shown in Figure 1.  Figure 1. Pama-Nyungan tree containing the 90 languages used in this study, inferred from lexical cognacy judgements. Displayed here is a single, maximum clade credibility tree, i.e., the one tree within the 1000-tree sample that most adequately represents the most frequently recurring subgroups in all of the trees of the sample.
When classifying vowels as short or long, we regarded sequences of two adjacent short vowels as one long vowel, and sequences of /uwu/ and /iji/ as long vowels, since a tradition followed in some Australianist analysis is to represent long high vowels [u:] and [i:] as phonemic vowel-glide-vowel sequences (e.g., ( [43], p. 24), ( [44], p. 91)).
Phonemically long or geminate consonants, and phonemically pre-stopped sonorants, have been analysed as both mono-and bi-segmental units in the Australianist literature [45]. Here, we were guided by the kinds of historical developments that we wish to study, and we classed them as single segments.
The FL data obtained for the 90 PN languages are reported in Appendix A.

Phylogenetic Analysis
As languages are related to one another, it is not statistically valid to treat crosslinguistic observations as independent [46]. Quantitative phylogenetic methods [47] take genealogical relatedness into account in a principled and statistically sound manner. Our two studies use phylogenetic techniques in order to make valid inferences from the crosslinguistic FL data in PN.
Study 1 assesses the degree of phylogenetic signal in FL V , FL C , and FL P . The phylogenetic signal is a measure that compares the variation in an observed variable against its expected variation if it evolved along a phylogenetic tree, t, according to a Brownian motion process. In Brownian motion, the value of a variable is in constant flux. Positive and negative changes are equally likely at all times, and small changes are more likely than large ones. A phylogenetic signal will be stronger when the variable in question actually did evolve along the tree, and less strong if it was influenced by lateral transfer as in borrowing, especially borrowing among languages that are only distantly genealogically related.
The phylogenetic signal will be stronger when the variable evolved along the specific tree t, to which the data are being compared and not some other tree, t . Furthermore, it will be stronger when evolution was similar to Brownian motion, so that the value of the variable had equal probabilities of shifting up or down at any point, as opposed to (for example) being constrained within some tight range, so that at extreme outer values, there was a greater chance for the variable to evolve back towards the central value than further outwards. For a more technical description of phylogenetic signals written for a linguistic readership, see [42,46].
Here, we measure the phylogenetic signal, using the picante package in R [48], according to a standard two-step procedure described in Blomberg et al. [49]. First, the variation in the data is compared to a randomized baseline in which the shape of a previously determined tree t plays no role in structuring the data. The null hypothesis is that, relative to the structure implied by the tree, the data are simply random; the alternative hypothesis suggests patterns like the tree.
For example, a phylogenetic signal is considered to be evident at a p = 0.05 level if the variation in the real data matches the tree better than 95% of the randomized datasets. Next, if the data has been confirmed as significantly differing from randomness, then we use the statistic, Blomberg's K to measure the strength of phylogenetic signal. (For mathematical details of the calculation of K, see [42,49].) Blomberg's K takes a value of 1 if the variation in the data accords perfectly with the tree t, and a minimum value of 0 if the data are perfectly randomly distributed relative to t. Values in excess of 1 are possible if FL data values are highly clumped within subgroups of the family.
When calculating a phylogenetic signal, reference must be made to a tree, t. In our case, the aim is to compare FL data to the PN family tree. However in linguistics, there is uncertainty regarding the details of this tree. Uncertainty about the details of trees is common in phylogenetic research and is termed phylogenetic uncertainty. Here, we employ a standard approach to account for phylogenetic uncertainty, by measuring the phylogenetic signal with respect to not one tree t but a sample of 1000 highly-likely family trees t 1 , t 2 , . . . , t 1000 . This generates 1000 tests against randomness, followed by 1000 estimates of K, which provide a distribution describing its likely value. Our tree sample t 1 , t 2 , . . . , t 1000 comprises 1000 dated phylogenetic trees from the posterior distribution inferred by Bowern [50] and described further in Macklin-Cordes et al. [42]. The trees were inferred from cognate data from which known borrowings were excluded [19,20].
Study 2 examined the phylogenetic Pearson's correlation [51] between FL V and FL C and between FL V and FL P . This test is conceptually parallel to a regular Pearson's correlation; however, it also takes into account the specific kinds of non-independence caused by genealogical relationships between languages. As a consequence, the results reflect correlations not only between the values of traits in individual modern languages but also between values characteristic of subgroups at all levels in the tree. As such, the results for a pair of variables can inform us about the strength and direction of linked relationships that characterize the language family as a whole, through its history. When we calculate the statistics, as with our estimate of the phylogenetic signal, we take phylogenetic uncertainty into account by performing the correlation test in reference to the sample of 1000 highly-likely trees.
Most statistical tests require assumptions to be made about the data. The test we use in study 2 assumes that FL V , FL C , and FL p evolve along a phylogeny following Brownian motion. Given the results that we obtained in Study 1 for Blomberg's K (see Section 3), the assumption is well motivated. We used the phytools R package [52] to estimate the covariance matrix of the Brownian motion on each tree in the sample, giving a sample from the posterior distribution of Pearson's r correlation. The p-values reported were computed using the posterior mean estimates and correspond to testing the null hypothesis that the correlation is zero against the alternate hypothesis that the correlation is non-zero.

Study 1
The statistical significance of the presence of phylogenetic signal was measured to three digits of accuracy, and for all FL variables and all trees, the highest p-value was p = 0.001, indicating that a phylogenetic signal was significantly present. The strength of the phylogenetic signal as measured by Blomberg's K was very close to 1 for FL V , FL C , and FL P , as shown in Table 3, both for the normalized and unnormalized versions of the FL measure. To place these K values in context, Macklin-Cordes et al. [42] examined the lexical Markov chain transition probabilities of biphones (two-segment sequences) in PN and found mean K values of 0.54 or mean K of 0.59 when segments were binned into groups by place or manner of articulation. Macklin-Cordes and Round [46] examined the relative frequencies of dental versus palatal consonants in word initial and intervocalic positions in PN and found mean K values from 0.78 to 1.32 word-initially and from 0.34 to 0.70 intervocalically.
Dockum [53] examined phoneme frequencies and biphone Markov chain transition probabilities in languages of the Tai family and found mean K values of 0.71 and 0.68, respectively. Further afield, Blomberg et al. [49] examined 121 biological traits of a wide variety of plant and animal organisms, finding mean K values of 0.35 for behavioural traits, 0.54 for physiology, and 0.83 for traits related to body size. Taken in this context, our results suggest that the evolution of FL is very well described by Brownian motion process along the PN tree. Study 2 One consequence of the high levels of phylogenetic signal found in FL, is that statistical analysis, such as the measurement of correlations should be carried out using phylogenetic comparative methods [46]. Phylogenetic Pearson's correlation (Table 4) was significant and negative between FL V and FL C but did not reach significance between FL V and FL P , in accordance with our hypotheses. This was true for both for the normalized and unnormalized versions of FL.

Discussion
The studies in this paper have examined language diachrony at a statistical level. In doing so, we contribute to a more precise, quantitative characterization of diachronic typology. Specifically, we studied the historical dynamics of FL, which is a quantitative characterization of the contribution of specific contrasts to distinctiveness in the lexicon.
We established that the FL of certain variables evolves according to non-independent stochastic processes: they were found to change in a linked, statistically correlated fashion across almost a hundred languages and thousands of years of history. Moreover, we demonstrated that FL exhibited interesting historical dynamics that are deserving of further investigation not only at values close to zero, which have been the focus of prior work but at higher values as well.
Recent research has confirmed a long-standing conjecture that contrasts with low FL are particularly prone to merger [5,[14][15][16], which is to say that FL is more likely to fall to zero from a lower value than from a higher one. Efforts at explaining this phenomenon have focused on homophony avoidance, an account that is specific to FL falling to zero and doing so in the domain of whole words [5,17]. However, our findings suggest that there may be nothing special about FL falls as opposed to rises, and changes to zero as opposed to other values. In addition, the word is not the only domain in which causally interesting effects of FL may be active.
Consequently, although it was not our primary focus here, our findings suggest that recent efforts may be focusing on too narrow a research question and consequently entertaining a set of explanatory accounts that will generalize only poorly to other, related phenomena. Future research will benefit from broadening its scope beyond the recent, more narrow focus on FL, which falls to zero in whole words.
We now take up three additional topics for expansion and emphasis.

High Degree of Phylogenetic Signal in FL
Phylogenetic signals have recently been shown to be present in phonotactic biphone frequencies, phoneme frequencies, and contextual ratios of places of articulation [42,46,[53][54][55]. These studies reveal that the frequencies of phonological structures pattern with genealogy. Here, we find a high level of phylogenetic signals also in FL-that is, in the contrastive function that phonological structures serve. Interestingly, we find that the phylogenetic signal in the FL measures examined here was very close to 1 and closer than the values found in studies of phonological structures. Two questions can be posed in response: why did we find a strong phylogenetic signal in FL, and why is it even stronger than in structural traits? Any answer at this stage of research is necessarily speculative; however, the observations we offer here may point to useful lines of future inquiry.
Why did we find a strong phylogenetic signal in FL in the data that we used, bearing in mind that our data (1) are sourced from lexical lists of word types, not tokens; (2) are sourced from lists that are short, mostly numbering in the hundreds of items, not thousands or tens of thousands; and (3) examine FL in domains comprised of a tonic vowel and following consonant, not whole words. One prior expectation might have been that since the datasets are so small and since they do not examine the full words that are the focus of much recent research, they would be awash in statistical noise and exhibit little patterning of significance. Evidently, this is not the case, however, and we believe there may be reasons why not.
Much research on FL has focused on the hypothesis that FL exerts an influence on sound change through a mechanism of homophony avoidance [1,5,15,17]. Since homophony is a relationship that holds between words, the effect of such a mechanism may be to promote FL within the domain of whole words. Furthermore, as a consequence of that mechanism, sound changes would be less likely to occur, the more they altered the FL within words. However, whether this hypothesis is correct or not, other mechanisms should not be ruled out.
For instance, during speech processing, words become activated cognitively well before the listener hears the entire word [56]. Consequently, any sound change that caused a loss of a contrastiveness early in a word could potentially impair the ease of processing, even if it did not result in homophony. Accordingly, if we are prepared to entertain the existence of homophony avoidance as a causal factor in sound change, it is not unreasonable to entertain the existence of an avoidance of loss of contrastiveness early in the word as an additional factor in sound change (for supporting evidence, see [57,58]).
By this line of reasoning, since our data focuses on the first vowel and following consonant of PN words, i.e., contrastiveness early in the word, it is not altogether surprising that our results were significant despite the fact that we did not examine whole words. (PN roots are typically disyllabic and affixation is suffixal [59]; the initial consonant position, before the tonic vowel, permits only a subset of the contrastive consonants found elsewhere [45,60], which potentially increases the importance placed on maintaining the subsequent VC contrasts.) Our data come from short word lists, which might be expected to supply FL values that are, at best, a noisy approximation of the more precise FL values obtainable from larger lists or from token-based corpus data [61,62]. However, the words that appear in short wordlists are heavily skewed towards the most frequent words of a language, and these are the words that listeners would process most often and would have learned the earliest during acquisition. If we grant that words of higher frequency and earlier acquisition are likely to play an especially significant role in the mechanisms behind sound change, then it follows that even short wordlists will plausibly contain rich evidence of the contrastiveness that matters most.
Our results also align with the findings of [61] in that, above a minimum threshold for wordlist length, even lists of only a few hundred words contained phonemic distributions that conformed closely to the full lexicon, when randomly sampled from a larger lexicon containing thousands of items. Thus, it is not as surprising as it might first seem that we obtained clear results and a strong phylogenetic signal from the limited data we had available.
Our second question was, why does the phylogenetic signal appear higher in FL than in structural traits of phonology, such as phonotactics? To answer this, we return to the two causes of stronger/weaker phylogenetic signals described in Section 2.2.
First, a phylogenetic signal is stronger when computed relative to the truest tree and lower otherwise. However, the trees we used here for PN while studying FL are the same as those used by Macklin-Cordes et al. [42] for phonotactics, and thus a difference in the trees used is unlikely to be the cause of the differences in phylogenetic signal.
Second, a phylogenetic signal is stronger if the change process has the properties of Brownian motion. In Brownian motion, small changes are more likely than large changes, and positive and negative changes are equally likely. We take these aspects in turn.
Both FL and structural traits-as with phonotactic frequencies-change as the lexicon changes. Any lexicon is constantly affected in small ways by neologisms and the obsolescence of words. Additionally, they may be affected by borrowing, which can occur at various rates, and by sound changes, which can occur in highly specific contexts or more sweeping ones. This mixture of factors supports an expectation that small changes will be frequent and larger changes less so, and it is not obvious that there would be significant differences in this regard between FL or traits, such as phonotactic frequencies.
It now remains to consider whether positive and negative changes in values are equally likely. For FL, positive/negative changes in values entail that a contrast becomes more/less central in supporting the distinctiveness of strings in the lexicon. There is a lower bound at zero; however, in our study, we did not include that lower bound. Aside from that lower bound, we are not aware of constraints that would make the likelihood of positive or negative changes uneven at any point, and consequently the stochastic process that describes changes in FL could genuinely be quite close to Brownian motion. For structural aspects of phonology, however, the situation is different.
Structural features are subject to constraints: there are less likely and more likely structures, both in universal and in lineage-specific terms [45,63]. Consequently, for instance, the frequency of a highly marked structure should be more likely to decrease than to increase. This kind of inequality in the likelihood of positive and negative changes to values-irrespective of the actual sources, such as production, perception, and cognitionentails a departure from Brownian motion, which ought to weaken the phylogenetic signal. This, we suggest, may be why structural traits appear to have a lower phylogenetic signal compared with FL. If this line of reasoning is correct, we would expect similar results to emerge from studies of other language families beyond PN.

Transphonologization and the Flow of Contrastiveness
Transphonologization [64] (cf. rephonologization [65] and cheshirization [66]) is a term given to sound changes in which a contrastive function is preserved; however, the locus of the contrast-the segments or features that instantiate it-changes. Here, we studied a closely related phenomenon in which contrasts do not necessarily disappear or emerge in their entirety, but the relative contrastive workload of them (their FL) does shift from one to another. One way to view this phenomenon is in terms of a diachronic flow of FL from one contrast to another (cf. [67]). The fact that we are able to quantitatively detect the presence of this flow of contrastiveness through a language family as large and old as PN suggests the potential of new avenues for investigating the dynamic flow of contrastiveness through phonological systems as they evolve over time.
One question that arises is whether our findings in PN might reflect some strong preference in language for the conservation of contrastiveness in which case, the flow of FL from one contrast to another might be regarded as an automatic consequence of one contrast undergoing a significant decrease in FL. Although our results alone cannot answer this question, we doubt that such a principle exists in any strong form. Certainly, in many mergers, the overall contrastive capacity of a language is simply reduced, as the FL of one contrast falls but no other FL rises to balance it.
In the case of PN tonic vowels and post-tonic consonants, we suggest that the cause of recurrent historical flow of FL lies in particular phonetic factors that are common across PN languages: a synchronic correlation between phonetic tonic vowel duration and phonetic post-tonic consonant manner, even in systems in which only the vowel-durational aspect is tied to a synchronic phonemic contrast; when the phonetic vowel-durational differences are neutralized diachronically, causing the phonemic vowel length contrasts to collapse, the phonetic manner differences-which still correlate with the same lexical distinctions that vowel length had signalled-become phonemic. On this view, it is the phonetics of the vowel-consonant strings that furnish the conditions for a natural flow of FL from vowel to consonant.

On Sapir's 'Drift': The Non-Accidental, Parallel Evolution of Related Languages
It is a century now since the appearance in print of Edward Sapir's hypothesis that languages undergo parallel grammatical evolution for several centuries after they split [68]. Providing anything more than anecdotal evidence in support of Sapir's hypothesis has long been difficult [69][70][71], and some apparent cases may be due to language contact [72,73]. Dunn et al. [74] used phylogenetic methods to examine patterns of word order evolution in different language families; however, the study did not produce an identifiable cause for those patterns.
Ideally, evidence in support of Sapir's drift should not be anecdotal but rather be statistically significant across a language family; it should not be reducible to the effects of language contact, and it should be relatable to an identifiable cause. The current study meets these three criteria. It detects parallel changes in FL that are instantiated statistically across 90 languages within the PN family whose time depth is estimated at around 5000-6000 years [18][19][20]; thus, the evidence is not anecdotal.
The data pattern fits tightly with phylogeny, and thus is unlikely to be due to contact (cf. our note in the next paragraph). Furthermore, we identified a causal basis for this, in the common phonetics of PN tonic vowel-consonant sequences. Thus, we believe our results to be one of the fullest confirmations yet that Sapir's conjecture was essentially correct: that under the right circumstances, linguistic systems can undergo parallel evolution after they split, not merely for centuries but for millennia.
A reviewer asks about the situation in which language contact closely mimics the pattern of phylogeny. If contact did pattern perfectly with phylogeny (such that languages only borrowed from their very closest relatives), then its effects would be indistinguishable. However, languages also borrow from geographic neighbours that are less closely related. It is hard to conceive of borrowing of lexical items whose impact on contrastiveness has the phylogenetic signal we find here in the absence of vertical inheritance. At the very least, the burden of proof is on the advocate of a language-contact account, given that the data pattern is in very close accord with expectations from vertical inheritance, and there is an accompanying explanation in terms of phonetics and sound change for why this should be so.
It has been suggested by Joseph [75] that drift in phonology may be due to a narrowing of the range of variation inherited from a proto-language. In the PN changes described here, however, the flow of FL from vowel length to consonant manner is not due to any narrowing of variation in FL in proto-PN (indeed it is not entirely clear what it should mean for FL to have a range of variation). Nor, when the PN developments are viewed in terms of phonological substance are they a matter merely of narrowing variation. Although contrasts in vowel length are lost, new variation is introduced in the inventory of contrastive consonant manners and into the set of relationships that can hold between the length of a tonic vowel and the manner of post-tonic consonants.
In reality, Joseph's proposal would appear to reduce to a fact, well-recognised in evolutionary biology, that incomplete lineage sorting (i.e., the inheritance of variation from a proto-taxon into its descendant) can result in the appearance of convergent evolution [76]. However, this does not entail that all convergent evolution is due to incomplete lineage sorting (see also [77]).
Another important source can be the existence of dependencies within a system that are inherited along with its substance [78], which will favour certain outcomes over others in descendent systems as has been observed in protein evolution; for example, ref. [79]. In PN, certain phonetic dependencies between tonic vowel length and post-tonic consonant manners were inherited alongside the phonological substance itself. In the descendent systems, in the event that vowel length was lost, the inherited dependencies favoured the rise of new, contrastive consonant manners.

Conclusions
This paper joins a growing body of work regarding the application of computational phylogenetic methods to phonological data. It also represents the first phylogenetic study of FL.
We showed that there was a significant phylogenetic signal in FL, which has implications for a better understanding of the dynamics of sound change. Further, we showed that the FL of tonic vowels and post-tonic consonants were negatively correlated in PN and that this maps closely to a sample of highly probable PN family trees. We also introduce the idea of the flow of contrastiveness between subsystems of the phonology in different languages, which is connected to the concept of transphonologization, and we claim that this represents a concrete example of Sapir's drift.
Setting our gaze beyond PN, while not all sound changes are associated with the phonetic conditions that promote the flow of FL, there are many that appear to be, and it will be valuable to apply the methods we introduced here to study them. In time, this may lead to a more general understanding of how FL can flow within phonological systems over long time horizons. Promising future applications of our approach include the investigation of other suspected diachronic trade-offs, such as the rise of phonemic tone and register (such as contrastive phonation) in Southeast Asia tied to losses of consonantal laryngeal distinctions [3,[80][81][82].    [42] and are available at https://zenodo.org/record/3988775 (accessed on 15 December 2021).

Acknowledgments:
We are grateful to the editors and three anonymous reviewers for their insightful and helpful comments. Versions of this paper were presented at the Edinburgh Symposium on Historical Phonology 2021 and the Annual Meeting of the Australian Linguistics Society 2021, and we are grateful to the audiences at these meetings for their valuable comments. This research was initiated at the workshop, An evolutionary science of word and sound systems, hosted by the Max Planck Institute for the Science of Human History, Jena, in November 2019.

Conflicts of Interest:
The authors declare no conflict of interest. The funders had no role in the design of the study; in the collection, analyses, or interpretation of data; in the writing of the manuscript, or in the decision to publish the results.

Abbreviations
The following abbreviations are used in this manuscript:

FL
Functional load PN Pama-Nyungan