Universal Features in Phonological Neighbor Networks

Human speech perception involves transforming a continuous acoustic signal into discrete linguistically meaningful units (phonemes) while simultaneously causing a listener to activate words that are similar to the spoken utterance and to each other. The Neighborhood Activation Model posits that phonological neighbors (two forms [words] that differ by one phoneme) compete significantly for recognition as a spoken word is heard. This definition of phonological similarity can be extended to an entire corpus of forms to produce a phonological neighbor network (PNN). We study PNNs for five languages: English, Spanish, French, Dutch, and German. Consistent with previous work, we find that the PNNs share a common set of topological features. Using an approach that generates random lexicons with increasing levels of phonological realism, we show that even random forms with minimal relationship to any real language, combined with only the empirical distribution of language-specific phonological form lengths, are sufficient to produce the topological properties observed in the real language PNNs. The resulting pseudo-PNNs are insensitive to the level of linguistic realism in the random lexicons but quite sensitive to the shape of the form length distribution. We therefore conclude that “universal” features seen across multiple languages are really string universals, not language universals, and arise primarily due to limitations in the kinds of networks generated by the one-step neighbor definition. Taken together, our results indicate that caution is warranted when linking the dynamics of human spoken word recognition to the topological properties of PNNs, and that the investigation of alternative similarity metrics for phonological forms should be a priority.


I. INTRODUCTION
The perception and recognition of acoustic speech, known in psycholinguistics as spoken word recognition (SWR), requires that human listeners rapidly map highly variable acoustic signals onto stable linguistically relevant categories (in this case, phonemes, i.e., the consonants and vowels that comprise a language's basic sound inventory) and then piece together sequences of phonemes into words, all without robust cues to either phoneme or word boundaries (see [4,5] for reviews). Decades of research on human spoken word recognition have led to a consensus on three broad principles: (1) SWR occurs in a continuous and incremental fashion as a spoken target word unfolds over time, (2) words in memory are activated in proportion to their similarity to the acoustic signal as well as their prior probability (computed as a function of their frequency of occurrence) in the language, and (3) activated words compete for recognition. A key difference between theories is how they characterize signal-to-word and word-to-word similarity. Most theories incorporate some sort of similarity threshold, and pairs of words meeting that threshold are predicted to strongly activate each other and compete. Perhaps the most influential definition of the phonological similarity of spoken words is the concept of phonological neighbors posited under the Neighborhood Activation Model (NAM) by Luce and colleagues [1,2]. NAM includes a gradient similarity metric and a threshold metric, although only the latter is widely used (and we focus on it here). The threshold metric defines neighbors based on the Deletion-Addition-Substitution (DAS) string metric, which states that two words are neighbors (i.e., they are sufficiently similar to strongly activate one another and compete) if they differ by no more than the deletion, addition, or substitution of a single phoneme. Thus, cat has the deletion neighbor at, addition neighbors scat and cast, and many substitution neighbors, such as bat, cot, and can. NAM predicts that a target word's recognizability is determined by a simple frequency-weighted neighborhood probability rule, defined as the ratio of the target word's prior probability to the summed prior probability of all its DAS-linked neighbors. The NAM rule predicts a greater proportion of the variance in spoken word recognition latencies (10-27%, depending on task [lexical decision, naming, or identification in noise] and conditions [signal-to-noise ratio] [1]) than any other measure that has been tested (e.g., log word frequency alone accounted for 5-10% of variance in Luce's studies).
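The DAS threshold and the neighborhood probability rule described above can be sketched in a few lines of Python. This is a minimal illustration, not the implementation used in the NAM literature: the function names, the toy phoneme symbols, and the toy frequencies are all hypothetical. The sketch also assumes that identical forms are not neighbors of each other, and it computes the ratio exactly as worded above (target frequency over summed neighbor frequency).

```python
def is_das_neighbor(a, b):
    """True if phoneme sequences a and b differ by exactly one
    deletion, addition, or substitution (identical forms excluded)."""
    a, b = tuple(a), tuple(b)
    if a == b or abs(len(a) - len(b)) > 1:
        return False
    if len(a) == len(b):
        # Substitution: exactly one mismatching position.
        return sum(x != y for x, y in zip(a, b)) == 1
    # Deletion/addition: dropping one phoneme from the longer form
    # must reproduce the shorter form.
    short, long_ = (a, b) if len(a) < len(b) else (b, a)
    return any(long_[:i] + long_[i + 1:] == short for i in range(len(long_)))


def neighborhood_probability(target, freq):
    """Target frequency divided by the summed frequency of its DAS
    neighbors; returns None when the target has no neighbors."""
    neighbor_mass = sum(f for form, f in freq.items()
                        if is_das_neighbor(target, form))
    return freq[target] / neighbor_mass if neighbor_mass > 0 else None


# Toy lexicon with made-up phoneme symbols and frequencies (assumptions).
CAT = ("k", "ae", "t")
toy_freq = {
    CAT: 10.0,
    ("ae", "t"): 4.0,                   # at: deletion neighbor
    ("s", "k", "ae", "t"): 1.0,         # scat: addition neighbor
    ("k", "ae", "n"): 5.0,              # can: substitution neighbor
    ("k", "ae", "t", "ah", "l"): 3.0,   # cattle: two phonemes longer
}
print(neighborhood_probability(CAT, toy_freq))
```

In this toy lexicon the three neighbors of cat carry a combined frequency equal to that of cat itself, while cattle is excluded because it is two phonemes longer.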
The NAM approach has typically been used to characterize the recognizability of single words according to the sizes (densities) of their locally defined neighborhoods. More recently, it has been realized that viewing the structure of the phonological lexicon globally as a complex network enables the probing of connections between both large- and small-scale network topology and human spoken word recognition. Thus, rather than considering a word and its neighbors in isolation, the set of neighbor relationships for an entire lexicon can be represented as an unweighted, undirected graph [3] in which words (phonological forms) are represented by nodes and two words are joined by an edge if they meet the standard NAM DAS threshold. The NAM approach can be translated to the network context to mean that (frequency-weighted) node degree is important for predicting latencies in spoken word recognition. There are also prior indications that other topological properties (e.g., node clustering coefficient [6,7], closeness centrality [8], and second neighbor density [9]) may explain some aspects of SWR that the frequency-weighted neighborhood probability does not.
Previous studies have shown that what we will call the phonological neighbor network, or PNN, for English has some features of both Watts-Strogatz [10] and Barabasi-Albert [11] graphs. It has a relatively short mean geodesic path length and high clustering coefficient, but also has a degree distribution that is at least partially power law [3]. Subsequent analyses of additional languages (English, Spanish, Hawaiian, Basque, and Mandarin) have shown these characteristics to be broadly shared across languages when PNN graphs are constructed using NAM's DAS rule [12]. On the basis of these results, Vitevitch and colleagues have assigned importance to these language "universals" and argued that many of these properties are sensible if not essential (e.g., high degree assortativity, which measures the tendency of nodes to be connected to other nodes of similar degree, can buffer against network damage) [12]. However, making claims about SWR on the basis of the properties of PNNs alone is potentially fraught for at least two reasons. First, PNNs are static representations of lexical structure, whereas spoken words are processed incrementally over time. Second, different measures of word similarity will result in radically different PNNs. NAM's DAS rule is based on a relatively simple string distance metric that provides a local measure of interword similarity that is insensitive to the sequence of phonemes in a word. Thus, while NAM's DAS metric accounts for substantial variance using a regression-based approach (predicting response latencies for many words), there is substantial evidence from studies examining competition between specific pairs of words with different patterns of position-dependent phonological overlap that words whose onsets overlap compete more strongly than words that are matched in DAS similarity but whose onsets are mismatched (e.g., battle would compete more strongly with batter than with cattle [13]).
Marslen-Wilson and colleagues [14,15] proposed a threshold metric that gives primacy to onset similarity. They focused on the notion (consistent with many priming and gating studies [15]) that the "cohort" of words activated by a spoken word is restricted to words overlapping in their first two phonemes. Thus, the cohort competitors of cat include not just DAS neighbors overlapping at onset (can, cab, cast) but also longer words that would not be DAS neighbors (cattle, castle, cabinet). In addition, the cohort metric predicts that rhyme (i.e., a word's vowel and following consonants) neighbors (cat-bat, cattle-battle) do not compete because they mismatch at onset, despite high DAS similarity. A PNN based on a simple onset cohort rule (connect words that overlap in the first two phonemes) would obviously have very different structure than a DAS-based PNN. When using PNNs to compare lexical structure between languages, we must consider the potential role of the similarity metric itself in determining the network's structure and topology. This possibility calls into question any universal (language-independent) claims about SWR based on DAS networks. Prior work has demonstrated that this is likely true at least in English, as PNNs constructed from a random lexicon with the same phonological constraints as English are basically indistinguishable from the real language network [16,17].
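To make the contrast with DAS concrete, the simple onset cohort rule (connect words that share their first two phonemes) can be sketched as a grouping step. The function name and the toy transcriptions are hypothetical, and skipping forms shorter than the onset length is our assumption. Because sharing an onset is an equivalence relation, every group returned here would be fully connected under this rule, so the resulting graph is a disjoint union of cliques.

```python
from collections import defaultdict


def cohort_groups(lexicon, onset_len=2):
    """Group phonological forms by their first `onset_len` phonemes.

    Forms shorter than `onset_len` are skipped (an assumption; the
    text does not say how such forms are treated).
    """
    groups = defaultdict(list)
    for form in lexicon:
        if len(form) >= onset_len:
            groups[tuple(form[:onset_len])].append(form)
    return dict(groups)


# Toy transcriptions (hypothetical phoneme symbols).
lexicon = [
    ("k", "ae", "t"),                   # cat
    ("k", "ae", "n"),                   # can
    ("k", "ae", "t", "ah", "l"),        # cattle
    ("b", "ae", "t"),                   # bat
    ("b", "ae", "t", "ah", "l"),        # battle
]
groups = cohort_groups(lexicon)
for onset, members in groups.items():
    print(onset, len(members))
```

Here cat and cattle fall in the same cohort although they are not DAS neighbors, while cat and bat are DAS neighbors that fall in different cohorts.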
Here, we explore this possibility further by extending DAS-based PNNs to four languages in addition to English: Spanish, French, German, and Dutch. We show that PNNs for these languages have degree distributions and topological properties similar to PNNs previously constructed for English, Spanish, Hawaiian, Mandarin, and Basque [12]. We then show, by separating words by number of syllables, that all five language networks consist of aggregations of at least two very different networks, as has been previously suggested for English [16].
We also note for the first time the effects of homophones such as bare and bear on PNN structure. Finally, using a set of models that generate random lexicons with varying levels of phonological realism, we show that even extremely simple random lexicons, along with language-specific phoneme inventories and distributions of phonological form length (that is, the frequency distributions of words of different lengths, ignoring all other linguistic details), can create pseudo-PNNs that share all the properties of real PNNs. In fact, adding phonological constraints (e.g., phonotactic constraints on phoneme sequences) does very little to improve pseudo-PNN match to language-based PNNs. While these pseudo-PNNs are quite insensitive to the level of realism in the lexicon, they are extremely sensitive to the empirical form length distribution, which we show drives all of the observed differences among English, French, German, Dutch, and Spanish. Our results suggest that the primary determinant of the observed topology of PNNs is the neighbor definition itself, which dramatically limits the network structures possible in PNNs. In addition, our work strongly motivates the consideration of alternate phonological similarity metrics and suggests that it is important to try to understand the formative dynamics underlying the observed phonological form length distributions.

II. DATA
We used the freely available online CLEARPOND [18] database to construct DAS-based PNNs for five languages: English, Dutch, German, French, and Spanish. CLEARPOND is described in detail elsewhere [18], but in brief, it includes phonological transcriptions of orthographic forms and frequency information for over 27,000 words from each language.
Frequency information for English [19], Dutch [20], German [21], and Spanish [22] is derived from the SUBTLEX database, which counts word occurrences in television and movie subtitles. French frequency information is derived from Lexique [23], a fusion of an older French language database (Frantext) with word occurrence information derived from webpages. For all five languages we constructed PNNs based on the DAS rule described above: two words were neighbors, and therefore linked with a bidirectional, unweighted edge, if they differed by no more than a single phoneme deletion, addition, or substitution. After PNN construction, we found that, in each language, a significant percentage of the words had no phonological neighbors, ranging from 24% (French) to 45% (Dutch). All singleton words were excluded from any further analysis, since their topological properties are either trivial (e.g., they are all degree zero) or undefined (e.g., clustering coefficient). In all five languages, the mean length of the neighborless words is larger than that of the words with neighbors, but this difference is not statistically significant (permutation test).
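The construction just described (link every DAS pair, then exclude singletons) can be sketched without comparing all pairs of words. The approach below is one possible implementation and is not taken from CLEARPOND or from our analysis code: it indexes each form by the strings obtained when one position is removed, so that substitution neighbors share a (position, remainder) key and deletion neighbors are found by direct lookup. All names and the toy transcriptions are hypothetical.

```python
from collections import defaultdict
from itertools import combinations


def build_pnn(forms):
    """Return an adjacency dict {form: set(neighbors)} under the DAS rule.

    Duplicate forms collapse to a single node.
    """
    forms = {tuple(f) for f in forms}
    adj = {f: set() for f in forms}
    buckets = defaultdict(set)
    for f in forms:
        for i in range(len(f)):
            rest = f[:i] + f[i + 1:]
            # Same-length forms sharing (i, rest) differ only at position i.
            buckets[(i, rest)].add(f)
            # Deleting one phoneme may yield another form in the lexicon.
            if rest in forms:
                adj[f].add(rest)
                adj[rest].add(f)
    for group in buckets.values():
        for a, b in combinations(group, 2):
            adj[a].add(b)
            adj[b].add(a)
    return adj


def drop_singletons(adj):
    """Remove degree-zero nodes, as done before all further analysis."""
    return {f: nbrs for f, nbrs in adj.items() if nbrs}


# Toy lexicon (hypothetical phoneme symbols).
CAT, AT, BAT = ("k", "ae", "t"), ("ae", "t"), ("b", "ae", "t")
CATAPULT = ("k", "ae", "t", "ah", "p", "ah", "l", "t")
forms = [CAT, AT, BAT, ("s", "k", "ae", "t"), ("k", "ae", "s", "t"),
         ("k", "aa", "t"), ("k", "ae", "n"), CATAPULT]
pnn = drop_singletons(build_pnn(forms))
print(len(pnn[CAT]), CATAPULT in pnn)
```

In this toy example the long form has no neighbors and is dropped, mirroring the observation that neighborless words tend to be longer.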

III. EMPIRICAL ANALYSIS OF PHONOLOGICAL NEIGHBOR NETWORKS

A. Degree Distributions and Topology
Figure 1 shows the degree distributions for the five PNNs constructed from the CLEARPOND data (compare also to Figure 10 in the original CLEARPOND paper [18]), and Table I gives a summary of some of the common topological measures employed in the empirical analysis of networks, all of which have been specifically highlighted in prior PNN research.
All five language degree distributions are best fit (via maximum likelihood) by a truncated power law, as tested via likelihood ratio [24]. In addition, we observe that all PNNs have: (i) relatively high clustering, (ii) short mean geodesic paths, (iii) extraordinarily high values of degree assortativity, and (iv) relatively small giant connected components (the largest connected subgraph in the network). Thus, all five PNNs have similar degree distributions and topological characteristics, and they combine some features of Watts-Strogatz [10] graphs (high clustering) with Barabasi-Albert graphs [11] (power law degree distribution). High degree assortativity and small giant component sizes are features of the PNNs that are not displayed by either WS or BA graphs. These features are all consistent with previous studies on English alone [3,16] and other languages not studied here [12].
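Two of the measures listed above, the clustering coefficient and degree assortativity, can be computed directly from an adjacency dictionary. The sketch below uses the standard definitions (mean local clustering, and the Pearson correlation of the degrees at the two ends of each edge); the function names and the toy graph are hypothetical, and counting nodes of degree below two as having zero clustering is a convention we assume here.

```python
def mean_clustering(adj):
    """Average local clustering coefficient over all nodes.

    Nodes with fewer than two neighbors contribute zero (an assumed
    convention).
    """
    total = 0.0
    for nbrs in adj.values():
        k = len(nbrs)
        if k < 2:
            continue
        # Each link among the neighbors is counted twice in this sum.
        twice_links = sum(len(nbrs & adj[u]) for u in nbrs)
        total += twice_links / (k * (k - 1))
    return total / len(adj)


def degree_assortativity(adj):
    """Pearson correlation between degrees at either end of an edge."""
    xs, ys = [], []
    for u, nbrs in adj.items():
        for v in nbrs:  # every edge is visited in both directions
            xs.append(len(adj[u]))
            ys.append(len(adj[v]))
    n = len(xs)
    mean = sum(xs) / n  # identical for xs and ys by symmetry
    cov = sum((x - mean) * (y - mean) for x, y in zip(xs, ys)) / n
    var = sum((x - mean) ** 2 for x in xs) / n
    return cov / var if var > 0 else float("nan")


# Toy graphs: a triangle with one pendant node, and a three-leaf star.
triangle_plus_tail = {"a": {"b", "c", "d"}, "b": {"a", "c"},
                      "c": {"a", "b"}, "d": {"a"}}
star = {0: {1, 2, 3}, 1: {0}, 2: {0}, 3: {0}}
print(mean_clustering(triangle_plus_tail), degree_assortativity(star))
```

The star is the extreme disassortative case (every edge joins the highest-degree node to a degree-one node), which is the opposite of what is observed in the PNNs.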
The grouping of languages in Figure 1 is rather surprising. Essentially, Spanish is by itself, Dutch and German have quite similar degree distributions, and English and French are grouped together. One might expect different clustering based on language typology; for example, with the two Romance languages (French and Spanish) grouped together. We will show that the observed clustering can be explained without any reference to the specific history of words. Instead, the structure of the phonological form length distribution, together with target-language phoneme frequencies, is all that is required.

B. Islands and Frequency Assortativity
Given the relatively modest size of the giant connected component in all five languages (see Table I), it is worth examining the connected component size ("island size") distribution P_c for each of the five PNNs. Power law distributions for the sizes of the connected components P_c have been previously observed in PNNs for both English and Spanish [25].
[Table I caption, beginning lost in extraction: ... is mean geodesic path length, α is the power law exponent of the degree distribution, and r is the degree assortativity coefficient. Two values occurring in the table with a forward slash denote that quantity computed for the entire graph and only the giant component. Fits to degree distributions were performed via maximum likelihood [24] starting at k = 2, except for French which began at k = 10. Asterisks denote that the best fitting distribution is not strictly power law but rather truncated power law, as determined via a likelihood ratio test [24].]
Figure 2 shows that this power law distribution of component sizes is broadly shared over all five languages. In fact, the island size distribution is more robustly power law than the PNN degree distribution itself, albeit over a relatively modest range (less than a factor of 100).
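The island size distribution P_c can be obtained with a breadth-first search over the PNN. The following is a minimal sketch with hypothetical function names and a toy graph; it returns the component sizes and the fraction of components having each size.

```python
from collections import Counter, deque


def component_sizes(adj):
    """Sizes of all connected components, largest first."""
    seen, sizes = set(), []
    for start in adj:
        if start in seen:
            continue
        seen.add(start)
        queue, size = deque([start]), 0
        while queue:
            v = queue.popleft()
            size += 1
            for w in adj[v]:
                if w not in seen:
                    seen.add(w)
                    queue.append(w)
        sizes.append(size)
    return sorted(sizes, reverse=True)


def island_size_distribution(sizes):
    """Fraction of components having each size (an estimate of P_c)."""
    counts = Counter(sizes)
    return {s: c / len(sizes) for s, c in sorted(counts.items())}


# Toy graph with one island of three nodes and two islands of two nodes.
toy = {1: {2}, 2: {1, 3}, 3: {2}, 4: {5}, 5: {4}, 6: {7}, 7: {6}}
sizes = component_sizes(toy)
giant_fraction = sizes[0] / sum(sizes)
print(sizes, island_size_distribution(sizes), giant_fraction)
```

The ratio of the largest component to the total node count gives the giant component fraction reported alongside the other topological measures.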
We now remark on a previously unobserved feature of PNNs, again present in all five languages. All five languages show a weak but statistically significant degree of word-frequency based assortativity. Simply, words of similar usage frequency tend to be connected to each other in the PNN. We computed frequency assortativity by dividing the continuous word frequency data into ten equal-mass bins and then computing an assortativity coefficient and jackknife standard deviation using the definitions in Newman [26]. The values ranged from 0.1 in English to 0.24 in Spanish, which correspond to between 26 and 47 jackknife standard deviations. This is weak relative to degree assortativity in these networks (see Table I), but not insignificant on the scale of assortativity coefficients found in other social, biological, and technological networks [26].
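The binned frequency assortativity can be sketched as follows. We assume here that "equal-mass bins" means quantile bins holding equal numbers of words, and we use Newman's coefficient for discrete node types, r = (Σ_i e_ii − Σ_i a_i²) / (1 − Σ_i a_i²), where e_ij is the fraction of edge ends joining bin i to bin j and a_i = Σ_j e_ij. The function names and toy data are hypothetical, ties in frequency are broken arbitrarily, and the jackknife error estimate is omitted.

```python
from collections import defaultdict


def quantile_bins(freq, n_bins):
    """Assign each word to one of n_bins equal-count frequency bins."""
    ranked = sorted(freq, key=freq.get)
    return {w: i * n_bins // len(ranked) for i, w in enumerate(ranked)}


def categorical_assortativity(adj, label):
    """Newman's assortativity coefficient for discrete node labels."""
    e = defaultdict(float)
    ends = 0
    for u, nbrs in adj.items():
        for v in nbrs:  # each undirected edge counted in both directions
            e[(label[u], label[v])] += 1.0
            ends += 1
    a = defaultdict(float)
    trace = 0.0
    for (i, j), count in e.items():
        a[i] += count / ends
        if i == j:
            trace += count / ends
    sum_sq = sum(x * x for x in a.values())
    # Undefined when all edge ends fall in a single bin.
    return (trace - sum_sq) / (1.0 - sum_sq) if sum_sq < 1.0 else float("nan")


# Toy example: two rare words linked together, two frequent words linked.
toy_freq = {"a": 1.0, "b": 2.0, "c": 10.0, "d": 20.0}
toy_adj = {"a": {"b"}, "b": {"a"}, "c": {"d"}, "d": {"c"}}
bins = quantile_bins(toy_freq, n_bins=2)
print(categorical_assortativity(toy_adj, bins))
```

In the toy graph every edge joins words in the same frequency bin, so the coefficient takes its maximum value; values near 0.1 to 0.24, as reported above, indicate a much weaker tendency.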

C. DAS Graphs as Mixtures
There is a deep physical basis for observing power laws in thermodynamics. Diverging length scales at critical points mean that there are correlations at all scales in the system.
Critical point behavior cannot depend on any quantity (like a force) with an associated length scale, but rather only on scale-free quantities like symmetries and conservation laws.
Critical point phenomena then become universal, in the sense that the same behavior (critical exponents) is observed in systems that may have radically different forces but the same set of symmetries.
The converse is not true. Observation of power laws does not necessarily indicate any deep phenomena at work. Power laws in empirical data can arise from a wide variety of reasons, many of them mundane. One of the simplest is Simon's famous demonstration [27] that multiplicative (rather than additive) random noise can yield heavy-tailed distributions.
Another way to obtain power laws is via mixture distributions; in this case apparent scale-free behavior arises by simply mixing several distributions, each with well-defined but different scales.
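A standard textbook example illustrates how mixing produces apparent scale-free behavior. It is offered only as an illustration and is not a model of PNN degree distributions: the exponential components and the Gamma mixing density (shape a, rate b) are assumptions chosen because the integral can be done in closed form. Each component has a well-defined scale 1/λ, yet the mixture has a power law tail.

```latex
% Each component is exponential with its own characteristic scale 1/\lambda:
p(k \mid \lambda) = \lambda e^{-\lambda k}, \qquad k \ge 0 .

% Assumed mixing density over rates (Gamma, shape a > 0, rate b > 0):
w(\lambda) = \frac{b^{a}}{\Gamma(a)} \, \lambda^{a-1} e^{-b\lambda} .

% The mixture is a power law for k much larger than b:
p(k) = \int_{0}^{\infty} p(k \mid \lambda) \, w(\lambda) \, d\lambda
     = \frac{b^{a}}{\Gamma(a)} \int_{0}^{\infty} \lambda^{a} e^{-(b+k)\lambda} \, d\lambda
     = \frac{a \, b^{a}}{(b+k)^{a+1}}
     \sim k^{-(a+1)} \quad (k \gg b) .
```

The last step uses \int_0^\infty \lambda^{a} e^{-c\lambda} d\lambda = \Gamma(a+1)/c^{a+1} and \Gamma(a+1) = a\,\Gamma(a).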
Indications that the degree distribution of the PNN for English results from a mixture of distributions of different scales have been advanced by others [16]. Degree distributions for English PNNs separately constructed from short and long (in phonemes) words showed different shapes and, at least for short words, displayed markedly less power-law behavior.
In Figure 3 we show that this result also holds for the CLEARPOND English corpus, as well as for Dutch, German, Spanish, and French. We divided all words in each corpus into two classes: monosyllabic and polysyllabic.
Figure 3 clearly shows that connectivity among only monosyllabic words differs from polysyllabic word connectivity. The monosyllabic degree distributions look less like power laws than do the polysyllabic degree distributions, and monosyllabic words are in general more densely connected than are polysyllabic words. This raises the possibility that the PNN degree distribution may arise as a mixture of distributions. In all five languages, networks formed from polysyllabic words have degree distributions that are much closer to (truncated) power laws than are the monosyllabic word networks. In addition, note that (with the exception of French) the polysyllabic degree distributions are much more similar across the five languages than the monosyllabic graph degree distributions or those of the full graphs (see Figure 1).
In Appendix A, we look more closely at phonological neighbor graphs formed exclusively from monosyllabic or polysyllabic words, and compare them to graphs containing all words in each corpus (see Table VI). We found that some of the full PNN topological properties are present in both the monosyllabic and polysyllabic networks (e.g., degree assortativity and clustering coefficient). However, others are markedly different or disappear. The component or "island" size distribution P_c is driven entirely by the polysyllabic words; the monosyllabic words are almost completely connected (an unsurprising outcome of the DAS rule; shorter words, such as cat, are much more likely to have DAS neighbors than long words like catapult). The full PNN graphs have short (∼7) average path lengths primarily because the monosyllabic graphs have extremely short average path lengths (∼5) and the polysyllabic graphs have long (∼10) ones. When we compare the local properties of the monosyllabic words in both the monosyllabic and full graphs, numbers of neighbors and second neighbors are highly correlated. However, clustering is more weakly correlated, indicating that explanations of latencies in SWR that appeal to node clustering coefficient [6] as a predictor may be quite sensitive to whether or not polysyllabic words were included as items in the experiment.
At least three questions remain. First, do constraints imposed by the one-step neighbor DAS similarity measure explain the apparently universal topological features seen across all five languages? Second, if so, what explains the observed differences in the degree distributions in Figure 1? Finally, how much lexical structure is required to generate PNNs that resemble those of real languages? In what follows, we address these three questions in detail.

IV. PSEUDOLEXICONS
Figure 3 and additional results that we present in Appendix A suggest that the truncated power law behavior observed in the five PNNs might be the result of mixing subgraphs with different connectivity properties. The left panel of Figure 4 again shows the degree distributions for the five languages, this time with all homophones removed. We discuss homophones in detail in Appendix B; in brief, we remove homophones because our random lexicon models produce phonological forms (rather than written words) directly and cannot properly account for homophones. The right panel shows the distribution P_l of words of length l phonemes. The P_l distributions are underdispersed relative to Poisson (not shown); note also that they are all zero-truncated, as there are no words in any language that consist of zero phonemes. A particularly intriguing feature of the five language P_l distributions is that they cluster similarly to the degree distributions shown in Figure 1: English and French together, then German and Dutch, and Spanish by itself. While this could be entirely coincidental or a result of previously undetected cross-linguistic similarities, below we will show that it is not.

A. Models
To determine which topological features of the PNNs arise due to specific features of real languages and which are driven purely by the DAS connection rule, we adopt and extend an approach inspired by previous work on the English PNNs [16]. We generate corpora of random phonological forms using generative rules that include varying amounts of real linguistic detail. We denote such a corpus of random strings of phonemes a pseudolexicon. Each pseudolexicon is paired with a target language, since all the models use some information from the real language for construction. Specifically, pseudolexicons are created from the phonemic inventory of each language (the set of all phonemes that occur in the language), with lexicon size constrained to be approximately the same as the real-language lexicon for the target language (for example, about 22,000 unique words [i.e., excluding homophones] for English CLEARPOND), and with the same form length distribution as the target language. To match the length distribution, the length of each random string is first specified by drawing a random integer from a form length distribution P_l defined on the positive integers excluding zero. In all cases, the pseudolexicon has a form length distribution which we specify. Specifically, we consider the following six models for pseudolexicons. We have named the models using terminology taken from the Potts [28] and Ising [29] models. Each includes progressively greater language-specific detail relating to phonological structure. We expected that we would get a successively better match to a given target-language PNN as we included more detail.
• Infinite Temperature (INFT). Each phoneme in the string is drawn uniformly from the target language's phoneme inventory.
• Noninteracting, Uniform Field (UNI). Each phoneme in the string is drawn randomly using its observed frequency in the real language's lexicon.
• Noninteracting, Consonant/Vowel Uniform Field (CVUNI). Each position in the random string is either a consonant or a vowel drawn randomly using observed positional consonant/vowel frequencies in the real lexicon. Specifically, we use the real language's corpus to compute the position-dependent probability that position l is a consonant or a vowel. The particular consonant or vowel placed at that position is drawn uniformly from the respective set of items (lists of consonants and vowels).
• Noninteracting, Consonant/Vowel Field (CV). Positions are selected to be consonants or vowels exactly as in CVUNI. The particular consonant or vowel placed at each position is selected using observed frequencies of consonants and vowels from the real lexicon.
• Noninteracting, Spatially Varying Field (SP). Each phoneme is drawn randomly from real positional frequencies in the target lexicon. For example, if a language has an inventory of twenty phonemes, we use the real lexicon to compute a table π_{l,x} that gives the probability that phoneme x occurs at position l, and then use this table to assign a phoneme to each position of the random string.
• Nearest Neighbor Interactions (PAIR). The first phoneme in each string is drawn using a positional probability. Subsequent phonemes are drawn via the following rule. If the phoneme at position k is x, then the phoneme at position k + 1 is drawn using the empirical probability (from the real lexicon) that phoneme y follows phoneme x.
We have listed the models in rough order of complexity; INFT uses the least amount of information about the real language's structure and PAIR the most. We note that while it is possible to generate real words (particularly short ones) from the models above, the vast majority of the strings produced bear no resemblance to real words in any of the five languages. The only model that avoids unpronounceable diphones is PAIR; in the other models unpronounceable diphones occur frequently.
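Three of the six models (INFT, UNI, and PAIR) can be sketched as simple samplers. This is an illustrative sketch rather than the code used to produce our results: the function names and the toy "real" lexicon are hypothetical, form lengths are drawn from empirical length counts with `random.choices`, and the PAIR sampler falls back to overall phoneme frequencies when a phoneme is never followed by anything in the lexicon (a fallback the model description does not specify).

```python
import random
from collections import Counter, defaultdict


def sample_lengths(length_counts, n, rng):
    """Draw n form lengths from an empirical length distribution P_l."""
    lengths = list(length_counts)
    weights = [length_counts[l] for l in lengths]
    return rng.choices(lengths, weights=weights, k=n)


def make_inft(inventory, lengths, rng):
    """INFT: every phoneme drawn uniformly from the inventory."""
    return [tuple(rng.choice(inventory) for _ in range(l)) for l in lengths]


def make_uni(real_lexicon, lengths, rng):
    """UNI: phonemes drawn by overall frequency in the real lexicon."""
    counts = Counter(p for form in real_lexicon for p in form)
    phonemes = list(counts)
    weights = [counts[p] for p in phonemes]
    return [tuple(rng.choices(phonemes, weights=weights, k=l)) for l in lengths]


def make_pair(real_lexicon, lengths, rng):
    """PAIR: first phoneme by initial-position frequency, then each
    phoneme drawn by how often it follows the previous one."""
    first = Counter(form[0] for form in real_lexicon)
    overall = Counter(p for form in real_lexicon for p in form)
    follow = defaultdict(Counter)
    for form in real_lexicon:
        for x, y in zip(form, form[1:]):
            follow[x][y] += 1

    def draw(counter):
        items = list(counter)
        return rng.choices(items, weights=[counter[i] for i in items], k=1)[0]

    out = []
    for l in lengths:
        form = [draw(first)]
        while len(form) < l:
            # Fallback to overall frequencies is an assumption.
            form.append(draw(follow[form[-1]] or overall))
        out.append(tuple(form))
    return out


rng = random.Random(0)
real = [("k", "ae", "t"), ("b", "ae", "t"), ("ae", "t")]  # toy lexicon
inventory = sorted({p for form in real for p in form})
lengths = sample_lengths(Counter(len(f) for f in real), 50, rng)
inft = make_inft(inventory, lengths, rng)
uni = make_uni(real, lengths, rng)
pair = make_pair(real, lengths, rng)
print(inft[:3], uni[:3], pair[:3])
```

Duplicate strings are not removed here; in the procedure described in this section, duplicates are discarded before the pseudo-PNN is formed.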
For each pseudolexicon, we discarded any duplicate items. This is why we removed homophones from the real languages; we did not generate orthographic tags for the random phonological forms, so duplicated forms in the pseudolexicon all represent a single node. We then formed a pseudo-PNN by using the DAS rule to connect items in the pseudolexicon to one another. As with the real PNNs, before any analysis we discarded nodes in the pseudo-PNNs with degree zero. Figure 5 shows the degree distribution of the Francis & Kucera 1982 English corpus (FK) [30] and its six corresponding pseudolexicons. We first show the fit to FK, rather than CLEARPOND English, due to our ability to better control the contents of the FK corpus (see Appendix B for details). Each of the six pseudolexicons had as its input P_l the empirical English P_l (e.g., Figure 4, right panel). We note that, while the sizes of the pseudolexicons were fixed to the real-language target lexicon, once the pseudonetworks are formed, they may have fewer nodes than this, since many pseudowords may be neighborless and hence not appear in the graph.

B. English Networks
Figure 5 shows that even minimal levels of linguistic realism yield a pseudo-PNN with a strikingly similar degree distribution to the real English PNN. Even UNI, which includes nothing beyond overall phoneme frequencies and the empirical P_l, looks quite similar to FK.
Table II, which lists the same topological properties we previously showed in Table I, tells an even more compelling story. First, the putatively linguistically relevant topological properties discussed earlier (high clustering, short mean path length, high degree assortativity, and, to some extent, small giant components) are present in all of the pseudolexicons whose degree distributions match that of FK. Giant component size is the least well-matched property in all of the models, though it is still smaller than observed in many real-world networks.
Furthermore, even INFT, in which degree distribution (and hence mean degree) is a poor match to FK, has high clustering, short mean path length, and high degree assortativity.
INFT includes almost nothing about the target language except the form length distribution and the phoneme inventory. We also note the noisiness in the degree distribution of INFT.
While one might hypothesize that this is a result of its relatively small size, the degree distribution of INFT does not become smooth even for larger (10,000 node) graphs (not shown).
[Table caption: All rows of the table are as described in Table I.]
Figure 6 and Table III show the same information for the CLEARPOND English database and pseudolexicons matched to it. We note first that all the conclusions that held for FK hold for CLEARPOND English. Again, even a model as naive as UNI has a very similar degree distribution to the real English PNN and very similar topological characteristics. INFT, again despite having a degree distribution that is an extremely poor match to English CLEARPOND, has high clustering coefficient and high degree assortativity. Compared to FK, some differences are evident. Chief among them is that all the models now have too low a mean degree, arising because the model degree distributions have large-k tails that are too short. However, given the analysis and discussion in Appendix B, this is to be expected. As discussed there, our models do not include analogs to inflected forms (e.g., WALK, WALKS, WALKED). We also have not attempted to model homophones (which have been removed in our pseudo-lexicon PNNs) or proper nouns. All three of these item types preferentially affect the tail shape of P_k. We also note that the CLEARPOND-matched pseudolexicons tend (except for SP) to undershoot the English giant component size, though they still match the fundamental observation that the GC is a relatively small portion of the full network.

C. Five Language Pseudonetworks
We now compare pseudo-PNNs to real PNNs for all five languages: English, Spanish, Dutch, German, and French. For this comparison, we used only the UNI model, since it has a very similar degree distribution to the English PNN P_k despite containing almost no information about real language phonology and constraints. In each case, the pseudo-PNN is matched in total corpus size and form length distribution to its target language. The left panel of Figure 7 shows the true degree distributions for the five language PNNs (shown also in Figure 4) and the right panel of Figure 7 shows the pseudo-PNNs using the UNI model. Furthermore, Table IV shows topological parameters for Spanish, French, German, and Dutch and their matched UNI pseudo-PNNs. We omit English in Table IV because that information is contained in Table III.
Figure 7 and Table IV together show that, as in English, the UNI model is able to come remarkably close in shape and topological properties to the real phonological neighbor networks, despite not resembling the real language's phonology in any way.The clustering of the five language degree distributions for the pseudo-PNNs mimics that seen in the real PNNs, particularly in the manner in which Spanish is separated from the other languages.
Given the way the UNI pseudo-PNNs were constructed, this grouping must be driven entirely by the form length distribution. In Figure 2 we showed that the component size distributions for all five language PNNs follow a power law, even more so than the degree distributions for the PNNs themselves.
This has previously only been observed in English and Spanish [25]. However, even these component size distributions do not arise out of any fundamental or universal phonological properties. In the left panel of Figure 8 we reprint Figure 2 to allow easy comparisons. In the right panel we show component size distributions for the five pseudo-PNNs. While the span of P_c is somewhat reduced in the pseudo-networks, all the pseudographs clearly have power law size distributions with exponents similar to their target languages. Thus, even the island size distribution is essentially an artifact of the neighbor definition.

D. Sensitivity to the Form Length Distribution
The previous section demonstrates that the topological properties of phonological neighbor networks constructed using the one-step DAS rule are driven not by any real linguistic feature but by the connection rule itself. While the resulting PNNs are remarkably insensitive to the degree to which real phonological constraints are used in their construction, we [...] distribution used to produce the pseudolexicon yielding the PNN in the main panel. Table V compares the topological properties of those four pseudo-PNNs to FK and each other.
It is clear from Figure 9 that the shape of the PNN degree distribution is extremely sensitive to the form length distribution. Even the relatively small differences in the shape of EMP and ZTP(1x) lead to large changes in the tail mass of the degree distribution.
The difference between the degree distribution of ZTP(1x) and ZTP(1.5x) is similar to the difference between the degree distributions of English or French and Spanish (see Figure 4).
In addition, Table V shows that the PNN made from ZTP(1.5x) is much smaller (fewer nodes and edges) than any of the other models. This is expected given the reduction in probability of short phonological forms in ZTP(1.5x) when compared to EMP, ZTP(1x), or GEO; the probability that two strings from the UNI pseudolexicon that differ in length by one unit or less are neighbors decays exponentially with string length. Note also from Table V that no matter what effect P_l has on the degree distribution of the resulting PNN, all graphs show high clustering coefficients, short mean free paths, and high degree assortativity.
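The exponential decay can be made concrete for the simplest case. As a sketch under stated assumptions (two strings of equal length l, each phoneme drawn independently and uniformly from an inventory of A symbols, substitution neighbors only), the probability that the strings differ at exactly one position is l(A − 1)/A^l. The function names and the inventory size of 30 below are hypothetical; the brute-force count simply confirms the closed form on a small case.

```python
from fractions import Fraction
from itertools import product


def p_substitution_neighbor(alphabet_size, length):
    """P(two independent uniform strings of this length differ at exactly one position)."""
    return Fraction(length * (alphabet_size - 1), alphabet_size ** length)


def brute_force(alphabet_size, length):
    """Same probability, obtained by enumerating every ordered pair of strings."""
    strings = list(product(range(alphabet_size), repeat=length))
    hits = sum(
        1
        for s in strings
        for t in strings
        if sum(a != b for a, b in zip(s, t)) == 1
    )
    return Fraction(hits, len(strings) ** 2)


# Hypothetical inventory of 30 phonemes: the probability falls rapidly with length.
decay = [float(p_substitution_neighbor(30, length)) for length in range(1, 8)]
```

The addition and deletion cases (lengths differing by one) are not covered here, but they shrink with length in the same way, since a match is still required at every remaining position.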

V. CONCLUSION
We have shown that observed "universal" topological features of phonological neighbor networks [12] (truncated exponential degree distributions, high clustering coefficients, short mean free paths, high degree assortativity, and small giant components) are string rather than language universals. That is, inferences from networks based on similarity regarding language ontogeny or phylogeny are suspect, in light of our analyses demonstrating that similar network structures emerge from nearly content-free parameters. One might object to this strong interpretation. The DAS rule obviously captures important relations that predict significant variance in lexical processing due to similarity of phonological forms in the lexicon. Networks based on DAS are able to extend DAS's reach, as was previously demonstrated with the clustering coefficient [6,7]. Note, though, that the clustering coefficient relates to familiar concepts in word recognition that have not been deeply explored in the spoken domain: the notion of neighbors that are friends or enemies at specific positions, discussed by McClelland and Rumelhart in their seminal work on visual word recognition [31].
Consider a written word like make, with neighbors such as take, mike, and mate. Take is an enemy of the first letter position in make, but a friend at all other letter positions, where it has the same letters. A written word with a clustering coefficient approaching 1.0 would have many neighbors that all mismatch at the same position (thus making them neighbors of each other). A word with a similar number of neighbors but a low clustering coefficient (approaching N/L, that is, N neighbors evenly distributed over L [length] positions) would have more evenly distributed neighbors. For spoken word recognition, the results of Chan and Vitevitch [6] suggest that a high clustering coefficient exacerbates competition because it is heavily loaded on a subset of phoneme positions, creating high uncertainty. In our view, this reveals important details about phonological competition, but not ontogeny or phylogeny of English, or other specifically linguistic structure. Indeed, given the similarity in the distribution of clustering coefficients (among other parameters) in English and in our abstract PNNs, we interpret instances of (e.g.) high clustering coefficient as string universals rather than language universals.
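The friends-and-enemies reading of the clustering coefficient can be checked on the make example. The sketch below is illustrative only: it uses written forms and substitution neighbors alone, and the function names are hypothetical. The local clustering coefficient of a word is the fraction of pairs of its neighbors that are themselves neighbors.

```python
from itertools import combinations


def is_substitution_neighbor(a, b):
    """True if two equal-length strings differ at exactly one position."""
    return len(a) == len(b) and sum(x != y for x, y in zip(a, b)) == 1


def local_clustering(word, others):
    """Fraction of pairs of the word's neighbors that are neighbors of each other."""
    nbrs = [w for w in others if is_substitution_neighbor(word, w)]
    k = len(nbrs)
    if k < 2:
        return 0.0
    links = sum(is_substitution_neighbor(a, b) for a, b in combinations(nbrs, 2))
    return links / (k * (k - 1) / 2)


# All neighbors mismatch make at the first letter, so they are neighbors of each other.
same_position = local_clustering("make", ["take", "bake", "cake"])
# Neighbors mismatch make at three different positions, so no two of them are neighbors.
spread = local_clustering("make", ["take", "mike", "mate"])
```

Here `same_position` is 1.0 and `spread` is 0.0, the two extremes described above.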
While phonological neighbor network topology is largely insensitive to the degree of real phonological structure in the lexicon used to construct the neighbor network, we found some amount of sensitivity to the input form length distribution P_l. Even relatively subtle changes in P_l can lead to observable changes in the degree distributions of the resulting neighbor networks, and differences among the five languages we studied here can be almost wholly attributed to differences in form length distributions among the five languages. However, even this sensitivity is only partial. Form length distributions that look nothing like any of the languages we consider here (GEO, although GEO may partially resemble the P_l of a language like Chinese), and that generate network degree distributions that we do not observe, still yield high clustering coefficients, short mean free paths, and high degree assortativity.
The question of what leads to a given language's P_l is a question about language evolution that will be much more difficult to answer, though some parallels might be drawn with work that seeks to understand the evolution of orthography [32-34].
At an even deeper level, it may be perilous to attach too much meaning to the topology of any similarity network of phonological forms, at least with respect to human performance in psycholinguistic tasks. This is because these networks do not "do" anything; they have no function. They are not connectionist networks that attempt to model phoneme perception, like TRACE [35] or TISK [36]. No matter how they are constructed, they are basically static summaries of the structure of the speech lexicon; they do not perform a processing function.
Insofar as the similarity measure aligns with latency data from human spoken word tasks (e.g., picture naming [7], lexical decision [6], etc.), network properties may encode some features of human performance. While there is evidence that some aspects of human task performance may be predicted from features of neighbor networks [3,6-9], it is clear from our study that care must be taken in interpreting the results of studies of phonological networks.
If the static structure of the lexicon were to be paired with a dynamics that represents mental processing, it would be possible to test the utility of phonological similarity networks for explaining human performance in psycholinguistic tasks.

We also compared node-level topology for the MS words in the MS-only graph and the full PNN (MS+PS). Most quantities are almost perfectly correlated between the two graphs: number of neighbors (degree), number of second neighbors, and eigenvector centrality all have R^2 ≥ 0.95. Node clustering coefficient for the MS words in the two English graphs is more weakly correlated (R^2 = 0.8), with large outliers (see Figure 10). It would be interesting to revisit the proposed relationship between node clustering and spoken word recognition facility [6] in light of these findings.
When we performed the same syllable-level calculations for the other three languages in the CLEARPOND database, we found a consistent story (results not shown [...]
[...] proper nouns for all five languages in CLEARPOND.
• Inflected Forms. FK includes lemma numbers for all the words, so we can simply remove any words that are not lemmas. We do not have this information for any words in CLEARPOND and thus cannot remove them. To try to remove inflected forms in CLEARPOND we could, for example, remove all words with word-final phonological 'z'. This would remove English plurals but also improperly remove some lemmas (size). Even if this were desirable, we would need different rules for all five languages.
Therefore we are forced to keep all inflected forms in the CLEARPOND PNNs.
• Homophones. Homophones are items with identical phonological transcriptions but different orthography. These are relatively simple to remove in both FK and CLEARPOND English, and the same procedure works in any language. We search the nodes for sets of items with identical phonological transcriptions. For example, see and sea would comprise one homophone set in English, and lieu, loo, and Lou another. One of the items from each homophone set, chosen at random, is kept in the PNN and the nodes corresponding to all other items in the set are deleted.
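The homophone removal procedure can be sketched as follows. This is an illustration rather than our actual pipeline: the function name, the (orthography, transcription) input format, and the toy transcriptions are all assumptions made for the example.

```python
import random
from collections import defaultdict


def remove_homophones(entries, rng):
    """Keep one randomly chosen item per phonological transcription.

    entries: list of (orthography, transcription) pairs.
    Returns the kept entries and the homophone sets (groups of size > 1).
    """
    by_transcription = defaultdict(list)
    for orth, phon in entries:
        by_transcription[phon].append(orth)

    kept, homophone_sets = [], []
    for phon, orths in by_transcription.items():
        if len(orths) > 1:
            homophone_sets.append(orths)
        kept.append((rng.choice(orths), phon))   # one survivor per transcription
    return kept, homophone_sets


# Toy lexicon; the transcriptions are placeholders, not taken from FK or CLEARPOND.
entries = [
    ("see", "si"), ("sea", "si"),
    ("lieu", "lu"), ("loo", "lu"), ("Lou", "lu"),
    ("cat", "kat"),
]
kept, homophone_sets = remove_homophones(entries, random.Random(0))
n_removed = len(entries) - len(kept)
```

For this toy lexicon there are two homophone sets (sizes 2 and 3) and three nodes are removed; these are the same kinds of quantities reported per language in Table VIII.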
Figure 11 shows the degree distribution of the FK PNN when inflected forms, proper nouns, and homophones were successively removed. Two features of this figure deserve mention. First, the main effect of these classes of words is in the tail of the degree distribution.
Second, removal of inflected forms causes very little change compared to removal of proper nouns and homophones. It is relatively easy to understand why the largest changes to the degree distribution occur at large k, at least for homophones. Consider a single orthographic form w that is also a homophone with degree d. All of the other orthographic forms in its homophone set are connected to both w and all of the d neighbors of w. If there are N words in the homophone set, we end up with N nodes each with degree d + N − 1. Thus, homophone sets can boost the degree of both their neighbors (since a neighbor of one is a neighbor of all other words in the set) and the homophones themselves. As an example, a homophone set of size 10 in which one of the words has 10 neighbors yields 10 nodes with degree 19. Removing members of the homophone set will therefore tend to remove nodes of large degree and therefore shift the tail of P_k.
Figure 12 compares removal in English CLEARPOND to FK. We first note that, despite being based on completely different corpora, the unaltered English CLEARPOND and FK yield similar PNNs. In addition, as in FK, removal of homophones and proper nouns in CLEARPOND tends to truncate the tail of the degree distribution. As we noted above, the only class of words that we can consistently remove from all five CLEARPOND languages is homophones, and we remove these for all model comparisons. The number of homophone sets, mean set size, and the number of nodes removed from the graph when the removal procedure described above is implemented are shown for each of the five languages in CLEARPOND in Table VIII.
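The d + N − 1 count can be verified on a small explicit graph. In this sketch (function name and node labels are hypothetical) the N homophones are linked to one another, since their transcriptions are identical, and each is linked to the same d non-homophone neighbors.

```python
from itertools import combinations


def homophone_graph(n_homophones, n_shared_neighbors):
    """Adjacency sets for a homophone set plus the neighbors its members share."""
    homophones = [f"h{i}" for i in range(n_homophones)]
    neighbors = [f"n{j}" for j in range(n_shared_neighbors)]
    adjacency = {node: set() for node in homophones + neighbors}
    # Homophones have identical transcriptions, so every pair is linked.
    for a, b in combinations(homophones, 2):
        adjacency[a].add(b)
        adjacency[b].add(a)
    # A neighbor of one homophone is a neighbor of all of them.
    for h in homophones:
        for n in neighbors:
            adjacency[h].add(n)
            adjacency[n].add(h)
    return adjacency


full = homophone_graph(10, 10)
homophone_degrees = [len(full[f"h{i}"]) for i in range(10)]   # each is 10 + 10 - 1

collapsed = homophone_graph(1, 10)                            # one survivor kept
survivor_degree = len(collapsed["h0"])
```

With ten homophones and ten shared neighbors, every homophone has degree 19 and every shared neighbor has degree 10. After the set is collapsed to a single survivor, the survivor has degree 10 and each shared neighbor has degree 1 within this toy graph, which is why removal acts mainly on the tail of the degree distribution.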

FIG. 4. The left panel shows the degree distributions P_k versus k for the five CLEARPOND PNNs. Compare to Figure 1; this figure differs because homophones have been removed from the graphs as detailed in Appendix B. The right panel shows the distribution P_l of phonological form lengths in each of the five languages from the CLEARPOND corpora. Note that all these distributions are only defined for l ≥ 1; length zero words do not exist.

FIG. 9. Degree distributions (main panel) for UNI pseudo-PNNs constructed using four different phonological length distributions: the empirical English form length distribution (EMP), a zero-truncated Poisson fit to the empirical distribution (ZTP(1x)), a zero-truncated Poisson with shifted mean (ZTP(1.5x)), and a geometric distribution (GEO) with the same mean as EMP. The real FK network is shown for comparison, and the inset shows the four different form length distributions.

TABLE III. Topological measures for the CLEARPOND English corpus (EN) and six pseudolexicons matched to it. All rows of the table are as described in Table I.
Table IV shows that the pseudo-PNNs match their target languages quite well overall, with some properties extremely similar, e.g., clustering coefficients and degree assortativity.
TABLE IV. Topological measures for four phonological neighbor networks (FR, ES, DE, NL) and matched UNI pseudo-PNNs (pFR, pES, pDE, pNL). All rows of the table are as described in Table I.

TABLE V. Topological measures for four UNI pseudo-PNNs (EMP, ZTP-1X, ZTP-1.5X, GEO) and the real FK phonological neighbor network. All rows of the table are as described in Table I.

TABLE VI. Topological measures for graphs produced from the CLEARPOND English and Dutch corpora. MS+PS is the full PNN (see also Table I), MS is a graph formed from only the monosyllabic words, and PS a graph formed from only the polysyllabic words. With the exception of edge density d and frequency assortativity coefficient r_f, all symbols in this table are the same as those in Table I, and the quantities in the table separated by forward slashes have the same meaning as in Table I. Edge density is defined as 2m/(N(N − 1)), where m is the number of edges and N the number of nodes in the graph.

[...]). In all cases, MS giant component sizes are much larger than PS GC sizes, MS edge densities are close to tenfold larger, and MS mean geodesic path lengths are much shorter. PS degree distributions [...]
TABLE VII. The thirteen words in English CLEARPOND with the highest degree. Note the prevalence in this list of (i) proper nouns and (ii) homophones (e.g., see, sea).

Table VIII indicates that for all languages except French, the majority of homophone sets are pairs like see and sea.
TABLE VIII. Number of homophone sets N_H, mean homophone set size µ_H, and the number of nodes removed from the CLEARPOND PNNs when homophones are removed. Note the wide variation in the number of homophones across the five languages.