Article

Interactions Among Morphology, Word Order, and Syntactic Directionality: Evidence from 55 Languages

1
School of International Studies, Zhejiang University, Hangzhou 310058, China
2
College of Foreign Languages and Literature, Fudan University, Shanghai 200437, China
*
Author to whom correspondence should be addressed.
Entropy 2025, 27(11), 1128; https://doi.org/10.3390/e27111128
Submission received: 26 August 2025 / Revised: 27 October 2025 / Accepted: 29 October 2025 / Published: 31 October 2025
(This article belongs to the Section Complexity)

Abstract

This study investigates interactions among morphology, word order, and syntactic directionality across 55 languages from 11 families. We quantify morphological richness (moving-average mean size of paradigm), word order flexibility (entropy), and syntactic directionality (dependency direction), linking linguistic structure to information-theoretic principles. Analyses show that morphological richness is only weakly related to word order entropy and does not provide a robust predictor after statistical correction. Rich morphology facilitates the predictability of syntactic functions. Languages with richer morphology consistently favor head-final structures, whereas minimally inflected languages lean toward head-initial patterns, indicating that syntactic directionality is more closely associated with morphological complexity than with surface word order. Overall, the findings indicate that languages maintain a balance between redundancy and flexibility in optimizing information transmission, providing quantitative evidence for efficiency-driven trade-offs in human language.

1. Introduction

Human language is a structured yet adaptive system governed by underlying regularities [1]. A central principle in linguistic typology is the complexity trade-off, which posits that when a language develops elaborate features in one domain—such as phonology, morphology, syntax, or semantics—it tends to simplify another, thereby maintaining an overall equilibrium in communicative complexity [2,3,4,5]. Finnish provides a canonical example of this trade-off. As a highly inflected language, it conveys dependency relationships through rich morphological markers, permitting flexible word order. For instance, the sentence “The boy reads a book” can be expressed in multiple orders: SVO (Poika lukee kirjan), SOV (Poika kirjan lukee), VSO (Lukee poika kirjan), VOS (Lukee kirjan poika), OSV (Kirjan poika lukee), and OVS (Kirjan lukee poika) (SVO: Poika lukee kirjan = poika.NOM boy, luke-a.3SG.PRS read, kirja-ACC book → ‘The boy reads the book.’ SOV: Poika kirjan lukee → ‘The boy reads the book.’ (object focus). VSO: Lukee poika kirjan → ‘It is the boy who reads the book.’ VOS: Lukee kirjan poika → ‘The boy reads the book.’ (subject focus). OSV: Kirjan poika lukee → ‘The book, the boy reads.’ OVS: Kirjan lukee poika → ‘It is the boy who reads the book.’). Thai, a Tai-Kadai language with minimal inflection, relies primarily on fixed word order and contextual cues to convey grammatical relations, following an SVO pattern: เด็กชาย อ่าน หนังสือ (dèk-chaai àan năng-sĕu; boy.NOM read.PRS book.ACC).
Recent advances in natural language processing and the availability of large-scale dependency treebanks have enabled quantitative investigation of cross-linguistic patterns [3]. Prior research has examined interactions among linguistic subsystems, including the interplay between word class shifts and morphological marking [6], the effect of morphology on lexicalization of speech act markers [7], the relationship between word order and case marking [8,9], the connection between syllable structure, phonemes, and morphology [10], the influence of verb-final order and case marking [11], as well as the impact of verb order, semantic closeness, and cognitive load [10,11,12]. These studies highlight that word order emerges from complex interactions among morphology, syntax, and information-processing constraints.
Within this framework, Hawkins’ efficiency-based theory of grammar offers a processing perspective on word order optimization. It posits that grammars tend to select and arrange linguistic forms so as to provide the earliest possible access to the developing syntactic–semantic representation during incremental parsing [13]. This Maximize Online Processing (MOP) principle links grammatical organization to processing efficiency, suggesting that constituent ordering reflects pressures to minimize integration cost and maximize early structure building. Building on this view, the present study investigates how morphological richness and dependency directionality interact under such processing efficiency constraints. Empirical evidence by Sinnemäki & Haakana [14] supports this perspective: in genitive noun phrases, richer morphological marking correlates with shorter dependency distances. However, their analysis was restricted to noun phrases and did not consider other syntactic constituents such as adjectives, determiners, relative clauses, adverbs, or full clauses. Consequently, the broader impact of morphological richness on syntactic directionality and word order flexibility remains underexplored.
The present study addresses this gap using 80 UD treebanks representing 11 language families. Specifically, we examine three variables: morphological richness (moving-average mean size of paradigm), word order flexibility (entropy), and syntactic directionality (dependency direction). The study is guided by two research questions:
RQ1: Does morphological richness correlate with word order flexibility? Rich morphology may provide cues that facilitate dependency parsing, potentially allowing greater variation in word order.
RQ2: Does morphological richness influence syntactic directionality? Languages with richer morphology may favor specific structural patterns, such as head-final or head-initial configurations, optimizing the transmission of information while reducing cognitive load.
By quantifying these relationships across a large typologically diverse sample, this study aims to provide quantitative evidence for efficiency-driven trade-offs in human language, shedding light on how morphological complexity interacts with syntactic structure to balance redundancy and flexibility.

2. Methodology

2.1. Data

The data used in this study comprised 55 languages, spanning 22 language branches across 11 language families. These families included Indo-European, Dravidian, Uralic, Altaic, Afro-Asiatic, Sino-Tibetan, Austronesian, Niger-Congo, and South Asian, as well as Basque and Japanese. Detailed information on the linguistic characteristics and the corresponding treebanks can be found in Appendix A and Appendix B. In Semitic languages such as Arabic and Hebrew, prepositions are often orthographically attached to nouns, forming a single word. However, for the purposes of dependency syntactic analysis, such constructions should be segmented into two distinct tokens, i.e., one representing the preposition and the other the noun. To ensure consistency and cross-linguistic comparability, all treebanks in this study employed the Universal Dependencies (UD) annotation framework [15]. UD systematically separates clitics and treats them as independent tokens, thereby minimizing errors arising from orthographic variation across languages. The UD framework provides part-of-speech tags, lemmas, and syntactic dependency trees for each word in the corpus. Each word is linked to a single syntactic head, and dependency relations are annotated using standardized labels such as nsubj (nominal subject), obj (direct object), and amod (adjectival modifier), among others. All metric computations and statistical analyses, including correlation tests between variables, were implemented using the Python programming language (version 3.11).

2.2. Metrics

This study evaluated three syntactic and morphological dimensions: morphological richness, word order flexibility, and syntactic directionality.

2.2.1. Metric for Morphological Richness

A wide range of corpus-based metrics has been developed to capture morphological richness, including type–token ratio, word and lemma entropy, information-theoretic measures, and paradigm-based counts [16]. While each metric offers a distinct perspective, many are sensitive to corpus size, structural domains (e.g., verbal vs. nominal), or text genre, making cross-linguistic comparison difficult. To address this, the present study adopts a paradigm-oriented measure inspired by Xanthos and Gillis [17]: the mean size of paradigm (MSP), which captures the average number of inflected forms per lemma. To enhance robustness across corpora of varying lengths and syntactic densities, we further refined this metric using a moving-average framework, resulting in the moving-average mean size of paradigm (MAMSP). This approach mitigates the impact of local fluctuations by computing paradigm size within fixed-size token windows. Formally, MAMSP is defined as follows:
$$\mathrm{MAMSP} = \frac{\sum_{i=1}^{N-W+1} \frac{F_i}{L_i}}{N-W+1} \tag{1}$$
where N refers to the number of text tokens in the data. W represents the window size, where W < N. Fi denotes the number of distinct inflected word forms in each window. Li represents the number of distinct word lemmas in each window. To illustrate, consider the following Japanese sentence:
(2)
朝早く駅に着いて彼女[が omitted]来るのを
Morning early ADV station-DAT arrive-GER she-NOM [omitted] come-NMLZ-ACC
待ったけど、電車[が omitted]遅れて会えなかった.
wait-PST CONJ train-NOM [omitted] late-GER meet-POT-NEG-PST
“I arrived early at the station and waited for her to come, but the train was late, so I could not meet her.”
This sentence contains 11 tokens (N = 11). Suppose the window size is W = 5. Then, we can slide the window to obtain N – W + 1 = 7 windows: 朝, 早く, 駅に, 着いて, 彼女; 早く, 駅に, 着いて, 彼女, 来るのを; 駅に, 着いて, 彼女, 来るのを, 待った; 着いて, 彼女, 来るのを, 待った, けど; 彼女, 来るのを, 待った, けど, 電車; 来るのを, 待った, けど, 電車, 遅れて; 待った, けど, 電車, 遅れて, 会えなかった. For each window, compute the ratio Fi/Li, i.e., the number of distinct inflected word forms divided by the number of distinct lemmas. The final MAMSP is the average of these seven ratios:
$$\mathrm{MAMSP} = \frac{\sum_{i=1}^{N-W+1} \frac{F_i}{L_i}}{N-W+1} = \frac{\frac{1}{2}+\frac{1}{3}+\frac{1}{3}+\frac{1}{2}+\frac{2}{3}+\frac{1}{2}+\frac{1}{1}}{11-5+1} \approx 0.548 \tag{3}$$
This moving-average approach smooths local fluctuations and provides a stable estimate of morphological richness across different corpus lengths.
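The sliding-window computation in Equation (1) can be sketched in a few lines of Python. This is a minimal illustration, not the authors' tool: the function name `mamsp`, the `(form, lemma)` input format, and the toy English data are assumptions. It counts every distinct surface form in a window as F_i, which is one reading of the definition; a stricter count of only overtly inflected forms (as the worked example above appears to use) would give different values.

```python
def mamsp(tokens, window):
    """Moving-average mean size of paradigm.

    tokens: list of (form, lemma) pairs in corpus order (assumed input format).
    window: window size W; this sketch also accepts W == N (a single window).
    """
    n = len(tokens)
    if not 0 < window <= n:
        raise ValueError("window must satisfy 0 < W <= N")
    ratios = []
    for start in range(n - window + 1):            # N - W + 1 windows
        chunk = tokens[start:start + window]
        forms = {form for form, _ in chunk}        # F_i: distinct word forms
        lemmas = {lemma for _, lemma in chunk}     # L_i: distinct lemmas
        ratios.append(len(forms) / len(lemmas))
    return sum(ratios) / len(ratios)


# Hypothetical toy data, for illustration only.
toy = [("reads", "read"), ("read", "read"), ("book", "book"),
       ("books", "book"), ("the", "the")]
print(mamsp(toy, 3))
```

With W = 3 the toy list yields three windows, each containing three distinct forms and two distinct lemmas, so every window ratio is 3/2.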

2.2.2. Metric for Word Order Flexibility

Word order is influenced by a multitude of linguistic factors, including adpositions, genitives, noun modifiers, demonstratives, numerals, adjectives, relative clauses, and adverbs, among others [18]. For the purpose of this study, our attention is centered on the sequence of the constituents: subject (S), object (O), and verb (V). This choice is justified by two main considerations: (a) S, O, and V are central syntactic elements that appear in the vast majority of sentences across different languages, making them ideal for typological comparison, and (b) their identification is relatively straightforward across varied language datasets [19]. We considered the six canonical word orders, i.e., SVO, OVS, VSO, VOS, SOV, OSV, as well as partial orders such as SV, VO, and SO, which arise due to intransitive clauses, ellipsis, copula-drop, or nominal predications. Our Python-based tool extracts these linearizations directly from dependency structures, allowing the analysis to accommodate varied syntactic patterns and reflect actual language use. While some languages demonstrate strict word order constraints in subordinate clauses or coordinated structures [20], this study focuses exclusively on main clauses.
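The article does not describe the extraction tool in detail, so the following is only a hypothetical sketch of how a main-clause S/V/O linearization might be read off a UD-style dependency tree. The function name `clause_order` and the `(id, head, deprel)` triple format are assumptions, and the sketch simply treats the root token as the verb without checking its part of speech.

```python
def clause_order(tokens):
    """Return the main-clause order (e.g. 'SVO', 'SV') or None.

    tokens: list of (id, head, deprel) triples for one sentence, with 1-based
    ids in surface order and head == 0 for the root (CoNLL-U-like, assumed).
    """
    root = next((t for t in tokens if t[1] == 0), None)
    if root is None:
        return None
    positions = {"V": root[0]}                 # assumption: the root is the verb
    for tok_id, head, deprel in tokens:
        if head != root[0]:
            continue                           # only direct dependents of the root
        if deprel == "nsubj" and "S" not in positions:
            positions["S"] = tok_id
        elif deprel == "obj" and "O" not in positions:
            positions["O"] = tok_id
    if len(positions) < 2:
        return None                            # verb alone: no order to record
    # Sort the labels by surface position to get the linearization.
    return "".join(sorted(positions, key=positions.get))


# "The boy reads a book" and Finnish "Kirjan lukee poika" (object-verb-subject).
english = [(1, 2, "det"), (2, 3, "nsubj"), (3, 0, "root"), (4, 5, "det"), (5, 3, "obj")]
finnish = [(1, 2, "obj"), (2, 0, "root"), (3, 2, "nsubj")]
print(clause_order(english), clause_order(finnish))
```

Restricting the search to direct dependents of the root is what keeps the sketch focused on main clauses; partial orders such as SV fall out naturally when a subject or object is absent.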
In measuring word order flexibility, several metrics have been proposed, such as maximum–minimum distance, Euclidean distance, cosine similarity, and entropy [19]. Among these, Kubon et al. found a convergence in their ability to capture variability, suggesting any of these can serve as reliable indicators. Our study adopts entropy as the principal metric due to its interpretability and prevalence in recent linguistic research on syntactic variation [6,21]. Entropy (ENTR), the metric selected for this study, captures the randomness or unpredictability of word order use [22]. It is defined as follows:
$$\mathrm{ENTR} = -\sum_{i=1}^{6} s_i \ln s_i \tag{4}$$
Here, $s_i$ denotes the relative frequency of the i-th word order type, including both canonical (SVO, OVS, etc.) and partial orders (SV, VO, SO, etc.), with $s_i \ge 0$ and $\sum_i s_i = 1$. The index i thus ranges over all attested word order variants. Entropy reaches its maximum when all six word orders are equally probable, yielding ln 6 ≈ 1.792, and decreases as distributions become more skewed toward fewer dominant orders. By employing entropy in this way, our metric captures the degree of flexibility in the linear arrangement of syntactic elements, accommodating both complete and partial clause structures.
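As a minimal sketch of the entropy measure, the function below computes the natural-log Shannon entropy of a list of observed word-order labels. The function name `word_order_entropy` and the label-list input are illustrative assumptions rather than the authors' implementation.

```python
import math
from collections import Counter


def word_order_entropy(orders):
    """Shannon entropy (natural log) of observed word-order labels."""
    counts = Counter(orders)
    total = sum(counts.values())
    if total == 0:
        raise ValueError("need at least one observed clause")
    # s_i is the relative frequency of each attested order.
    return -sum((c / total) * math.log(c / total) for c in counts.values())


canonical = ["SVO", "OVS", "VSO", "VOS", "SOV", "OSV"]
print(word_order_entropy(canonical))           # uniform over six orders: ln 6
print(word_order_entropy(["SVO"] * 9 + ["SOV"]))  # skewed distribution: lower
```

Only attested orders enter the sum, so the convention 0 · ln 0 = 0 never has to be handled explicitly.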

2.2.3. Metric for Syntactic Directionality

Syntactic directionality reflects the linear order between syntactic heads and their dependents. Cross-linguistically, some languages tend to favor head-initial structures, where the head precedes its dependent (as in English), while others prefer head-final structures, such as Japanese. Many languages display mixed or flexible ordering patterns [23,24]. In this study, we operationalized syntactic directionality in terms of dependency directionality. To capture this structural property, we adopted the dependency direction (DDir) score [20,23], a quantitative metric that summarizes the global syntactic ordering tendency of a language. It was computed over all dependency relations in a syntactically annotated corpus (treebank). For each dependency arc, we determined whether the head precedes or follows its dependent and then calculated the difference between the head-initial and head-final counts, normalized by the total number of dependencies. Formally, the DDir score is defined as follows:
$$\mathrm{DDir} = \frac{N_{HI} - N_{HF}}{N_{total}} \tag{5}$$
This definition normalizes the difference between head-initial (NHI) and head-final (NHF) dependencies by the total number of dependencies (Ntotal), yielding values between −1 and +1. A score close to +1 indicates a strongly head-initial language, a score near −1 suggests a predominantly head-final language, and values around 0 imply mixed directionality.
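The score in Equation (5) can be illustrated with a short sketch. The function name `ddir` and the `(dependent_position, head_position)` arc format are assumptions made for this example; arcs attached to ROOT (head position 0) are skipped, in line with the exclusion of ROOT reported later in the article.

```python
def ddir(arcs):
    """Dependency direction score: (N_HI - N_HF) / N_total.

    arcs: iterable of (dependent_position, head_position) pairs using 1-based
    surface positions; head_position == 0 marks the ROOT arc, which is skipped.
    """
    head_initial = head_final = 0
    for dep, head in arcs:
        if head == 0:
            continue
        if head < dep:
            head_initial += 1      # head precedes its dependent
        else:
            head_final += 1        # head follows its dependent
    total = head_initial + head_final
    if total == 0:
        raise ValueError("no non-root dependencies")
    return (head_initial - head_final) / total


# "I ate an apple": I <- ate, ROOT -> ate, an <- apple, ate -> apple
sentence = [(1, 2), (2, 0), (3, 4), (4, 2)]
print(ddir(sentence))
```

In this one-sentence example two arcs are head-final (subject and determiner) and one is head-initial (object), so the score is negative even though English is head-initial overall; the measure is only meaningful when aggregated over a whole treebank.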
It is important to note that the DDir measure is based on head-dependent direction in the dependency tree, not on surface word order. For example, in English, both the active sentence ‘I ate an apple’ and its passive counterpart ‘An apple is eaten by me’ have the same head-dependent directions: the verb is the head, and both subject and object (or agent in the passive) are dependents. Therefore, passive constructions do not affect the dependency direction statistics in our UD-based analysis.
Furthermore, to assess the robustness of the three metrics across different data sources (treebanks), we conducted a pilot test by calculating MAMSP, ENTR, and DDir using multiple treebanks for three languages: German, Tamil, and Turkish. We compared German-HDT vs. German-GSD, Tamil-TTB vs. Tamil-MWTT, Turkish-Kenet vs. Turkish-Boun. Results revealed moderate variation. For German, DDir (0.6356 vs. 0.6576) and MAMSP (0.982 vs. 0.9622) are relatively close, indicating consistency in morphological annotation and dependency structure. ENTR shows greater divergence (1.157 vs. 1.3439), suggesting that syntactic variability is more corpus-sensitive. For Tamil, MAMSP remains stable (0.9995 vs. 0.9763), but DDir (0.6463 vs. 0.7105) and especially ENTR (0.1391 vs. 0.6663) vary substantially, implying differences in how word order patterns are captured. For Turkish, the two treebanks produce relatively consistent values across all three metrics, with only minor differences. These findings indicate that morphological richness is generally stable across treebanks, while word order flexibility is more sensitive to genre, domain, and annotation guidelines. Dependency relation directionality (DDir) shows moderate variation. In light of this, we adopted a triangulation strategy to improve the reliability of our linguistic measurements. For languages with multiple treebanks, we computed each metric separately for each source. If results were consistent, we treated them as robust. If they diverged, we calculated the mean across treebanks to reduce bias.

2.3. Units of Analysis

The present study investigates cross-linguistic relationships among morphological richness, word order flexibility, and syntactic directionality. The intended level of inference is typological (i.e., across languages), whereas the empirical level of observation is corpus-based. To balance representativeness and statistical independence, each UD treebank is treated as a single analytical unit. As summarized in Appendix B, some languages (e.g., German, Japanese, Turkish) are represented by multiple treebanks differing in domain and genre. Treating each treebank as one observation prevents languages with multiple treebanks from being overrepresented and avoids within-language autocorrelation. Accordingly, all statistical analyses (including ANOVA and Pearson correlation tests) were conducted using per-treebank mean scores as independent data points. Each treebank contributed one averaged value per metric (MAMSP, ENTR, and DDir).

3. Results

3.1. Morphological Richness

Appendix C summarizes the MAMSP values for each language, where higher scores indicate greater morphological richness. Figure 1 illustrates both the raw MSP values (blue) and their moving averages (red).
A one-way ANOVA tested differences in MAMSP across 22 language branches, using per-treebank mean MAMSP scores as independent data points. The analysis revealed a significant effect of language branch on morphological richness (F(21, 33) = 2.470, p = 0.0097, η² = 0.611, ω² = 0.360). Based on MAMSP (the smoothed measure), the ten morphologically richest languages are predominantly agglutinative, including Uyghur and Kazakh (Turkic branch, Altaic family), Marathi (Indo-Aryan, Indo-European), Finnish and Estonian (Finno-Ugric, Uralic), Buryat (Mongolic, Altaic), Kurmanji (Iranian, Indo-European), Tamil and Telugu (Dravidian), and Wolof (Atlantic, Niger-Congo). These results confirm that agglutinative systems, which express grammatical meaning through concatenated, segmentable morphemes, tend to yield a larger number of distinct surface forms per lemma and hence higher MAMSP values. This correlation aligns with typological findings in the literature [17,25,26,27]. French and Gothic, which rank relatively high in raw MSP but not in MAMSP, illustrate that inflectional irregularities and portmanteau morphemes (e.g., je vais, tu vas) can temporarily inflate raw morphological counts. The MAMSP-based results, therefore, provide a more stable and typologically coherent measure of morphological richness across language families.
Within-family contrasts further support these trends. In the Uralic family, the Finnic languages (Finnish, Estonian) show significantly higher MAMSP scores than Finno-Ugric members (Hungarian, North Sámi): t = −12.83, p = 1.76 × 10⁻³⁶. In the Indo-European family, Polish is morphologically richer than Ukrainian (t = −3.76, p = 0.00056), and Marathi, the only agglutinative Indo-Aryan language, exceeds Urdu and Hindi in richness. Within the Altaic family, Turkic languages exhibit significantly higher MAMSP values than Mongolic languages (t = 3.77, p = 0.00017). At the lower end of the scale, Chinese and Vietnamese show minimal MAMSP values, consistent with their isolating morphological type. Languages from the Celtic, Basque, Uralic, Baltic, and Slavic branches display moderate morphological richness, while most Romance and Germanic languages are comparatively low. Interestingly, Japanese, despite its agglutinative character, ranks lower than expected. This likely results from its limited nominal inflection: grammatical relations are encoded primarily by independent case particles (が, を, に) rather than by affixal morphology. Because the MAMSP metric quantifies affix-based morphological productivity, Japanese appears less morphologically rich under this measure. This effect is reinforced by the UD annotation scheme, which treats Japanese particles as separate tokens rather than bound morphemes.
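The effect sizes reported for the branch-level ANOVA can be cross-checked from the F statistic and its degrees of freedom alone, using the standard identities for a one-way design. The sketch below is an illustrative check, not the authors' code; the function name `anova_effect_sizes` is an assumption, and small discrepancies in the third decimal are expected because the published F value is itself rounded.

```python
def anova_effect_sizes(f_stat, df_between, df_within):
    """Eta squared and omega squared for a one-way ANOVA, from F and its dfs."""
    # eta^2 = SS_between / SS_total, rewritten in terms of F.
    eta_sq = (df_between * f_stat) / (df_between * f_stat + df_within)
    # omega^2 = (SS_between - df_between * MS_within) / (SS_total + MS_within).
    n_total = df_between + df_within + 1
    omega_sq = (df_between * (f_stat - 1)) / (df_between * (f_stat - 1) + n_total)
    return eta_sq, omega_sq


# Values reported for MAMSP: F(21, 33) = 2.470.
eta_sq, omega_sq = anova_effect_sizes(2.470, 21, 33)
print(round(eta_sq, 3), round(omega_sq, 3))
```

Omega squared is the less biased of the two estimates, which is why it comes out noticeably smaller than eta squared with 22 groups and only 55 observations.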

3.2. Syntactic Directionality

Languages with greater morphological richness show a tendency toward head-final dependency structures (Pearson r = −0.369, 95% CI [−0.611, −0.085], p = 0.0055). Languages with high MAMSP values, such as Uyghur, Marathi, Turkish, Kazakh, Kurmanji, Buryat, Tamil, and Telugu, exhibit a strong preference for head-final dependencies. Across all treebanks, we analyzed a total of ≈856,000 dependency relations (excluding ROOT). Among these, ≈330,000 were head-final (38.5%) and ≈526,000 were head-initial (61.5%). Figure 2 summarizes absolute counts of head-final occurrences by dependency relation type, aggregated across all treebanks. For interpretability, Appendix D reports, for each relation, both (i) the within-relation head-final percentage and (ii) the share of the full corpus. For example, for nmod, there are 9688 head-final instances, which correspond to 13.84% of all nmod relations (9688/69,981) and about 1.1% of all dependencies in the dataset (9688/856,000). Consistent head-final behavior is observed for case, det, cc, mark, amod, advmod, nsubj, and aux, particularly within morphologically rich (high-MAMSP) languages. By contrast, relations such as obj, obl, acl:relcl, xcomp, and nmod show lower head-final proportions in the aggregate. These distributional patterns align with typological observations that languages with more robust inflectional/agglutinative morphology tend to favor head-final sequencing in core phrasal domains [9,25], while also reflecting cross-branch mixing in relations sensitive to clause structure (obj, obl, xcomp). Overall, head-final dependencies constitute a numerical minority in the aggregated corpus, yet they are typologically concentrated in morphologically rich languages with predominantly head-final syntactic structures.

3.3. Word Order Flexibility

Appendix E reports the ENTR values for all languages, where higher scores reflect greater freedom in constituent ordering. A one-way ANOVA tested branch-level differences in ENTR, using per-treebank mean ENTR scores as independent observations (F(21, 33) = 2.13, p = 0.025). The analysis yielded a moderate-to-large effect size (η² = 0.576, ω² = 0.302), indicating that language branch accounts for a substantial share of the observed variation in word order flexibility. The top five most flexible languages are as follows: agglutinative: Wolof (Atlantic), Kurmanji (Iranian); inflectional: Lithuanian (Baltic), Slovak, Czech (Slavic). These are followed by Finnic, Slavic, Germanic, and Basque languages. In contrast, Indo-Aryan, Celtic, Sinitic, and Vietic languages exhibit more rigid word order. Appendix F further provides the distribution proportions of six word-order types across 55 languages. Among these, Lithuanian stands out with a high degree of inflectional variation and the most extensive case system among Indo-European languages. This rich inflectional variation and case system contribute to Lithuanian having the most relaxed word order. Kurmanji, an agglutinative language, ranks second in word order freedom. Czech, Slovak, Slovenian, German, Finnish, and Hungarian demonstrate the most balanced proportions of S, V, and O combinations. The Slavic language family showcases all possible word orders, with preferences in the order of SVO > OVS > VOS > SOV > VSO > OSV. Among the eleven Slavic languages, Czech, Slovak, and Slovenian exhibit the most balanced distribution of word order types. This explains why these three languages have higher ENTR values compared to other Slavic languages. Within the Uralic sample examined here (Finnish, Estonian, Hungarian, and North Sámi), all six canonical word orders are attested in the UD treebanks. Among them, SVO is the most frequent pattern overall, though its proportion varies substantially across languages.
Estonian shows a strong SVO preference (≈85%), Hungarian favors SVO (≈59%) but also displays considerable SOV and OVS variation, while Finnish exhibits the most flexible ordering, with SVO ≈ 41% and a notably high proportion of OSV and OVS (≈38%). The Altaic language family exhibits all word orders but has a preference for SOV. The Turkic language family (Turkish, Kazakh, Uyghur) shows a word order preference of SOV (78%) > OVS (15%) > SVO (3%) > OSV (4%) > VSO (1%) > VOS (0%). Within the Altaic language family, Mongolic Buryat shows a higher proportion of SVO (27.45%) and a lower proportion of OVS. This is due to the influence of Slavic languages (dominated by SVO) on the Buryat language, despite it being part of the Mongolic language family and administratively belonging to one of the autonomous republics of the Russian Federation. This explains why the word order preference of Buryat is not related to the Altaic type but clusters with Slavic languages.
The Afro-Asiatic Semitic family exhibits distinct word order preferences. Arabic and Maltese strongly favor verb-initial (VSO) patterns (about 70%), reflecting a consistent syntactic profile. Modern Hebrew shows more variable distributions: fully realized SVO and VSO clauses are relatively infrequent, while a large share of clauses contain only two constituents (SV, VS, VO, or OV). This structural variability contributes to Hebrew’s higher word order entropy compared to Arabic and Maltese. Table 1 and Table 2 summarize the distribution of major word orders in these three languages, based on UD treebank data.
Irish is a verb-initial language, with the VO pattern accounting for 98%. Coptic, an agglutinative language of the Egyptian branch of Afro-Asiatic, exhibits more word order variation than the Semitic languages in the same family. Marathi, Urdu, and Hindi show slightly more flexibility in word order compared to Arabic, tending towards SOV > SVO > OSV. In the previous analysis of morphological richness, Marathi demonstrated the highest diversity among Indo-Aryan languages, and in this word order test, it exhibits the highest flexibility as well. This observation is consistent with the tendency that languages with richer morphology often display greater flexibility in constituent order. Additionally, the Dravidian languages Telugu and Tamil exhibit moderate rigidity in word order and a preference for verb-final structures (cf. SOV > OSV > OVS).
Looking at the Indo-European language family, most of the 34 languages examined in this study show a preference for SVO. However, German, Dutch, Gothic, and Greek display verb-initial word order (VOS, VSO) in interrogative and negative sentences. Within the Romance language family, Galician primarily employs SVO but combines it with a certain proportion of VSO and SOV. Basque is an agglutinative language with a balanced distribution of word orders, preferring SVO (58.84%) > SOV (30.39%) > OSV (4.76%) > OVS (3.97%) > VOS (1.70%) > VSO (0.34%). Armenian, while morphologically similar to Turkish, with suffixes used to indicate grammatical information, differs from Turkish in word order preference. Armenian primarily favors SVO and exhibits all possible word orders. Finnish, due to its rich derivation and inflectional morphology, enjoys relatively free word order. The Germanic language family prefers SVO > VSO > OVS. English, within the Germanic language family, is the least flexible in word order, strongly favoring the SVO pattern.

3.4. Cross-Linguistic Correlations

To synthesize the patterns observed in Section 3.1, Section 3.2 and Section 3.3, we performed Pearson correlation analysis among the three metrics: MAMSP (morphological richness), DDir (dependency direction), and ENTR (word order flexibility). Each UD treebank contributed one averaged value per metric, which served as an independent observation in the cross-linguistic correlation tests. The following correlations were observed: MAMSP vs. DDir: r = −0.370, 95% CI [−0.62, −0.09], p = 0.005. MAMSP vs. ENTR: r = 0.267, 95% CI [0.00, 0.50], p = 0.049. DDir vs. ENTR: r = 0.013, 95% CI [−0.27, 0.30], p = 0.924. After correction for multiple comparisons, the MAMSP–DDir correlation remained significant under both Bonferroni and FDR procedures, while the MAMSP–ENTR trend did not. DDir–ENTR showed no relationship under any threshold.
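The effect of the multiple-comparison correction on the three reported p-values can be reproduced with a short sketch. It assumes that the FDR procedure is Benjamini-Hochberg, which the article does not state explicitly; the function names are illustrative.

```python
def bonferroni(pvals):
    """Bonferroni-adjusted p-values: p * m, capped at 1."""
    m = len(pvals)
    return [min(1.0, p * m) for p in pvals]


def benjamini_hochberg(pvals):
    """Benjamini-Hochberg adjusted p-values (assumed FDR procedure)."""
    m = len(pvals)
    order = sorted(range(m), key=lambda i: pvals[i])
    adjusted = [0.0] * m
    running_min = 1.0
    # Walk from the largest p-value down, enforcing monotonicity.
    for rank in range(m, 0, -1):
        i = order[rank - 1]
        running_min = min(running_min, pvals[i] * m / rank)
        adjusted[i] = running_min
    return adjusted


# Reported p-values: MAMSP-DDir, MAMSP-ENTR, DDir-ENTR.
pvals = [0.005, 0.049, 0.924]
print(bonferroni(pvals))
print(benjamini_hochberg(pvals))
```

Under either adjustment only the MAMSP–DDir p-value stays below 0.05, while the MAMSP–ENTR value rises above it, which matches the pattern described in the text.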
These results show that morphological richness is negatively correlated with head-initial directionality—languages with richer morphology tend to favor head-final structures—while its positive association with word order flexibility is weak and not statistically robust. No meaningful correlation is found between syntactic directionality and word order flexibility.
To account for potential biases due to genealogical non-independence among languages, we performed a one-per-family bootstrap (100 iterations) for these three pairwise correlations. Figure 3 presents the bootstrap distributions: r_MAMSP_HF: median ~ −0.5, 95% CI mostly negative, confirming a robust negative correlation; r_MAMSP_ENTR: median ~0.5, 95% CI overlaps zero, indicating a non-significant positive trend. r_HF_ENTR: median ~0, 95% CI spans zero, confirming no significant relationship between head-final dependency preference and word order flexibility. These results reinforce that the negative correlation between morphological richness and head-final preference is robust, while positive trends between morphology and word order flexibility are inconclusive, and syntactic directionality and word order flexibility are uncorrelated.
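One plausible reading of the one-per-family bootstrap is that each iteration draws a single language at random from every family and recomputes the correlation on that reduced sample. The sketch below follows that reading; the function names, the fixed seed, and all numeric values in `families` are invented for illustration and are not the study's data.

```python
import math
import random


def pearson(xs, ys):
    """Pearson correlation coefficient of two equal-length sequences."""
    n = len(xs)
    mx, my = sum(xs) / n, sum(ys) / n
    sxy = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    sxx = sum((x - mx) ** 2 for x in xs)
    syy = sum((y - my) ** 2 for y in ys)
    return sxy / math.sqrt(sxx * syy)


def one_per_family_bootstrap(families, n_iter=100, seed=0):
    """Correlations from repeated samples with one language per family."""
    rng = random.Random(seed)
    results = []
    for _ in range(n_iter):
        sample = [rng.choice(members) for members in families.values()]
        xs = [x for x, _ in sample]
        ys = [y for _, y in sample]
        results.append(pearson(xs, ys))
    return results


# Hypothetical (MAMSP, DDir) pairs grouped by family; values are made up.
families = {
    "FamA": [(1.10, -0.40), (1.05, -0.30)],
    "FamB": [(1.02, 0.35), (1.01, 0.50)],
    "FamC": [(1.08, -0.20)],
    "FamD": [(1.00, 0.60), (1.03, 0.10)],
}
rs = sorted(one_per_family_bootstrap(families))
print(rs[len(rs) // 2])   # approximate median of the bootstrap distribution
```

Because each resample contains at most one language per family, genealogically related languages never contribute more than one data point to any single correlation estimate.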

4. Discussion: Interactions Among Morphology and Syntactic Directionality

The central finding of this study is that morphological richness reliably predicts dependency directionality, whereas its association with word order flexibility is weaker. Languages with richer inflectional systems consistently favor head-final dependency structures, and this effect remained significant even under conservative Bonferroni correction, reinforcing the view that morphological richness plays a stable role in shaping dependency directionality across languages. The effect size was moderate (r ≈ −0.37), indicating a substantial but not deterministic influence of morphology on syntactic organization. This result is consistent with Dryer [28] in that our measure of syntactic directionality (DDir) captures the same type of head-directional tendencies he reported for OV and VO languages, for example, the association of OV with RelN and VO with NRel structures. Our analysis builds on Dryer’s typological framework by extending it into a quantitative, corpus-based approach that integrates morphological richness (MAMSP) with syntactic directionality.
The observed correlation between morphological richness and head-finality may reflect a structural alignment between morphological and syntactic organization. In most languages, morphological complexity is largely realized through suffixation, and suffixes are often analyzed as syntactic heads. From this perspective, rich morphology, being predominantly suffixing, can be viewed not merely as an independent correlate but as a morphological manifestation of head-final organization. In other words, the same underlying head-final principle that governs syntactic ordering may also shape morphological structure. This interpretation suggests that the correlation is not accidental but reflects a cross-level structural consistency between morphology and syntax. Future research could empirically test this account by quantifying the cross-linguistic balance between suffixing and prefixing patterns.
The correlation between morphological richness and word order flexibility was weaker and did not remain significant after correction for multiple comparisons. This suggests that morphology may sometimes enable freer constituent order, but such effects are less stable and context-dependent. No evidence was found for a correlation between word order flexibility and dependency directionality. This pattern is broadly consistent with typological observations under the OV/VO contrast, which indicate that while certain structural correlations are robust, others may be region- or language-specific rather than universal. Figure 4 visualizes raw bivariate trends between morphological richness and syntactic directionality, while Figure 5 presents the network of dependencies, showing that morphology exerts a primary influence on syntax, whereas word order flexibility plays a more peripheral role.
While our results largely support Dryer’s observations, it is worth noting that Benítez-Burraco et al. [29] report a contrasting finding, suggesting that there may be no systematic trade-off between morphological and syntactic complexity across languages. The differences between their results and ours may be partly attributable to variations in data sources, methodology, and metrics. Our study relies on annotated treebanks from 55 languages across 11 language families, using per-treebank measurements of dependency direction (DDir), word order flexibility (ENTR), and morphological richness (MAMSP). Benítez-Burraco et al. use the cross-linguistic typological database (WALS), which aggregates language-level features. Furthermore, our metrics quantify syntactic linearity and dependency structures, capturing more subtle interactions between morphology and syntax in naturalistic language use. Their indicators focus on broader typological categories and feature counts, which may not distinguish head-dependent directionality or word order variability. Finally, our analysis applies statistical correlations and ANOVA on per-treebank measures, which may be sensitive to micro-level co-variation patterns. Taken together, the discrepancy likely stems from differences in data granularity and methodological scope rather than genuine theoretical conflict. Within our corpus-based framework, we observe that languages with richer morphology favor head-final dependencies and show a mild, non-significant trend toward greater word order flexibility, which may indicate a subtle interaction between morphological richness and syntactic structure that emerges in annotated corpus data.
Recalling Section 2.2.3, to assess the robustness of the three metrics across different treebanks, we compared German-HDT vs. German-GSD, Tamil-TTB vs. Tamil-MWTT, and Turkish-Kenet vs. Turkish-Boun. The results revealed that word order flexibility is more sensitive to factors such as genre, domain, and annotation scheme. Still, one could argue that even in languages with “free word order,” such flexibility is not absolute: word order often carries pragmatic meaning related to information packaging, such as focus, topic, or stylistic considerations. For example, in Russian, which exhibits relatively free word order, the most important or emphasized element is typically placed first or last. A comparable example in English is the sentence “YOU, I do not understand,” which deviates from typical word order to emphasize “YOU.” Because such information-structural choices vary with genre and content (e.g., news reporting, spontaneous spoken dialogue, and poetry), differences in the genre and content composition of the corpora used in this study may affect the measured dependency directionality and word order flexibility. Our analyses were based on the overall data from each treebank and did not distinguish between genres or pragmatic functions. This limitation may have partially obscured the influence of pragmatic factors on the word order metrics. Future studies could address it through finer-grained analysis of genre- or pragmatics-specific word order preferences, which would help refine these metrics.
While the present study treats each UD treebank as an independent corpus (per-treebank design), future research focusing on intra-language variation, such as stylistic, genre, or register effects, could adopt a per-document framework to capture subcorpus-level dynamics. Such an extension would provide a complementary, fine-grained perspective to the typological approach taken here.

5. Summary and Future Work

This study investigated the interactions among morphological richness, word order flexibility, and syntactic directionality across 55 languages from 11 major families. Using three quantitative metrics, i.e., MAMSP (morphological richness), ENTR (word order flexibility), and DDir (dependency direction), we examined how morphology relates to syntactic structure. Our results indicate that morphological richness reliably predicts dependency directionality, whereas its link to word order flexibility remains weaker and context-dependent. Substantial intrafamilial variation was observed; for instance, Finnic languages (e.g., Finnish, Estonian) exhibit more complex morphology than Finno-Ugric languages (e.g., Hungarian), and Turkic languages generally show richer inflection than their Mongolic counterparts.
These findings align with Dryer’s typological generalizations. Consistent with his observation that OV languages often exhibit head-final dependencies, we found a moderate negative correlation between MAMSP and DDir (r ≈ −0.37, p = 0.005). Similarly, the weak trend linking morphological richness to word order flexibility resonates with Dryer’s point that certain structural tendencies, such as RelN/NRel patterns, manifest more strongly in some language types but are not universal. Our corpus-based, per-treebank measures complement Dryer’s genus- and language-level analysis by quantifying effect sizes, capturing confidence intervals, and revealing fine-grained variation in dependency direction and word order.
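As a rough check on the figures quoted above, the following sketch computes the shared variance implied by r ≈ −0.37 and applies a Bonferroni correction to p = 0.005. The number of comparisons (three, one per pair among MAMSP, ENTR, and DDir) and the significance level of 0.05 are assumptions made for illustration; the family of tests actually corrected for in the study may differ.

```python
# r and p are the values reported in the text.
# N_TESTS and ALPHA are illustrative assumptions, not taken from the paper.
R_MAMSP_DDIR = -0.37
P_MAMSP_DDIR = 0.005
N_TESTS = 3
ALPHA = 0.05


def bonferroni_adjust(p_value, n_tests):
    """Return the Bonferroni-adjusted p-value, capped at 1."""
    return min(1.0, p_value * n_tests)


# Proportion of variance shared by the two measures under a linear model.
r_squared = R_MAMSP_DDIR ** 2

# Adjusted p-value and the resulting decision at the assumed alpha level.
p_adjusted = bonferroni_adjust(P_MAMSP_DDIR, N_TESTS)
significant = p_adjusted < ALPHA

print(f"r^2 = {r_squared:.4f}")
print(f"adjusted p = {p_adjusted:.3f}, significant: {significant}")
```

Under these assumptions, roughly 14% of the variance is shared, which fits the description of the effect as moderate, and the adjusted p-value stays below 0.05.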
Several methodological and empirical considerations need attention. First, word order flexibility is sensitive to genre, domain, and annotation scheme, as shown by comparisons across multiple treebanks for German, Tamil, and Turkish. Second, current NLP pipelines differ in tokenization strategies (e.g., Stanza splits the French aux into à + les), which may affect cross-linguistic comparability. Third, the language sample remains typologically imbalanced, with some major groupings underrepresented, such as Austronesian, Sino-Tibetan, Indo-Aryan, and Niger-Congo. In addition, features such as syllable structure, mean word length, and pragmatic constraints were not incorporated. Future work should address these limitations by expanding typological coverage, integrating genre- and pragmatics-specific analyses, and standardizing tokenization and syntactic annotation frameworks across corpora. Through such efforts, we can deepen our understanding of the interaction between grammar and cognition, and build more robust models of language as a complex adaptive system. Beyond linguistic theory, the findings also have practical implications: the observed trade-offs between morphology and syntax can inform typological modeling and enhance natural language processing systems by guiding cross-linguistic parser design.
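The tokenization point above can be illustrated with a small sketch showing how expanding contractions changes the number of units that enter any word-based count. The lookup table and function name are hypothetical and do not reproduce how Stanza or any other pipeline actually performs multiword-token expansion; only the aux → à + les split is taken from the text.

```python
# Hypothetical expansion table for a few French contractions. A real pipeline
# would use its own multiword-token model rather than a fixed lookup.
CONTRACTIONS = {
    "aux": ["à", "les"],
    "au": ["à", "le"],
    "du": ["de", "le"],
}


def expand_contractions(tokens):
    """Replace each contracted surface token with its component words."""
    words = []
    for token in tokens:
        words.extend(CONTRACTIONS.get(token.lower(), [token]))
    return words


surface_tokens = ["Il", "parle", "aux", "enfants"]
syntactic_words = expand_contractions(surface_tokens)

# The same sentence yields a different unit count under the two strategies.
print(len(surface_tokens), surface_tokens)
print(len(syntactic_words), syntactic_words)
```

Because dependency relations are counted over whichever units the pipeline outputs, two treebanks of the same language that differ in this step can yield different relation counts.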

Author Contributions

Writing—original draft, W.L.; Writing—review & editing, H.L. All authors have read and agreed to the published version of the manuscript.

Funding

This research was funded by the National Office for Philosophy and Social Sciences, China, grant number 22BYY186.

Data Availability Statement

The raw data used in this study are openly available from the Universal Dependencies (UD) treebanks at https://universaldependencies.org. All metrics and values reported in this study were computed independently using custom scripts developed by the authors.

Acknowledgments

We thank the reviewers for their constructive feedback, which has improved the methodological clarity and overall coherence of this study.

Conflicts of Interest

The authors declare no conflict of interest.

Appendix A. Language Information

| Language Family | Branch | Language | Morphology |
|---|---|---|---|
| Indo-European | Germanic | Afrikaans | Fusional |
| Indo-European | Germanic | Dutch | Fusional |
| Indo-European | Germanic | Norwegian-Nynorsk | Fusional |
| Indo-European | Germanic | English | Fusional |
| Indo-European | Germanic | Danish | Fusional |
| Indo-European | Germanic | German | Fusional |
| Indo-European | Germanic | Swedish | Fusional |
| Indo-European | Germanic | Gothic | Fusional |
| Indo-European | Indo-Aryan | Hindi | Fusional |
| Indo-European | Indo-Aryan | Urdu | Fusional |
| Indo-European | Indo-Aryan | Marathi | Agglutinative |
| Indo-European | Romance | Portuguese | Fusional |
| Indo-European | Romance | Galician | Fusional |
| Indo-European | Romance | Catalan | Fusional |
| Indo-European | Romance | Spanish | Fusional |
| Indo-European | Romance | French | Fusional |
| Indo-European | Romance | Romanian | Fusional |
| Indo-European | Romance | Italian | Fusional |
| Indo-European | Iranian | Kurmanji | Agglutinative |
| Indo-European | Iranian | Persian | Fusional |
| Indo-European | Celtic | Scottish Gaelic | Fusional |
| Indo-European | Celtic | Irish | Fusional |
| Indo-European | Slavic | Bulgarian | Fusional |
| Indo-European | Slavic | Russian | Fusional |
| Indo-European | Slavic | Slovak | Fusional |
| Indo-European | Slavic | Polish | Fusional |
| Indo-European | Slavic | Belarusian | Fusional |
| Indo-European | Slavic | Ukrainian | Fusional |
| Indo-European | Slavic | Croatian | Fusional |
| Indo-European | Slavic | Slovenian | Fusional |
| Indo-European | Slavic | Serbian | Fusional |
| Indo-European | Slavic | Upper Sorbian | Fusional |
| Indo-European | Slavic | Czech | Fusional |
| Indo-European | Greek | Greek | Fusional |
| Indo-European | Baltic | Lithuanian | Fusional |
| Dravidian | Dravidian | Telugu | Agglutinative |
| Dravidian | Dravidian | Tamil | Agglutinative |
| Uralic | Finno-Ugric | Hungarian | Agglutinative |
| Uralic | Finno-Ugric | North Sami | Agglutinative |
| Uralic | Finnic | Estonian | Agglutinative |
| Uralic | Finnic | Finnish | Agglutinative |
| Altaic | Mongolic | Buryat | Agglutinative |
| Altaic | Turkic | Kazakh | Agglutinative |
| Altaic | Turkic | Uyghur | Agglutinative |
| Altaic | Turkic | Turkish | Agglutinative |
| Afroasiatic | Semitic | Hebrew | Fusional |
| Afroasiatic | Semitic | Arabic | Fusional |
| Afroasiatic | Egyptian | Coptic | Agglutinative |
| Sino-Tibetan | Sinitic | Chinese | Isolating |
| Japanese | Japanese | Japanese | Agglutinative |
| Austronesian | Malayo-Polynesian | Indonesian | Agglutinative |
| Armenian | Armenian | Armenian | Agglutinative |
| Basque | Basque | Basque | Agglutinative |
| Niger-Congo | Atlantic | Wolof | Agglutinative |
| Austroasiatic | Vietic | Vietnamese | Isolating |

Appendix B. Treebank Information

| Language Branch | Treebank | Text Types | Words | Sentences |
|---|---|---|---|---|
| Armenian | Armenian-ArmTDP | Blog, fiction, grammar-examples, nonfiction, news, legal | 52,585 | 2500 |
| Armenian | Armenian-BSUT | Blog, fiction, government, web, wiki, nonfiction, news, legal | 41,805 | 2300 |
| Basque | Basque-BDT | News | 121,443 | 8993 |
| Germanic | Afrikaans-AfriBooms | Legal nonfiction | 49,260 | 1934 |
| Germanic | German-HDT | News, nonfiction, web | 3,455,580 | 189,928 |
| Germanic | German-GSD | Review, wiki, news | 292,769 | 15,590 |
| Germanic | Danish-DDT | Fiction, nonfiction, news, spoken | 100,733 | 5512 |
| Germanic | Dutch-Alpino | News | 208,748 | 13,603 |
| Germanic | Dutch-LassySmall | Wiki | 98,241 | 7341 |
| Germanic | English-GUM | Academic, blog, email, fiction, government, grammar-examples, legal, medical, news, nonfiction, poetry, reviews, social, spoken, web, wiki | 187,515 | 10,761 |
| Germanic | Swedish-Talbanken | News, nonfiction | 96,859 | 6026 |
| Germanic | Swedish-LinES | Spoken, fiction, nonfiction | 90,960 | 5243 |
| Germanic | Swedish-PUD | News, wiki | 19,085 | 1000 |
| Germanic | Gothic | Bible | 55,336 | 5401 |
| Germanic | Norwegian-Nynorsk | Blog, news, nonfiction | 301,353 | 17,575 |
| Slavic | Slovenian-SSJ | Fiction, news, nonfiction | 267,097 | 16,623 |
| Slavic | Spoken Slovenian | Spoken | 29,488 | 3188 |
| Slavic | Ukrainian-IU | Blog, email, fiction, grammar-examples, legal, news, reviews, social, web, wiki | 122,983 | 7092 |
| Slavic | Serbian-SET | News | 97,673 | 4384 |
| Slavic | Belarusian-HSE | Fiction, legal, news, notification, web, social, wiki | 305,417 | 25,231 |
| Slavic | Bulgarian-BTB | Fiction, legal, news | 156,149 | 11,138 |
| Slavic | Croatian-SET | News, web, wiki | 199,409 | 9010 |
| Slavic | Czech-CAC | Fiction, legal, medical, news, nonfiction, wiki, reviews | 495,497 | 24,709 |
| Slavic | Czech-PDT | News, nonfiction, reviews | 1,530,008 | 87,907 |
| Slavic | Polish-LFG | Fiction, news, social, spoken, nonfiction | 130,967 | 17,246 |
| Slavic | Russian-Taiga | Fiction, news, wiki, blog, email, nonfiction, poetry, social | 197,001 | 17,872 |
| Slavic | Slovak-SNK | Fiction, news, nonfiction | 106,184 | 10,604 |
| Slavic | Upper Sorbian-UFAL | Wiki, nonfiction | 11,196 | 646 |
| Japanese | Japanese-BCCWJ | Fiction, news, blog, conference, nonfiction | 1,253,903 | 57,109 |
| Dravidian | Tamil-TTB | News | 9581 | 600 |
| Dravidian | Tamil-MWTT | News | 2584 | 534 |
| Dravidian | Telugu-MTG | Grammar-examples | 6465 | 1328 |
| Altaic | Buryat-BDT | Grammar-examples, news, fiction | 10,185 | 927 |
| Altaic | Kazakh-KTB | News, fiction, wiki | 10,536 | 1078 |
| Altaic | Turkish-Kenet | News, nonfiction | 183,555 | 16,396 |
| Altaic | Turkish-Boun | News, nonfiction | 125,212 | 9761 |
| Altaic | Uyghur-UDT | Fiction | 40,236 | 3456 |
| Romance | Catalan-AnCora | News | 553,042 | 16,678 |
| Greek | Greek-GUD | Grammar-examples | 25,493 | 1807 |
| Greek | Greek-GDT | Wiki, news, spoken | 63,441 | 2521 |
| Romance | French-Rhapsodie | Spoken | 44,242 | 3209 |
| Romance | French-Paris Stories | Spoken | 42,795 | 2776 |
| Romance | French-GSD | Blog, news, review, wiki | 400,489 | 16,342 |
| Romance | Spanish-PUD | News, wiki | 23,287 | 1000 |
| Romance | Spanish-AnCora | News | 567,894 | 17,662 |
| Romance | Spanish-GSD | Blog, news, review, wiki | 431,584 | 16,013 |
| Romance | Galician-TreeGal | News | 25,548 | 1000 |
| Romance | Italian-VIT | News, nonfiction | 280,154 | 10,087 |
| Romance | Portuguese-Bosque | News | 227,827 | 9357 |
| Romance | Portuguese-PUD | News, wiki | 23,407 | 1000 |
| Romance | Romanian-RRT | Academic, legal, fiction, medical, nonfiction, news, wiki | 218,522 | 9524 |
| Indo-Aryan | Hindi-HDTB | News | 351,704 | 16,649 |
| Indo-Aryan | Hindi-PUD | News, wiki | 23,829 | 1000 |
| Indo-Aryan | Marathi-UFAL | Wiki, fiction | 3847 | 466 |
| Indo-Aryan | Urdu-UDTB | News | 138,077 | 5130 |
| Baltic | Lithuanian-ALKSNIS | News, fiction, nonfiction, legal | 70,051 | 3642 |
| Baltic | Lithuanian-HSE | News, nonfiction | 5356 | 263 |
| Celtic | Irish-IDT | News, web, fiction, government, legal | 115,990 | 4910 |
| Celtic | Irish-twitter | Social | 47,790 | 2596 |
| Celtic | Scottish Gaelic | Fiction, news, nonfiction, spoken | 89,958 | 4741 |
| Austronesian | Indonesian-PUD | News, wiki | 19,446 | 1000 |
| Austronesian | Indonesian-GSD | Blog, news | 122,019 | 5598 |
| Austronesian | Indonesian-CSUI | News, nonfiction | 28,263 | 1030 |
| Austroasiatic | Vietnamese-VTB | News | 58,069 | 3323 |
| Niger-Congo Atlantic | Wolof-WTB | Bible, wiki | 44,258 | 2107 |
| Afroasiatic | Arabic-PUD | News, wiki | 20,747 | 1000 |
| Afroasiatic | Arabic-NYUAD | News | 738,889 | 19,738 |
| Afroasiatic | Hebrew-IAHLTwiki | Wiki | 140,950 | 5039 |
| Afroasiatic | Hebrew-HTB | News | 160,195 | 6143 |
| Afroasiatic | Coptic-Scriptorium | Bible, fiction, nonfiction | 55,858 | 2163 |
| Afroasiatic | Maltese-MUDT | News, nonfiction, legal, fiction, wiki | 44,162 | 2074 |
| Uralic | Estonian-EDT | Fiction, academic, news, nonfiction | 438,245 | 30,968 |
| Uralic | Finnish-TDT | Fiction, legal, news, blog, grammar-examples | 202,453 | 15,136 |
| Uralic | Finnish-TDT | Poetry, medical, social, web | 19,382 | 2122 |
| Uralic | North Sami-Giella | News, nonfiction | 26,845 | 3122 |
| Uralic | Hungarian-Szeged | News | 42,032 | 1800 |
| Sino-Tibetan Sinitic | Chinese-GSDSimp | Wiki | 123,291 | 4997 |
| Iranian | Kurmanji-MG | Fiction, wiki | 10,260 | 754 |
| Iranian | Persian-PerDT | Academic, blog, fiction, news, nonfiction, web | 501,776 | 29,107 |
| Iranian | Persian-Seraji | Fiction, legal, medical, news, nonfiction, social, spoken | 152,920 | 5997 |

Appendix C. Morphological Richness (MAMSP) Values in Ascending Order

| Branch | Language | MAMSP |
|---|---|---|
| Vietic | Vietnamese | 1.0000 |
| Sinitic | Chinese | 1.0015 |
| Japanese | Japanese | 1.0488 |
| Germanic | Afrikaans | 1.0687 |
| Malayo-Polynesian | Indonesian | 1.0829 |
| Slavic | Russian | 1.0877 |
| Germanic | Norwegian-Nynorsk | 1.0924 |
| Semitic | Hebrew | 1.0982 |
| Semitic | Arabic | 1.1049 |
| Finno-Ugric | Hungarian | 1.1094 |
| Germanic | Swedish | 1.1302 |
| Indo-Aryan | Hindi | 1.1310 |
| Indo-Aryan | Urdu | 1.1323 |
| Germanic | Danish | 1.1326 |
| Slavic | Bulgarian | 1.1344 |
| Germanic | German | 1.1350 |
| Germanic | English | 1.1375 |
| Iranian | Persian | 1.1379 |
| Germanic | Dutch | 1.1390 |
| Romance | Spanish | 1.1393 |
| Romance | Italian | 1.1397 |
| Romance | Galician | 1.1411 |
| Slavic | Slovak | 1.1445 |
| Romance | Portuguese | 1.1458 |
| Slavic | Polish | 1.1496 |
| Romance | Catalan | 1.1551 |
| Slavic | Ukrainian | 1.1698 |
| Slavic | Upper Sorbian | 1.1729 |
| Romance | Romanian | 1.1791 |
| Slavic | Slovenian | 1.1815 |
| Slavic | Croatian | 1.1836 |
| Slavic | Belarusian | 1.1928 |
| Egyptian | Coptic | 1.2001 |
| Slavic | Serbian | 1.2020 |
| Baltic | Lithuanian | 1.2162 |
| Celtic | Scottish Gaelic | 1.2195 |
| Finno-Ugric | North Sami | 1.2280 |
| Greek | Greek | 1.2391 |
| Basque | Basque | 1.2416 |
| Slavic | Czech | 1.2435 |
| Celtic | Irish | 1.2444 |
| Dravidian | Telugu | 1.2466 |
| Dravidian | Tamil | 1.2474 |
| Finnic | Estonian | 1.2503 |
| Armenian | Armenian | 1.2518 |
| Atlantic | Wolof | 1.2545 |
| Romance | French | 1.2580 |
| Mongolic | Buryat | 1.2700 |
| Finnic | Finnish | 1.2985 |
| Iranian | Kurmanji | 1.3164 |
| Turkic | Kazakh | 1.3341 |
| Turkic | Turkish | 1.3600 |
| Germanic | Gothic | 1.4006 |
| Indo-Aryan | Marathi | 1.4344 |
| Turkic | Uyghur | 1.4785 |

Appendix D. Head-Final Dependency Counts and Percentages

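| Rank | Dependency relation | Head-final count | Head-final % within relation | Total for relation | Share of corpus (%) |
|---|---|---|---|---|---|
| 1 | case | 84,589 | 96.26 | 87,876 | 10.26 |
| 2 | amod | 49,069 | 82.06 | 59,797 | 6.98 |
| 3 | punct | 48,296 | 39.06 | 123,651 | 14.44 |
| 4 | nsubj | 45,450 | 78.33 | 58,025 | 6.78 |
| 5 | det | 44,673 | 95.65 | 46,705 | 5.45 |
| 6 | advmod | 31,399 | 75.89 | 41,373 | 4.83 |
| 7 | cc | 28,192 | 92.25 | 30,560 | 3.57 |
| 8 | mark | 20,803 | 97.08 | 21,429 | 2.50 |
| 9 | obl | 20,126 | 39.31 | 51,201 | 5.98 |
| 10 | aux | 14,811 | 75.79 | 19,542 | 2.28 |
| 11 | cop | 12,460 | 77.56 | 16,064 | 1.88 |
| 12 | obj | 11,085 | 29.28 | 37,864 | 4.42 |
| 13 | nmod | 9688 | 13.84 | 69,981 | 8.17 |
| 14 | nummod | 7669 | 72.76 | 10,540 | 1.23 |
| 15 | advmod:emph | 6119 | 83.67 | 7313 | 0.85 |
| 16 | advcl | 4488 | 39.71 | 11,301 | 1.32 |
| 17 | expl:pv | 3895 | 77.33 | 5037 | 0.59 |
| 18 | compound | 2965 | 65.82 | 4505 | 0.53 |
| 19 | nsubj:pass | 2948 | 77.62 | 3798 | 0.44 |
| 20 | nmod:poss | 2758 | 43.04 | 6408 | 0.75 |
| 21 | obl:arg | 2639 | 32.29 | 8174 | 0.95 |
| 22 | aux:pass | 2590 | 90.50 | 2862 | 0.33 |
| 23 | nummod:gov | 2455 | 98.79 | 2485 | 0.29 |
| 24 | expl | 1538 | 78.51 | 1959 | 0.23 |
| 25 | xcomp | 1267 | 12.18 | 10,404 | 1.21 |
| 26 | amod:att | 1205 | 99.18 | 1215 | 0.14 |
| 27 | ccomp | 1178 | 12.25 | 9615 | 1.12 |
| 28 | discourse | 1131 | 65.53 | 1726 | 0.20 |
| 29 | expl:pass | 974 | 81.30 | 1198 | 0.14 |
| 30 | dep | 972 | 24.87 | 3909 | 0.46 |
| 31 | mark:prt | 908 | 99.23 | 915 | 0.11 |
| 32 | acl | 894 | 14.68 | 6091 | 0.71 |
| 33 | parataxis | 815 | 15.36 | 5305 | 0.62 |
| 34 | nsubj:cop | 758 | 76.64 | 989 | 0.12 |
| 35 | compound:lvc | 744 | 98.41 | 756 | 0.09 |
| 36 | iobj | 698 | 34.22 | 2040 | 0.24 |
| 37 | nmod:att | 583 | 98.81 | 590 | 0.07 |
| 38 | dislocated | 448 | 80.43 | 557 | 0.07 |
| 39 | advmod:mode | 366 | 91.50 | 400 | 0.05 |
| 40 | det:poss | 362 | 99.18 | 365 | 0.04 |
| 41 | case:gen | 338 | 100.00 | 338 | 0.04 |
| 42 | det:numgov | 305 | 97.76 | 312 | 0.04 |
| 43 | acl:relcl | 300 | 3.71 | 8097 | 0.95 |
| 44 | csubj | 289 | 15.72 | 1838 | 0.21 |
| 45 | vocative | 239 | 64.25 | 372 | 0.04 |
| 46 | obl:tmod | 233 | 54.57 | 427 | 0.05 |
| 47 | advmod:tlocy | 210 | 92.11 | 228 | 0.03 |
| 48 | clf:det | 201 | 99.01 | 203 | 0.02 |
| 49 | orphan | 192 | 22.59 | 850 | 0.10 |
| 50 | advmod:neg | 180 | 96.26 | 187 | 0.02 |
| 51 | compound:prt | 161 | 20.46 | 787 | 0.09 |
| 52 | nmod:tmod | 152 | 92.12 | 165 | 0.02 |
| 53 | aux:neg | 138 | 92.62 | 149 | 0.02 |
| 54 | aux:tense | 131 | 99.24 | 132 | 0.02 |
| 55 | case:acc | 126 | 100.00 | 126 | 0.01 |
| 56 | compound:nn | 123 | 100.00 | 123 | 0.01 |
| 57 | det:nummod | 110 | 97.35 | 113 | 0.01 |
| 58 | obl:mod | 86 | 24.71 | 348 | 0.04 |
| 59 | nmod:gobj | 73 | 98.65 | 74 | 0.01 |
| 60 | advmod:adj | 65 | 42.48 | 153 | 0.02 |
| 61 | nmod:unmarked | 62 | 24.90 | 249 | 0.03 |
| 62 | cc:preconj | 58 | 98.31 | 59 | 0.01 |
| 63 | obl:unmarked | 57 | 33.33 | 171 | 0.02 |
| 64 | nsubj:outer | 52 | 96.30 | 54 | 0.01 |
| 65 | det:predet | 51 | 100.00 | 51 | 0.01 |
| 66 | clf | 43 | 15.25 | 282 | 0.03 |
| 67 | obl:agent | 41 | 8.47 | 484 | 0.06 |
| 68 | expl:subj | 40 | 86.96 | 46 | 0.01 |
| 69 | mark:pcomp | 39 | 100.00 | 39 | 0.00 |
| 70 | expl:poss | 34 | 89.47 | 38 | 0.00 |
| 71 | nmod:desc | 33 | 100.00 | 33 | 0.00 |
| 72 | nmod:npmod | 28 | 73.68 | 38 | 0.00 |
| 73 | advmod:locy | 28 | 90.32 | 31 | 0.00 |
| 74 | nmod:obl | 28 | 70.00 | 40 | 0.00 |
| 75 | expl:impers | 27 | 100.00 | 27 | 0.00 |
| 76 | nmod:gsubj | 26 | 100.00 | 26 | 0.00 |
| 77 | reparandum | 25 | 92.59 | 27 | 0.00 |
| 78 | case:voc | 24 | 100.00 | 24 | 0.00 |
| 79 | obl:patient | 22 | 100.00 | 22 | 0.00 |
| 80 | list | 20 | 4.44 | 450 | 0.05 |
| 81 | obl:comp | 18 | 11.04 | 163 | 0.02 |
| 82 | xcomp:pred | 15 | 1.78 | 842 | 0.10 |
| 83 | compound:preverb | 14 | 12.84 | 109 | 0.01 |
| 84 | nsubj:nn | 14 | 100.00 | 14 | 0.00 |
| 85 | ccomp:obj | 13 | 39.39 | 33 | 0.00 |
| 86 | csubj:vsubj | 13 | 100.00 | 13 | 0.00 |
| 87 | case:adv | 13 | 76.47 | 17 | 0.00 |
| 88 | expl:comp | 13 | 100.00 | 13 | 0.00 |
| 89 | compound:affix | 12 | 92.31 | 13 | 0.00 |
| 90 | aux:caus | 12 | 100.00 | 12 | 0.00 |
| 91 | advmod:tmod | 11 | 91.67 | 12 | 0.00 |
| 92 | advmod:tto | 10 | 100.00 | 10 | 0.00 |
| 93 | csubj:pass | 10 | 6.94 | 144 | 0.02 |
| 94 | obl:appl | 10 | 40.00 | 25 | 0.00 |
| 95 | obj:lvc | 9 | 31.03 | 29 | 0.00 |
| 96 | obl:pmod | 7 | 6.19 | 113 | 0.01 |
| 97 | nmod:lmod | 7 | 100.00 | 7 | 0.00 |
| 98 | csubj:asubj | 6 | 100.00 | 6 | 0.00 |
| 99 | det:pmod | 6 | 3.17 | 189 | 0.02 |
| 100 | nsubj:caus | 5 | 100.00 | 5 | 0.00 |
| 101 | advmod:tfrom | 5 | 83.33 | 6 | 0.00 |
| 102 | obl:adj | 5 | 29.41 | 17 | 0.00 |
| 103 | nsubj:nc | 5 | 100.00 | 5 | 0.00 |
| 104 | csubj:cop | 5 | 3.70 | 135 | 0.02 |
| 105 | xcomp:ds | 5 | 8.47 | 59 | 0.01 |
| 106 | advmod:to | 4 | 66.67 | 6 | 0.00 |
| 107 | obj:appl | 4 | 36.36 | 11 | 0.00 |
| 108 | compound:svc | 4 | 3.45 | 116 | 0.01 |
| 109 | advcl:cond | 4 | 100.00 | 4 | 0.00 |
| 110 | cop:own | 4 | 11.76 | 34 | 0.00 |
| 111 | advcl:cmp | 4 | 28.57 | 14 | 0.00 |
| 112 | ccomp:obl | 3 | 9.38 | 32 | 0.00 |
| 113 | obl:lvc | 3 | 50.00 | 6 | 0.00 |
| 114 | parataxis:discourse | 3 | 100.00 | 3 | 0.00 |
| 115 | obj:caus | 3 | 15.79 | 19 | 0.00 |
| 116 | nsubj:xsubj | 3 | 60.00 | 5 | 0.00 |
| 117 | advcl:objective | 3 | 4.69 | 64 | 0.01 |
| 118 | csubj:outer | 3 | 42.86 | 7 | 0.00 |
| 119 | parataxis:insert | 3 | 23.08 | 13 | 0.00 |
| 120 | iobj:appl | 2 | 66.67 | 3 | 0.00 |
| 121 | obl:prep | 2 | 0.93 | 215 | 0.03 |
| 122 | acl:subj | 2 | 1.41 | 142 | 0.02 |
| 123 | obl:cmp | 2 | 100.00 | 2 | 0.00 |
| 124 | compound:redup | 2 | 22.22 | 9 | 0.00 |
| 125 | advcl:tcl | 2 | 40.00 | 5 | 0.00 |
| 126 | iobj:agent | 2 | 66.67 | 3 | 0.00 |
| 127 | obl:dat | 1 | 0.88 | 114 | 0.01 |
| 128 | advmod:que | 1 | 25.00 | 4 | 0.00 |
| 129 | advcl:pred | 1 | 100.00 | 1 | 0.00 |
| 130 | obl:with | 1 | 2.04 | 49 | 0.01 |
| 131 | obl:adv | 1 | 100.00 | 1 | 0.00 |
| 132 | compound:z | 1 | 100.00 | 1 | 0.00 |
| 133 | advmod:lmod | 1 | 2.08 | 48 | 0.01 |
| 134 | obj:agent | 1 | 14.29 | 7 | 0.00 |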
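The within-relation percentages in this appendix can be recomputed from the two count columns. The sketch below does so for three rows (case, amod, nmod) copied from the table. It also includes a one-line direction test that treats a dependency as head-final when the head follows its dependent in linear order; that definition and all function names are assumptions made for illustration, not the authors' script.

```python
# (head-final count, total count) copied from three rows of Appendix D.
ROWS = {
    "case": (84589, 87876),
    "amod": (49069, 59797),
    "nmod": (9688, 69981),
}


def is_head_final(dependent_index, head_index):
    """Assumed definition: the head comes after its dependent in the sentence."""
    return head_index > dependent_index


def head_final_pct(head_final, total):
    """Percentage of a relation's dependencies that are head-final."""
    if total <= 0:
        raise ValueError("total must be positive")
    return 100.0 * head_final / total


pcts = {rel: round(head_final_pct(hf, tot), 2) for rel, (hf, tot) in ROWS.items()}
for rel, pct in pcts.items():
    print(f"{rel}: {pct:.2f}% head-final")
```

For these three rows the recomputed values agree with the percentages listed in the table (96.26, 82.06, and 13.84).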

Appendix E. Values of Word Order Flexibility (ENTR) in Ascending Order

| Branch | Language | ENTR |
|---|---|---|
| Vietic | Vietnamese | 0.2470 |
| Sinitic | Chinese | 0.2985 |
| Indo-Aryan | Hindi | 0.3311 |
| Japanese | Japanese | 0.5400 |
| Iranian | Persian | 0.5547 |
| Finnic | Estonian | 0.5821 |
| Germanic | Norwegian-Nynorsk | 0.6191 |
| Celtic | Scottish Gaelic | 0.6439 |
| Indo-Aryan | Urdu | 0.6515 |
| Germanic | English | 0.6579 |
| Turkic | Turkish | 0.6888 |
| Germanic | Swedish | 0.6923 |
| Semitic | Arabic | 0.7068 |
| Romance | French | 0.7073 |
| Romance | Portuguese | 0.7133 |
| Semitic | Hebrew | 0.7134 |
| Romance | Galician | 0.7136 |
| Turkic | Uyghur | 0.7179 |
| Germanic | Danish | 0.7565 |
| Indo-Aryan | Marathi | 0.7813 |
| Turkic | Kazakh | 0.7869 |
| Dravidian | Telugu | 0.7992 |
| Dravidian | Tamil | 0.8732 |
| Germanic | Afrikaans | 0.8782 |
| Slavic | Russian | 0.8962 |
| Celtic | Irish | 0.9009 |
| Slavic | Belarusian | 0.9019 |
| Slavic | Serbian | 0.9059 |
| Slavic | Polish | 0.9109 |
| Slavic | Ukrainian | 0.9251 |
| Slavic | Upper Sorbian | 0.9501 |
| Egyptian | Coptic | 0.9546 |
| Mongolic | Buryat | 0.9685 |
| Romance | Romanian | 0.9823 |
| Slavic | Croatian | 0.9850 |
| Romance | Catalan | 0.9851 |
| Germanic | Gothic | 0.9871 |
| Finno-Ugric | Hungarian | 1.0001 |
| Finno-Ugric | North Sami | 1.0021 |
| Armenian | Armenian | 1.0033 |
| Greek | Greek | 1.0126 |
| Romance | Spanish | 1.0194 |
| Romance | Italian | 1.0198 |
| Basque | Basque | 1.0293 |
| Slavic | Bulgarian | 1.1011 |
| Germanic | German | 1.1440 |
| Malayo-Polynesian | Indonesian | 1.1602 |
| Germanic | Dutch | 1.2634 |
| Finnic | Finnish | 1.3202 |
| Slavic | Slovenian | 1.3627 |
| Slavic | Czech | 1.3761 |
| Iranian | Kurmanji | 1.3791 |
| Baltic | Lithuanian | 1.3881 |
| Slavic | Slovak | 1.4232 |
| Atlantic | Wolof | 1.4246 |

Appendix F. Word Order Distribution of the 55 Languages

| Language | SVO % | SOV % | VSO % | VOS % | OVS % | OSV % |
|---|---|---|---|---|---|---|
| Armenian | 0.6895 | 0.2120 | 0.0044 | 0.0049 | 0.0770 | 0.0122 |
| Basque | 0.5884 | 0.3039 | 0.0034 | 0.0170 | 0.0397 | 0.0476 |
| Bulgarian | 0.8201 | 0.0544 | 0.0011 | 0.0200 | 0.0994 | 0.0050 |
| Belarusian | 0.8544 | 0.0391 | 0.0000 | 0.0177 | 0.0621 | 0.0267 |
| Croatian | 0.7344 | 0.0566 | 0.0326 | 0.0567 | 0.0898 | 0.0299 |
| Czech | 0.5080 | 0.0700 | 0.1010 | 0.1100 | 0.2000 | 0.0110 |
| Polish | 0.7398 | 0.0440 | 0.0420 | 0.0520 | 0.1210 | 0.0012 |
| Russian | 0.7446 | 0.0389 | 0.0110 | 0.0370 | 0.1387 | 0.0298 |
| Slovak | 0.4720 | 0.1260 | 0.0510 | 0.0780 | 0.2390 | 0.0340 |
| Slovenian | 0.4730 | 0.1680 | 0.0390 | 0.0300 | 0.2490 | 0.0410 |
| Serbian | 0.7527 | 0.0540 | 0.0133 | 0.0344 | 0.1122 | 0.0343 |
| Ukrainian | 0.7345 | 0.0304 | 0.0156 | 0.0385 | 0.1433 | 0.0377 |
| Upper Sorbian | 0.7299 | 0.0493 | 0.0187 | 0.0322 | 0.1354 | 0.0345 |
| Danish | 0.7968 | 0.0000 | 0.0026 | 0.0000 | 0.0891 | 0.1115 |
| Dutch | 0.5862 | 0.1000 | 0.0030 | 0.1000 | 0.0910 | 0.1198 |
| English | 0.7965 | 0.0000 | 0.0027 | 0.0000 | 0.0910 | 0.1098 |
| Afrikaans | 0.6714 | 0.0010 | 0.2007 | 0.0030 | 0.1230 | 0.0009 |
| German | 0.4810 | 0.0016 | 0.3480 | 0.0300 | 0.1300 | 0.0094 |
| Gothic | 0.6729 | 0.2087 | 0.0031 | 0.0374 | 0.0374 | 0.0405 |
| Norwegian-Nynorsk | 0.6853 | 0.3133 | 0.0014 | 0.0000 | 0.0000 | 0.0000 |
| Swedish | 0.7787 | 0.0000 | 0.0023 | 0.0000 | 0.1002 | 0.1188 |
| Estonian | 0.8497 | 0.0088 | 0.0814 | 0.0091 | 0.0510 | 0.0000 |
| Finnish | 0.4147 | 0.0906 | 0.1116 | 0.0000 | 0.1877 | 0.1954 |
| Hungarian | 0.5869 | 0.2243 | 0.0047 | 0.0000 | 0.1280 | 0.0561 |
| North Sami | 0.6075 | 0.2607 | 0.0579 | 0.0020 | 0.0599 | 0.0119 |
| French | 0.7887 | 0.0820 | 0.0098 | 0.0000 | 0.1160 | 0.0035 |
| Galician | 0.7581 | 0.2120 | 0.0075 | 0.0075 | 0.0050 | 0.0100 |
| Italian | 0.6117 | 0.1662 | 0.0111 | 0.0059 | 0.2001 | 0.0050 |
| Portuguese | 0.8109 | 0.0298 | 0.0167 | 0.0401 | 0.1010 | 0.0015 |
| Romanian | 0.6111 | 0.2450 | 0.0033 | 0.0075 | 0.1305 | 0.0026 |
| Spanish | 0.6210 | 0.1600 | 0.0110 | 0.0055 | 0.1960 | 0.0065 |
| Catalan | 0.6480 | 0.1548 | 0.0122 | 0.0049 | 0.1740 | 0.0061 |
| Greek | 0.5846 | 0.0111 | 0.3011 | 0.0876 | 0.0113 | 0.0043 |
| Maltese | 0.2411 | 0.0000 | 0.6989 | 0.0411 | 0.0189 | 0.0000 |
| Arabic | 0.2310 | 0.0000 | 0.7284 | 0.0388 | 0.0018 | 0.0000 |
| Hebrew | 0.1810 | 0.0000 | 0.0650 | 0.0055 | 0.0110 | 0.0000 |
| Coptic | 0.7258 | 0.2460 | 0.0040 | 0.0000 | 0.0000 | 0.0242 |
| Marathi | 0.1123 | 0.7944 | 0.0156 | 0.0211 | 0.0111 | 0.0455 |
| Hindi | 0.0389 | 0.9231 | 0.0000 | 0.0016 | 0.0000 | 0.0364 |
| Persian | 0.1488 | 0.8233 | 0.0012 | 0.0000 | 0.0017 | 0.0250 |
| Urdu | 0.1520 | 0.7981 | 0.0110 | 0.0000 | 0.0025 | 0.0364 |
| Indonesian | 0.4818 | 0.0032 | 0.3451 | 0.0343 | 0.1254 | 0.0102 |
| Irish | 0.2975 | 0.0007 | 0.6902 | 0.0106 | 0.0000 | 0.0009 |
| Scottish Gaelic | 0.2876 | 0.0000 | 0.7044 | 0.0080 | 0.0000 | 0.0000 |
| Japanese | 0.0000 | 0.8277 | 0.0076 | 0.0060 | 0.0000 | 0.1587 |
| Kurmanji | 0.0910 | 0.6391 | 0.0033 | 0.0015 | 0.2630 | 0.0021 |
| Lithuanian | 0.5349 | 0.1115 | 0.1925 | 0.1101 | 0.0446 | 0.0064 |
| Tamil | 0.0000 | 0.6020 | 0.0000 | 0.0000 | 0.0772 | 0.3208 |
| Telugu | 0.0220 | 0.7010 | 0.0000 | 0.0000 | 0.0360 | 0.2410 |
| Turkish | 0.0343 | 0.7798 | 0.0042 | 0.0010 | 0.1510 | 0.0297 |
| Kazakh | 0.0301 | 0.7661 | 0.0056 | 0.0020 | 0.1477 | 0.0485 |
| Uyghur | 0.0344 | 0.7886 | 0.0058 | 0.0021 | 0.1412 | 0.0279 |
| Buryat | 0.2745 | 0.6009 | 0.0023 | 0.0017 | 0.1001 | 0.0205 |
| Vietnamese | 0.9511 | 0.0191 | 0.0000 | 0.0100 | 0.0000 | 0.0198 |
| Chinese | 0.9311 | 0.0344 | 0.0000 | 0.0000 | 0.0000 | 0.0345 |
| Wolof | 0.8734 | 0.0240 | 0.0115 | 0.0302 | 0.0255 | 0.0354 |
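To show how a six-way word order distribution maps onto a single flexibility score, the sketch below computes Shannon entropy with the natural logarithm for three rows of this appendix. Treating ENTR as this quantity is an assumption on our part rather than a statement of the authors' exact procedure; the function and variable names are illustrative.

```python
import math

# Rows copied from Appendix F, in the order SVO, SOV, VSO, VOS, OVS, OSV.
ORDER_SHARES = {
    "Vietnamese": [0.9511, 0.0191, 0.0000, 0.0100, 0.0000, 0.0198],
    "Chinese": [0.9311, 0.0344, 0.0000, 0.0000, 0.0000, 0.0345],
    "Hindi": [0.0389, 0.9231, 0.0000, 0.0016, 0.0000, 0.0364],
}


def shannon_entropy(shares):
    """Entropy in nats; orders with zero share contribute nothing."""
    return -sum(p * math.log(p) for p in shares if p > 0)


entropies = {lang: shannon_entropy(s) for lang, s in ORDER_SHARES.items()}
for lang, h in entropies.items():
    print(f"{lang}: {h:.4f}")
```

For these three languages the results come out very close to the ENTR values listed in Appendix E (0.247, 0.2985, and 0.3311), which suggests that the assumption is reasonable for these rows. A distribution concentrated on one order gives an entropy near zero, while a uniform distribution over six orders would give ln 6 ≈ 1.79.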

References

  1. Fenk-Oczlon, G.; Pilz, J. Linguistic Complexity: Relationships Between Phoneme Inventory Size, Syllable Complexity, Word and Clause Length, and Population Size. Front. Commun. 2021, 6, 626032.
  2. Sinnemäki, K. Complexity Trade-Offs: A Case Study. In Measuring Grammatical Complexity; Newmeyer, F., Preston, L., Eds.; Oxford University Press: Oxford, UK, 2014; pp. 179–201.
  3. Feng, Z. On Computational Complexity of Natural Language [Zìrán Yǔyán de Jìsuàn Fùzá Xìng Yánjiū]. Foreign Lang. Teach. Res. 2015, 659–672.
  4. Levshina, N. Token-Based Typology and Word Order Entropy: A Study Based on Universal Dependencies. Linguist. Typology 2019, 23, 533–572.
  5. Berdicevskis, A.; Schmidtke-Bode, K.; Seržant, I. Subjects Tend to Be Coded Only Once: Corpus-Based and Grammar-Based Evidence for an Efficiency-Driven Trade-Off. In Proceedings of the 19th International Workshop on Treebanks and Linguistic Theories; Association for Computational Linguistics: Düsseldorf, Germany, 2020; pp. 79–92.
  6. Shao, B.; Yan, J.; Zheng, J. Quantitative Investigation into the Relationship between Word-Class Conversion and the Morphological Typology of Languages. Foreign Lang. Teach. Res. 2023, 55, 497–508.
  7. Kong, L.; Qin, H. Multilingual Analysis of Act of Speaking Markers: An Event Encoding Perspective. Foreign Lang. Teach. Res. 2023, 55, 483–496.
  8. Yan, J. Morphology and Word Order in Slavic Languages: Insights from Annotated Corpora. Vopr. Jazyk. 2021, 4, 131.
  9. Koplenig, A.; Meyer, P.; Wolfer, S.; Müller-Spitzer, C. The Statistical Trade-Off Between Word Order and Word Structure—Large-Scale Evidence for the Principle of Least Effort. PLoS ONE 2017, 12, e0173614.
  10. Fenk-Oczlon, G.; Fenk, A. Measuring Basic Tempo across Languages and Some Implications for Speech Rhythm. In Proceedings of the INTERSPEECH 2010, 11th Annual Conference of the International Speech Communication Association, Makuhari, Chiba, Japan, 26–30 September 2010; ISCA: Singapore; pp. 1537–1540.
  11. Sinnemäki, K. Word Order in Zero-Marking Languages. Stud. Lang. 2010, 34, 869–912.
  12. Liu, H. Dependency Distance as a Metric of Language Comprehension Difficulty. J. Cogn. Sci. 2008, 9, 159–191.
  13. Hawkins, J.A. A Comparative Typology of English and German: Unifying the Contrasts; Routledge: Oxford, UK, 2015.
  14. Gibson, E. Linguistic Complexity: Locality of Syntactic Dependencies. Cognition 1998, 68, 1–76.
  15. Hawkins, J.A. Efficiency and Complexity in Grammars; Oxford University Press: Oxford, UK, 2004; ISBN 978-0-19-925268-8.
  16. Sinnemäki, K.; Haakana, V. Head and Dependent Marking and Dependency Length in Possessive Noun Phrases: A Typological Study of Morphological and Syntactic Complexity. Linguist. Vanguard 2022, 9, 45–57.
  17. De Marneffe, M.-C.; Manning, C.D.; Nivre, J.; Zeman, D. Universal Dependencies. Comput. Linguist. 2021, 47, 255–308.
  18. Çöltekin, Ç.; Rama, T. What Do Complexity Measures Measure? Correlating and Validating Corpus-Based Measures of Morphological Complexity. Linguist. Vanguard 2023, 9, 27–43.
  19. Xanthos, A.; Gillis, S. Quantifying the Development of Inflectional Diversity. First Lang. 2010, 30, 175–198.
  20. Tesnière, L. Éléments de Syntaxe Structurale; Klincksieck: Paris, France, 1959.
  21. Tsunoda, T. Sekai no Gengo to Nihongo [Languages of the World and Japanese]; Kuroshio Publishing: Tokyo, Japan, 2009; Available online: https://www.9640.jp/book_view/?54 (accessed on 5 September 2025).
  22. Kubon, V.; Lopatková, M.; Hercig, T. Searching for a Measure of Word Order Freedom. In Proceedings of the 16th ITAT Conference Information Technologies—Applications and Theory; Kubon, V., Lopatková, M., Hercig, T., Brejova, B., Eds.; CEUR: Tatranské Matliare, Slovakia, 2016; Volume 1649.
  23. Li, W.; Liu, H.; Xiong, Z. A Quantitative Analysis of Word Order Freedom and the Abundance of Case Markers in Japanese. Math. Linguist. 2022, 33, 325–340.
  24. Liu, H. Dependency Direction as a Means of Word-Order Typology: A Method Based on Dependency Treebanks. Lingua 2010, 120, 1567–1578.
  25. Niu, R.; Wang, Y.; Liu, H. The Cross-Linguistic Variations in Dependency Distance Minimization and Its Potential Explanations. In Proceedings of the 37th Pacific Asia Conference on Language, Information and Computation (PACLIC 2023), Hong Kong, China, 1–3 December 2023; Association for Computational Linguistics: Hong Kong, China; pp. 559–569.
  26. Greenberg, J.H. A Quantitative Approach to the Morphological Typology of Language. In Method and Perspective in Anthropology; Spencer, R.F., Ed.; University of Minnesota Press: Minneapolis, MN, USA, 1954; pp. 192–220.
  27. Bickel, B.; Nichols, J. Inflectional Morphology. In Language Typology and Syntactic Description; Shopen, T., Ed.; Cambridge University Press: Cambridge, UK, 2007; pp. 169–240.
  28. Dryer, M.S. The Greenbergian Word Order Correlations. Language 1992, 68, 81–138.
  29. Benítez-Burraco, A.; Chen, S.; Gil, D. The Absence of a Trade-Off Between Morphological and Syntactic Complexity. Front. Lang. Sci. 2024, 3, 1340493.
Figure 1. Raw MSP and moving averages across languages.
Figure 2. Head-dependent directionality: absolute counts of head-final dependencies across relation types (≈856 k dependencies in total; 38.5% head-final).
Figure 3. Bootstrap distribution of correlation coefficients for r_MAMSP_HF, r_MAMSP_ENTR, and r_HF_ENTR.
Figure 4. Scatter plots of morphological richness and syntactic directionality.
Figure 5. Network of morphological and syntactic subsystems.
Table 1. Word order distribution in Semitic languages (full clauses).

| Language/Data Type | SVO (%) | SOV (%) | VSO (%) | VOS (%) | OVS (%) | OSV (%) |
|---|---|---|---|---|---|---|
| Arabic (PUD + NYUAD) | 18.1 | 0.0 | 72.8 | NA | 3.9 | 0.2 |
| Maltese (MUDT) | 24.1 | 0.0 | 69.9 | NA | 4.1 | 1.9 |
| Hebrew-HTB | 16.6 | 0.0 | 3.0 | 0.6 | 1.1 | 0.0 |
| Hebrew-IAHLTwiki | 19.6 | 0.0 | 10.0 | 0.0 | 0.8 | 0.0 |
Table 2. Word order distribution in Hebrew (partial clauses).

| Language/Data Type | SV (%) | VS (%) | VO (%) |
|---|---|---|---|
| Hebrew-HTB | 44.5 | 25.4 | 8.3 |
| Hebrew-IAHLTwiki | 40.0 | 24.8 | 4.4 |
Disclaimer/Publisher’s Note: The statements, opinions and data contained in all publications are solely those of the individual author(s) and contributor(s) and not of MDPI and/or the editor(s). MDPI and/or the editor(s) disclaim responsibility for any injury to people or property resulting from any ideas, methods, instructions or products referred to in the content.

Share and Cite

MDPI and ACS Style

Li, W.; Liu, H. Interactions Among Morphology, Word Order, and Syntactic Directionality: Evidence from 55 Languages. Entropy 2025, 27, 1128. https://doi.org/10.3390/e27111128

Chicago/Turabian Style

Li, Wenchao, and Haitao Liu. 2025. "Interactions Among Morphology, Word Order, and Syntactic Directionality: Evidence from 55 Languages" Entropy 27, no. 11: 1128. https://doi.org/10.3390/e27111128
